Computing topographic prominence from contour line vector data

28 Jul 2021 | all notes

a technical study for the view

The most accurate way to calculate topographic prominence is based on using the highest-resolution DEM raster data available.

A rougher but computationally much cheaper approach is to base the calculation directly on contour line vector data sets. The approach is based on one of the equivalent definitions of topographic prominence, which is

the difference between the height of a peak and the lowest contour line that encloses no higher peak.

Based on any contour line vector dataset one can therefore get a good apprimxation of prominence (down to the resolution of the contour line spacing) in two steps:

  1. build a tree/forest of the altitude-labelled contour polygons, where a node’s parent is the contour line one contour level lower which directly encloses the node (or, conversely, a node’s children are all those contour lines one contour level higher which lie directly inside the contour line). The leaves of this tree are all contour lines which enclose no higher contour lines within them – areas that contain a peak.
    • based on a set of contour lines/polygons, the single enclosing contours for each polygon can be found by pairwise spatial joins between contours of adjacent contour levels (see code below).
    • if the contours are already encoded as polygon-hole structures, the tree can be directly inferred from that
  2. As long as there are still contour polygons that haven’t been assigned a prominence:
    1. select the contour polygon with the highest altitude that hasn’t been assigned a prominence yet
    2. traverse ‘down’wards through its enclosing polygons, marking each of them as 0 prominence
    3. when encountering an enclosing polygon that is already marked with 0 prominence, then that contour contains the key col => mark the prominence of the contour polygon selected in (1) as that polygon’s altitude minus the altitude of the col contour

Python implementation

import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None
import geopandas as gpd

# pygeos sjoin gives widely inaccurate results, so force using rtree instead
#import pygeos
gpd.options.use_pygeos = False

from geopandas import GeoSeries
from import sjoin
from shapely.geometry import Polygon

ct = gpd.read_file('~/theview/contour-10.shp')
ct['poly'] = GeoSeries([Polygon(g) for g in ct.geometry])
ct['area'] = ct.poly.area

# sorting by area as secondary is important for two reasons:
# 1. assume larger areas of the same elevation are actually a little higher, so are assigned prominence first
# 2. when selecting basin containers we want to stick to the smallest containing one,
# (so we aggregate using .max(), maybe .last() also ok)
ct.sort_values(['elevation', 'area'], ascending=[False, False], inplace=True, na_position='first')
ct.reset_index(drop=True, inplace=True)

# now that contour lines have been re-sorted, make dedicated polygon dataframe for spatial joins
polys = gpd.GeoDataFrame(geometry=ct.poly)
polys = polys.set_crs(epsg=4326, inplace=True)
# and put in an appropriate projection
polys.to_crs('ESRI:102030', inplace=True)

centroids = gpd.GeoDataFrame(geometry=polys.geometry.centroid)

els = list(ct.elevation.unique())
interval = els[0] - els[1]

# index of the 'parent' (containing) contour line
ct['parent'] = np.repeat(-1, len(ct))

for ele in els:
  left = sum(ct.elevation == ele)
  descentparents = sjoin(centroids[ct.elevation == ele], polys[ct.elevation == ele - interval], op='within', how='left') # or within or intersects
  # if there's no direct descent parent, maybe it is a basin within a
  # (usually much wider) contour line of the same elevation
  orphans = descentparents[np.isnan(descentparents.index_right)].index
  basins = 0
  if len(orphans):
    # TODO only keep polygons larger than the smallest orphan as candidates
#    largestorphan = max(ct.area)
    basinparents = sjoin(centroids.loc[orphans], polys[ct.elevation == ele], op='within', how='left')
    # FIXME drop multiple matches! also group by but select largest (=first/min)
    basinparents.dropna(inplace = True)
    basins = sum(basinparents.index != basinparents.index_right)
  # if there's multiple, only keep smallest (because that will be the direct parent)
  groups = descentparents[['index_right']].groupby(descentparents.index, sort=False)
  # because we originally sorted like this, we know it's the highest index! yesss!
  descentparents = groups.max()

  # using pygeos' sjoin I found that I got many incorrect results, basd on this check:
  # p.geometry.within(polys.geometry[p.index_right])]
#  invalid = [i for i, p in descentparents.iterrows() if np.isnan(p.index_right)]
#  print(invalid)
#  descentparents.drop(invalid, inplace=True)
  # actual matches found
  print(len(descentparents) + basins)
  ct.parent[descentparents.index] = descentparents.index_right
  # now that known parents have been assigned, we can use this information to correctly
  # assign the lower parent of basins
  if basins:
    print("found basin(s)")
    # FIXME there might be recursive basin-parenting happening here which would require a loop
    ct.parent[basinparents.index] = ct.parent[basinparents.index_right]
#    print(ct.parent[basinparents.index])

ct['prominence'] = np.repeat(None, len(ct))
for i in range(len(ct)):
  if ct.prominence[i] != None or ct.parent[i] == -1:
#  print(i)
  parent = ct.parent[i]
  print(f'{i} @ {ct.elevation[i]}: ', end = '')
  while ct.prominence[parent] == None:
    print('.', end = '')
    ct.prominence[parent] = 0
    if ct.parent[parent] == -1 or np.isnan(ct.parent[parent]):
    parent = ct.parent[parent]
  # ok, compute prominence
  ct.prominence[i] = ct.elevation[i] - ct.elevation[parent]
  print(f'{parent} @ {ct.elevation[parent]}')

peaks = [i for i, p in enumerate(ct.prominence > 0) if p]
peaks = gpd.GeoDataFrame({ 'elevation': ct.elevation[peaks], 'prominence': pd.to_numeric(ct.prominence[peaks]), 'geometry': polys.geometry[peaks] }, crs='ESRI:102030')

# TODO filter only positive prominence