Introduction to pysheds#

In this tutorial, we’re going to cover a new Python package one can use to determine the catchment area and resource of a potential run-of-river power plant:

pysheds: Simple and fast watershed delineation in python

Pysheds is an open-source library designed to help with processing of digital elevation models (DEMs), particularly for hydrologic analysis. Pysheds performs many of the basic hydrologic functions, including catchment delineation and accumulation computation.

Note

Documentation for pysheds is available at http://mattbartos.com/pysheds/.

Note

If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work download the .ipynb file and import it in Google Colab.

Then install the following packages by executing the following command in a Jupyter cell at the top of the notebook.

!pip install pysheds matplotlib

Note

The following tutorial is based on the following resources:

Data Downloads#

Before we can run this tutorial, we have to download a few input data files to your local filesystem. The cell below uses some Python utility packages to automate the retrieval of the required files.

from urllib.request import urlretrieve
from os.path import basename

url = "https://tubcloud.tu-berlin.de/s/ZPiZjX3Nzc6R6RX/download/output_COP30.tif"
urlretrieve(url, basename(url))
('output_COP30.tif', <http.client.HTTPMessage at 0x7f0644a6f350>)

Imports#

First, let’s bring in the Grid class from the pysheds library, which will be our primary tool for managing and analyzing grid-based spatial data like elevation models in our hydrological computations…

from pysheds.grid import Grid
Exception ignored in: <bound method IPythonKernel._clean_thread_parent_frames of <ipykernel.ipkernel.IPythonKernel object at 0x7f064e14dc90>>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 775, in _clean_thread_parent_frames
    def _clean_thread_parent_frames(

KeyboardInterrupt: 

… alongside a few additional Python packages you already know from previous tutorials.

import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
import cartopy.crs as ccrs
import cartopy.feature as cfeature

Read Digital Elevation Model (DEM)#

A Digital Elevation Model (DEM) is a 3D representation of a terrain’s surface created from terrain elevation data, and it’s the main dataset we’ll read into our analysis to model the watershed’s topography.

For this example, data for a small part of Northern Portugal is taken from the Copernicus Global Digital Elevation Model, which offers global coverage at a resolution of 30 metres.

European Space Agency, Sinergise (2021). Copernicus Global Digital Elevation Model. Distributed by OpenTopography. https://doi.org/10.5069/G9028PQB.

fn = "output_COP30.tif"
grid = Grid.from_raster(fn, nodata=0)
dem = grid.read_raster(fn, nodata=0)
fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
plt.imshow(dem, extent=grid.extent, cmap="terrain")
plt.colorbar(label="Elevation (m)", orientation="horizontal")
ax.coastlines()
ax.add_feature(cfeature.BORDERS)

Conditioning the elevation data#

Raw DEMs often contain artifacts (such as pits, depressions and flat regions) that prevent the DEM from fully draining.

  • Pits and Depressions consist of cells or larger regions of cells for which every surrounding cell is at a higher elevation.

  • Flats consist of cells at which every surrounding cell is at the same elevation or higher.

The presence of depressions and flats means that water will aggregate there and will not move downwards which would be a problem.

These artifacts need to be removed before the datasets can be used. By applying the following code, the affected pixels will get the values of their surrounding ones.

pit_filled_dem = grid.fill_pits(dem)
flooded_dem = grid.fill_depressions(pit_filled_dem)
conditioned_dem = grid.resolve_flats(flooded_dem)

After conditioning the data by filling depressions and resolving flats, the flow direction can be determined.

Computing flow directions#

Flow directions are computed to capture the topology of the drainage network, and are needed for delineating catchment areas and computing the accumulation of flow. By default, pysheds will compute flow directions using the D8 routing scheme (ESRI). In this routing mode, each cell is routed to one of eight neighboring cells based on the direction of steepest descent:

  • North: 64

  • Northeast: 128

  • East: 1

  • Southeast: 2

  • South: 4

  • Southwest: 8

  • West: 16

  • Northwest: 32

flowdir = grid.flowdir(conditioned_dem)
np.unique(flowdir)
plt.imshow(flowdir, extent=grid.extent, cmap="cividis")

Computing flow accumulation#

The accumulated flow is the number of upstream cells that contribute flow to each cell in the DEM grid. The result of this calculation is a grid where each cell’s value represents the count of all cells that flow into it, directly or indirectly. In essence, it gives you an idea of the potential water volume that would pass through each point if each cell contributed a unit amount of water. The flow accumulation process takes the flow direction grid as input and works by tracking the flow from each cell to its downslope neighbors. It aggregates the number of cells flowing into each downslope cell, essentially “accumulating” the flow as it moves through the drainage network. Hence, as you move downstream, the number increases, and the other way around for upstream.

acc = grid.accumulation(flowdir)
fig, ax = plt.subplots()
im = ax.imshow(
    acc,
    extent=grid.extent,
    cmap="cubehelix",
    norm=LogNorm(1, acc.max()),
    interpolation="bilinear",
)
plt.colorbar(im, ax=ax, label="Upstream Cells", orientation="horizontal")

Computing catchment areas#

The method to calculate the catchment area also operates on the flow direction grid.

To delineate a catchment, first specify a pour point (the outlet of the catchment area, e.g. the weir of a run-of-river power plant).

However, sometimes the pour point is not known exactly. In this case, it can be helpful to first compute the accumulated flow and then snap a roughly estimated pour point to the nearest high accumulation cell.

For instance, we may specify a trial pour point…

x, y = -7.954, 41.47

… then snap the pour point to a high accumulation cell with at least 5000 upstream cells:

x_snap, y_snap = grid.snap_to_mask(acc > 5000, (x, y))

This point, we can then use to delineate the catchment area:

catch = grid.catchment(x=x_snap, y=y_snap, fdir=flowdir, xytype="coordinate")

Note

  • If the x and y components of the pour point are spatial coordinates in the grid’s spatial reference system, specify xytype='coordinate'.

  • If the x and y components of the pour point correspond to the row and column indices of the flow direction array, specify xytype='index'.

We can then clip our grid to the catchment area and plot the result:

grid.clip_to(catch)
clipped_catch = grid.view(catch)
clipped_accumulated_flow = grid.accumulation(flowdir)
fig, ax = plt.subplots()
im = ax.imshow(
    np.where(clipped_catch, clipped_accumulated_flow, np.nan),
    extent=grid.extent,
    cmap="cubehelix",
    norm=LogNorm(1, clipped_accumulated_flow.max()),
    interpolation="bilinear",
)
plt.colorbar(
    im,
    ax=ax,
    label="Upstream Cells",
)
plt.scatter([x_snap], [y_snap], c="r", s=50)

Where to go from here?#

Explore datasets of global hydro basins and what they are used for:

https://www.hydrosheds.org/

The HydroSHEDS database offers a suite of global digital data layers in support of hydro-ecological research and applications worldwide. Its various hydrographic data products include catchment boundaries, river networks, and lakes at multiple resolutions and scales. HydroSHEDS data are freely available in standard GIS formats and form the geospatial framework for a broad range of assessments including hydrological, environmental, conservation, socioeconomic, and human health applications.

https://www.hydrosheds.org/applications/ssp-hydropower

HydroSHEDS data have been used to assess configuration scenarios for future hydropower development at global, national (e.g., Zambia, Nepal) or basin-level (e.g., Marañón river) scales. This ‘system-scale planning’ (SSP) approach provides a large-scale perspective which considers the complex interactions and cumulative effects of different dam combinations rather than determining the impacts of each dam project individually. This type of holistic assessment takes the regional and downstream effects of the entire hydropower scenario into account and allows for strategic planning by comparing various national or international development options. It is particularly useful during the early phases of hydropower portfolio design and can facilitate the dialog between stakeholders and policymakers as a decision support tool.

Calculate water flow volume at pour point using meteorological data:

With the delineated catchment area from pysheds, you can utilize surface runoff data, which represents the part of precipitation and snowmelt that flows over the land without infiltrating the soil or evaporating, to estimate total flow at the basin’s pour point. By integrating surface runoff data over the entire catchment area, typically given in volume per area per time (e.g. millimetres per day), and summing it up for all the cells within the catchment, you can approximate the total water volume contributing to the flow at the pour point. This integration accounts for spatial variations in runoff generation across the catchment, providing you with an aggregated flow estimate that can be used to assess the hydrological output of the catchment at the specific location where the water exits the basin, for instance, the amount of water available for electricity generation in a run-of-river plant.

Exercises#

Determine a few additional catchment areas by varying the pour point and the threshold for the accumulated flow.