Introduction to rasterio and atlite#
In this tutorial, we’re going to cover two new Python packages:
Rasterio: access to geospatial raster data
Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.
Note
Documentation for rasterio is available at https://rasterio.readthedocs.io/en/latest/index.html.
Atlite: Convert weather data to energy systems data
Atlite is a free software, xarray-based Python library for converting weather data (such as wind speeds, solar radiation, temperature and runoff) into power systems data (such as wind power, solar power, hydro power and heating demand time series).
Note
Documentation for atlite is available at https://atlite.readthedocs.io/en/latest/.
This tool is developed here at TU Berlin in the group of Digital Transformation of Energy Systems.
Note
If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work, download the .ipynb file and import it into Google Colab.
Then install the required packages by executing the following command in a Jupyter cell at the top of the notebook.
!pip install atlite rasterio geopandas cartopy matplotlib numpy xarray
Download of Required Data#
Before we can run this tutorial, we have to download a few input data files to the local filesystem. The cell below uses some Python utility packages to automate the retrieval of the required files.
from urllib.request import urlretrieve
from os.path import basename
url = (
"https://tubcloud.tu-berlin.de/s/2oogpgBfM5n4ssZ/download/PORTUGAL-2013-01-era5.nc"
)
urlretrieve(url, basename(url))
('PORTUGAL-2013-01-era5.nc', <http.client.HTTPMessage at 0x7f08d08eec50>)
Recap of Vector Data#
In the last tutorial, we used geopandas to work with vector data (e.g. points, lines, polygons):
Importing the package:
import geopandas as gpd
Reading the file:
url = "https://tubcloud.tu-berlin.de/s/7bpHrAkjMT3ADSr/download/country_shapes.geojson"
countries = gpd.read_file(url).set_index("name")
Inspecting the GeoDataFrame:
countries.tail()
| name | geometry |
|---|---|
| VN | MULTIPOLYGON (((107.49807 14.74595, 107.50928 ... |
| YE | MULTIPOLYGON (((52.09954 18.72304, 52.14233 18... |
| ZA | MULTIPOLYGON (((19.99292 -24.77585, 20.01049 -... |
| ZM | POLYGON ((32.92086 -9.4079, 32.92303 -9.46629,... |
| ZW | POLYGON ((25.25978 -17.79411, 25.26671 -17.800... |
Note
This dataset uses ISO-2 country codes as index: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2.
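For example, we can select individual countries via this index by their ISO-2 code (a quick illustration):
countries.loc[["PT", "ES"]]  # select Portugal and Spain by country code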
Plotting the GeoDataFrame:
countries.plot(edgecolor="k", facecolor="lightgrey")
<Axes: >
Reprojecting geometries in the GeoDataFrame:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
crs = ccrs.EqualEarth()
fig = plt.figure(figsize=(10, 5))
ax = plt.axes(projection=crs)
countries.to_crs(crs.proj4_init).plot(ax=ax, edgecolor="k", facecolor="lightgrey")
<GeoAxes: >
Raster Data with rasterio#
But we haven’t worked with raster data yet! One useful package for working with raster data is rasterio:
import rasterio
With rasterio, we can for instance load GeoTIFF files. In the example below, we load the CORINE land cover classification for Portugal:
url = "https://tubcloud.tu-berlin.de/s/Mw5dwbwdsDY8zfH/download/U2018_CLC2018_V2020_20u1-PT.tif"
clc = rasterio.open(url)
Datasets can have one or more bands (or layers), which can be read with the .read() function:
band = clc.read(1)
The returned object is a 2-dimensional numpy.ndarray.
type(band)
numpy.ndarray
band.shape
(5662, 3404)
The band values correspond to a land cover and land use classification scheme with class values as listed here.
This is an example of categorical values in raster data.
import numpy as np
np.unique(band)
array([-128, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 35, 37, 38, 39, 40, 41, 42, 43, 44],
dtype=int8)
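As a quick sketch, we can also count how many cells fall into each class to gauge the land cover composition (dropping the nodata value -128; class labels follow the CORINE scheme linked above):
values, counts = np.unique(band, return_counts=True)
# ignore the nodata value and express each class as a share of valid cells in percent
valid = values != -128
shares = counts[valid] / counts[valid].sum() * 100
dict(zip(values[valid].tolist(), shares.round(2).tolist()))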
There are two common options to plot raster data:
- with plt.imshow()
- with rasterio.plot.show()
plt.imshow(band, cmap="tab20")
<matplotlib.image.AxesImage at 0x7f088646a0d0>
from rasterio.plot import show
show(band, cmap="tab20")
<Axes: >
The integrated rasterio plotting functionality is slightly more powerful. For instance, it can label the axes with proper geo-referenced extents. These are stored in the .crs, .bounds and .transform attributes.
clc.crs
CRS.from_epsg(3035)
clc.bounds
BoundingBox(left=2635400.0, bottom=1732500.0, right=2975800.0, top=2298700.0)
clc.transform
Affine(100.0, 0.0, 2635400.0,
0.0, -100.0, 2298700.0)
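The affine transform maps (column, row) pixel indices to map coordinates. As a small check (the values follow from the transform and bounds shown above):
clc.transform * (0, 0)  # upper-left corner of the raster: (2635400.0, 2298700.0)
clc.xy(0, 0)  # centre of the first pixel, half a cell further in: (2635450.0, 2298650.0)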
We can pass the information on the projection to the rasterio.plot.show() function:
from rasterio.plot import show
show(band, transform=clc.transform, cmap="tab20")
<Axes: >
Let’s quickly load another dataset with continuous values: the elevation and water depth in and around Portugal. This dataset comes from the GEBCO Gridded Bathymetry Data and is provided in another file format, .nc (netCDF), which can also store raster data. It would be useful, for instance, to exclude very mountainous areas, or to identify shallow water at sea for the development of offshore wind turbines.
url = "https://tubcloud.tu-berlin.de/s/XoDpBcweJHmYKgF/download/GEBCO_2014_2D-PT.nc"
gebco = rasterio.open(url, driver="netcdf")
band = gebco.read(1)
np.unique(band)
array([-2898, -2862, -2856, ..., 1915, 1926, 1959], dtype=int16)
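As a sketch of how such criteria could look (the thresholds of 1000 m elevation and 50 m water depth are illustrative assumptions, not prescribed by the tutorial):
too_high = band > 1000  # mountainous areas above 1000 m elevation
shallow = (band < 0) & (band > -50)  # sea areas with less than 50 m water depth
fig, axes = plt.subplots(1, 2, figsize=(8, 8))
show(too_high, transform=gebco.transform, cmap="Reds", ax=axes[0], title="above 1000 m")
show(shallow, transform=gebco.transform, cmap="Blues", ax=axes[1], title="below 50 m depth")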
For plotting, we can also display vector and raster data together:
fig, ax = plt.subplots(figsize=(4, 8))
countries.loc[["PT", "ES"]].plot(ax=ax, color="none")
show(band, transform=gebco.transform, cmap="RdBu_r", ax=ax)
<Axes: >
Land Eligibility Analysis with atlite#
To estimate the wind and solar generation potential of a country, land usage constraints need to be taken into account.
Data required to determine the eligible areas is stored as raster and vector data.
Examples:
- Vector Data: natural protection areas, settlements, roads, airports
- Raster Data: type of land, elevation, slope
The atlite library has some useful features to handle exclusion constraints for both datatypes.
The core object for land eligibility analysis in atlite is the ExclusionContainer, which serves as a container for all rasters and geometries we want to exclude (or include).
from atlite.gis import ExclusionContainer
In many cases, rasters and geometries have different projections and resolutions. Therefore, the ExclusionContainer is initialized with a CRS and a resolution which serve as the basis for all added rasters and geometries. By default, the CRS is EPSG:3035 and the resolution is 100, which corresponds to a raster with 100 meter resolution. All rasters and geometries will be converted to this CRS and resolution.
Warning
The CRS 3035 is only valid in Europe.
excluder = ExclusionContainer(crs=3035)
Let’s add our first exclusion criterion, the Natura2000 natural protection areas, to the container using the excluder.add_geometry() function. This function is for vector data. We will use its sibling, excluder.add_raster(), later.
url = "https://tubcloud.tu-berlin.de/s/mxgaA7wH8NsmdDi/download/Natura2000_end2021-PT.gpkg"
excluder.add_geometry(url)
To calculate the mask for the exclusion container (i.e. which cells are not excluded), we can use the shape_availability() function, which takes as arguments:
- a geometry (e.g. country shape) within which to calculate available areas
- an exclusion container object
from atlite.gis import shape_availability
Since the excluder is in crs=3035, we should also convert the geometry of Portugal to excluder.crs first.
shape = countries.to_crs(excluder.crs).loc[["PT"]].geometry
shape.iloc[0]
We can then call the shape_availability function:
band, transform = shape_availability(shape, excluder)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/pyogrio/raw.py:198: RuntimeWarning: File /vsimem/pyogrio_7ee0c74b54d941998dde913a6f2e6a4b has GPKG application_id, but non conformant file extension
return ogr_read(
This function returns a 2-D numpy.ndarray and a transformation, similar to the previous rasterio examples. Eligible raster cells have the value True and excluded cells the value False.
band
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])
band.any()
True
transform
Affine(100.0, 0.0, 2635400.0,
0.0, -100.0, 2298700.0)
We can use the rasterio.plot.show() function for plotting:
fig, ax = plt.subplots(figsize=(4, 8))
shape.plot(ax=ax, color="none")
show(band, transform=transform, cmap="Greens", ax=ax)
<Axes: >
How to calculate what share of the land is eligible?
1. count the number of eligible raster cells
2. determine the area per raster cell
3. multiply the cell area with the number of cells
4. divide the eligible area by the total land area
eligible_cells = band.sum()
cell_area = excluder.res**2
eligible_area = cell_area * eligible_cells
country_area = shape.geometry.area.iloc[0]
eligible_area / country_area * 100
79.65002907506357
In this example, approximately 20% of Portugal’s land area is covered by nature reserves.
Typically, more than one dataset is needed for a proper land eligibility analysis, i.e. we need to add more datasets to our container. So, let’s expand the exclusion container …
url = "https://tubcloud.tu-berlin.de/s/mxgaA7wH8NsmdDi/download/Natura2000_end2021-PT.gpkg"
excluder = ExclusionContainer()
excluder.add_geometry(url)
… by a selection of CORINE land cover classes, which are given as raster data:
When adding a raster to the ExclusionContainer, you can specify which codes (integers) to exclude. By setting invert=True, you can instead restrict the inclusion to a set of codes. Furthermore, you can add a buffer around cells with given codes, within which the area is also excluded. Here we are first going to include a set of suitable CORINE classes:
fn = "https://github.com/fneum/data-science-for-esm/raw/main/data-science-for-esm/U2018_CLC2018_V2020_20u1-PT.tif"
codes = [12, 13, 18, 19, 20, 21]
excluder.add_raster(fn, codes=codes, crs=3035, invert=True)
And then we are going to exclude CORINE classes representing settled areas with an 800 meter buffer around them:
codes = [1, 2, 3, 4, 5, 6]
excluder.add_raster(fn, codes=codes, buffer=800, crs=3035)
Then, we can simply repeat the calculation and visualisation of available cells:
band, transform = shape_availability(shape, excluder)
fig, ax = plt.subplots(figsize=(4, 8))
shape.plot(ax=ax, color="none")
show(band, transform=transform, cmap="Greens", ax=ax)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/pyogrio/raw.py:198: RuntimeWarning: File /vsimem/pyogrio_21e48b8ecf76404d8f1f6e4e4cfcd101 has GPKG application_id, but non conformant file extension
return ogr_read(
<Axes: >
Spatial and Temporal Availability#
Besides calculating spatial potentials for renewables development, atlite can also handle weather data time series and convert them to energy systems time series (like production time series for wind and solar).
One well-known such dataset is the ERA5 reanalysis, which provides historical weather data for the whole world from 1959 to 2022 at hourly resolution and on a 0.25 x 0.25 degree grid (roughly 25-30 km depending on where you are).
atlite can be imported like this (usually without an alias):
import atlite
A cutout is the basis for any of your work and calculations and contains all the weather data. Here, we load an existing dataset which was retrieved from ERA5.
cutout = atlite.Cutout("PORTUGAL-2013-01-era5.nc")
cutout
<Cutout "PORTUGAL-2013-01-era5">
x = -9.90 ⟷ -6.00, dx = 0.30
y = 36.00 ⟷ 42.90, dy = 0.30
time = 2013-01-01 ⟷ 2013-02-01, dt = h
module = era5
prepared_features = ['height', 'wind', 'influx', 'temperature', 'runoff']
cutout.data
<xarray.Dataset> Size: 17MB
Dimensions:           (x: 14, y: 24, time: 768)
Coordinates:
  * x                 (x) float64 112B -9.9 -9.6 -9.3 -9.0 ... -6.6 -6.3 -6.0
  * y                 (y) float64 192B 36.0 36.3 36.6 36.9 ... 42.3 42.6 42.9
  * time              (time) datetime64[ns] 6kB 2013-01-01 ... 2013-02-01T23:...
    lon               (x) float64 112B dask.array<chunksize=(14,), meta=np.ndarray>
    lat               (y) float64 192B dask.array<chunksize=(24,), meta=np.ndarray>
Data variables: (12/13)
    height            (y, x) float32 1kB dask.array<chunksize=(24, 14), meta=np.ndarray>
    wnd100m           (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    wnd_azimuth       (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    roughness         (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    influx_toa        (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    influx_direct     (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    ...                ...
    albedo            (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    solar_altitude    (time, y, x) float64 2MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    solar_azimuth     (time, y, x) float64 2MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    temperature       (time, y, x) float64 2MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    soil temperature  (time, y, x) float64 2MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
    runoff            (time, y, x) float32 1MB dask.array<chunksize=(100, 24, 14), meta=np.ndarray>
Attributes:
    module:             era5
    prepared_features:  ['height', 'runoff', 'temperature', 'wind', 'influx']
    chunksize_time:     100
    dx:                 0.3
    dy:                 0.3
    Conventions:        CF-1.6
    history:            2022-03-04 14:08:37 GMT by grib_to_netcdf-2.24.2: /op...
Let’s quickly plot the spatial resolution and scope of this dataset:
ax = shape.to_crs(4326).plot()
cutout.grid.plot(ax=ax, edgecolor="grey", color="None")
<Axes: >
We can directly access, aggregate, and plot individual meteorological parameters:
wnd100m = cutout.data.wnd100m.mean(dim="time")
ax = plt.axes(projection=ccrs.PlateCarree())
wnd100m.plot(ax=ax, vmin=0, vmax=10)
shape.to_crs(4326).plot(ax=ax, edgecolor="k", color="none")
<GeoAxes: xlabel='x', ylabel='y'>
This cutout happens to also have data on the surface roughness. We can use this to scale the 100 meter wind speeds down to 10 meters using the log law:
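For reference, the logarithmic wind profile (“log law”) extrapolates the wind speed from a reference height to another height based on the surface roughness length $z_0$:

$$v(z) = v(z_{\mathrm{ref}}) \, \frac{\ln(z / z_0)}{\ln(z_{\mathrm{ref}} / z_0)}$$

Here, $z = 10\,\mathrm{m}$, $z_{\mathrm{ref}} = 100\,\mathrm{m}$ and $z_0$ is given by the roughness field, which is exactly what the next cell computes.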
z_r = cutout.data.roughness
wnd100m = cutout.data.wnd100m
wnd10m = wnd100m * np.log(10 / z_r) / np.log(100 / z_r)
ax = plt.axes(projection=ccrs.PlateCarree())
wnd10m.mean(dim="time").plot(ax=ax, vmin=0, vmax=10)
shape.to_crs(4326).plot(ax=ax, edgecolor="k", color="none")
<GeoAxes: xlabel='x', ylabel='y'>
We can also build means in the spatial dimensions and investigate the behaviour of meteorological variables over time. Here, we do this for the direct solar irradiation.
cutout.data.mean(dim=["x", "y"]).influx_direct.plot()
[<matplotlib.lines.Line2D at 0x7f0859719b50>]
We also already saw that the weather cells are much larger than the raster cells. atlite automates this conversion, averaging the fine-grained availability raster onto the coarse weather grid for all geometries in shape, when calling the cutout.availabilitymatrix() function.
A = cutout.availabilitymatrix(shape, excluder)
A is an array with 3 dimensions (shape, x, y) and very sparse data. It indicates the relative overlap of weather cell (x, y) with geometry shape while excluding the area specified by the excluder.
fig, ax = plt.subplots()
A.sel(name="PT").plot(cmap="Greens")
shape.to_crs(4326).plot(ax=ax, edgecolor="k", color="none")
cutout.grid.plot(ax=ax, color="none", edgecolor="grey", ls=":")
<Axes: title={'center': 'name = PT'}, xlabel='x', ylabel='y'>
The availability matrix A can now be used as a layout matrix in the conversion functions of atlite from wind speeds to capacity factors.
The general approach is to weight the availabilities with the area per grid cell and the allowed deployment density (capacity per area).
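In formula form, the capacity layout for region $s$ and weather cell $(x, y)$ reads (matching the code below):

$$\mathrm{layout}_{s,x,y} = A_{s,x,y} \cdot a_{x,y} \cdot \rho$$

where $a_{x,y}$ is the cell area in km² and $\rho$ the deployment density in MW/km².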
Note
We won’t cover xarray in detail in this course, but for now you can imagine it as a performant multi-dimensional version of pandas.
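As a minimal, standalone illustration of that idea (toy data, unrelated to the cutout), an xarray.DataArray is essentially a numpy array with named dimensions and labelled coordinates:
import numpy as np
import pandas as pd
import xarray as xr

# a small 2-D DataArray with named dimensions "time" and "x"
da = xr.DataArray(
    np.random.rand(3, 4),
    coords={"time": pd.date_range("2013-01-01", periods=3, freq="h"), "x": [0.0, 0.3, 0.6, 0.9]},
    dims=("time", "x"),
)
da.mean(dim="time")  # reductions work along named dimensions, like df.mean(axis=0) in pandas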
import xarray as xr
cap_per_sqkm = 2 # 2 MW/km^2
area = cutout.grid.set_index(["y", "x"]).to_crs(3035).area / 1e6 # in km^2
area = xr.DataArray(area, dims=("spatial"))
capacity_matrix = A.stack(spatial=["y", "x"]) * area * cap_per_sqkm
This code block processes the three-dimensional availability matrix A (representing a spatial grid with values between 0 and 1 indicating the relative eligible area per weather cell) to calculate the maximum deployable capacity for each grid cell. First, the cutout.grid is assigned its latitudes and longitudes as coordinates, before the area of each grid cell is calculated in km². This new area object is converted into an xarray.DataArray with a single spatial dimension called “spatial”. Finally, the availability matrix A is flattened (stacked) into a two-dimensional array along the spatial dimensions x and y, aligning it with the dimensions of area. Each cell’s value is multiplied by its corresponding area and the allowed deployment density (cap_per_sqkm).
Using the resulting capacity_matrix, which contains the maximum deployable capacity for each grid cell, we can now calculate the aggregated wind power generation time series for Portugal.
wind = cutout.wind(matrix=capacity_matrix, turbine="Vestas_V90_3MW", index=shape.index)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/atlite/resource.py:72: FutureWarning: 'add_cutout_windspeed' for wind turbine
power curves will default to True in atlite relase v0.2.15.
warnings.warn(msg, FutureWarning)
Let’s see what the total wind power potential looks like:
wind.plot()
[<matplotlib.lines.Line2D at 0x7f0851605510>]
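Since we passed a capacity layout, wind is an absolute generation time series (in MW). As a sketch, dividing by the total deployable capacity implied by the capacity matrix would give a per-unit capacity factor time series (assuming that normalisation is what we are after):
total_capacity = capacity_matrix.sum("spatial")  # total deployable capacity per region in MW
cf = wind / total_capacity  # per-unit capacity factor time series (between 0 and 1)
cf.plot()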
Analogously, we can also determine the potential power generation time series of solar photovoltaics parks:
Note
For simplicity, we assume the same deployment density and land eligibility criteria as for wind. Note that in reality, different eligibility criteria apply for solar and wind developments.
For that, we have to specify the type of panel (CSi refers to a crystalline silicon panel) and the orientation (an azimuth of 180° means southward orientation; the slope refers to the inclination angle from the ground).
pv = cutout.pv(
panel="CSi",
orientation={"slope": 30.0, "azimuth": 180.0},
matrix=capacity_matrix,
index=shape.index,
)
pv.plot()
[<matplotlib.lines.Line2D at 0x7f08510c69d0>]
Exercises#
Exercise 1#
Tweak some parameters of the land exclusion calculation for wind deployment:
- Vary the CORINE land classes allowed for wind turbine development.
- Vary the buffer distance from 0m to 2000m in steps of 250m and observe the change in available area.
- Add the GEBCO topography data loaded at the beginning of the tutorial to the exclusion calculator and exclude areas above 1000m elevation.
Solution:
- Change codes in excluder.add_raster(fn, codes=codes, crs=3035, invert=True).
- Change buffer in excluder.add_raster(fn, codes=codes, buffer=800, crs=3035). A sketch of the buffer sweep follows below.
- Add a line like excluder.add_raster(fn_to_gebco, codes=lambda v: v > 1000, crs=4326) (codes also accepts a function), where crs=4326 is needed because the GEBCO data is given in a different projection.
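A possible sketch of the buffer sweep (illustrative; it reuses fn, shape, ExclusionContainer and shape_availability from above and recomputes the availability for each buffer, which may take a while):
import pandas as pd

natura_url = "https://tubcloud.tu-berlin.de/s/mxgaA7wH8NsmdDi/download/Natura2000_end2021-PT.gpkg"

eligible_share = {}
for buffer in range(0, 2250, 250):  # 0 m to 2000 m in steps of 250 m
    exc = ExclusionContainer()
    exc.add_geometry(natura_url)  # Natura2000 protection areas
    exc.add_raster(fn, codes=[12, 13, 18, 19, 20, 21], crs=3035, invert=True)
    exc.add_raster(fn, codes=[1, 2, 3, 4, 5, 6], buffer=buffer, crs=3035)
    b, _ = shape_availability(shape, exc)
    eligible_share[buffer] = b.sum() * exc.res**2 / shape.geometry.area.iloc[0] * 100

pd.Series(eligible_share).plot(xlabel="buffer [m]", ylabel="eligible area [%]")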
Exercise 2#
The orientation of solar panels plays a decisive role in determining their potential power generation time series. Explore this aspect by performing the following sensitivity analysis:
Plot the total energy yield in January as a function of the slope of the panels oriented towards the South (0° means horizontal orientation, 90° means vertical orientation) in steps of 10°. What is the optimal tilt angle? Why might the value be higher than typically recommended for PV panels?
Plot the power production profile for January 1 for the azimuth angles 0°, 90°, 180°, 270° of panels tilted at a 30° angle. What orientation does each angle represent? Describe the differences you observe.
The cutout.pv() function can also consider multiple options for tracking features of solar panels, as listed below. Tracking in solar panel systems refers to the mechanism that allows panels to move and orient themselves towards the sun, maximizing their exposure to sunlight and thereby increasing their electricity output. The default settings in atlite assume no tracking devices. Explore and describe the power output profile for different tracking options for January 31. How much more expensive in percent can the equipment for tracking be to achieve the same cost per unit of electricity produced as a fixed panel (if it only operated in January)?
Available tracking options in atlite are:
- tracking='horizontal' for 1-axis horizontal tracking: This setting positions solar panels to rotate around a horizontal axis, allowing them to follow the sun’s path from east to west.
- tracking='vertical' for 1-axis vertical tracking: In this mode, the solar panels rotate around a vertical axis, which can be advantageous in high latitudes for better sun capture throughout the day.
- tracking='dual' for 2-axis tracking: This advanced system allows panels to track the sun both horizontally and vertically, maximizing solar exposure and energy capture throughout the day and year.
For all calculations, assume the same deployment density and land eligibility criteria as for wind (for simplicity).
import pandas as pd
kwargs = dict(
panel="CSi",
matrix=capacity_matrix,
index=shape.index,
)
pv_orientation = {}
for slope in range(0, 100, 10):
print(f"slope: {slope}°")
pv_orientation[slope] = (
cutout.pv(orientation=dict(slope=slope, azimuth=180.0), **kwargs)
.to_pandas()
.squeeze()
.div(1e3)
) # GW
df = pd.DataFrame(pv_orientation)
df.sum().plot(ylabel="electricity output [GWh]", xlabel="slope [°]")
best = df.sum().idxmax()
print("The optimal slope is {:.0f}°".format(best))
slope: 0°
slope: 10°
slope: 20°
slope: 30°
slope: 40°
slope: 50°
slope: 60°
slope: 70°
slope: 80°
slope: 90°
The optimal slope is 60°
kwargs = dict(
panel="CSi",
matrix=capacity_matrix,
index=shape.index,
)
pv_azimuth = {}
for angle in [0, 90, 180, 270]:
print(f"azimuth: {angle}°")
pv_azimuth[angle] = (
cutout.pv(orientation=dict(slope=30, azimuth=angle), **kwargs)
.to_pandas()
.squeeze()
.div(1e3)
) # GW
df = pd.DataFrame(pv_azimuth)
df.loc["2013-01-01"].plot(ylabel="power output [GW]")
azimuth: 0°
azimuth: 90°
azimuth: 180°
azimuth: 270°
<Axes: xlabel='time', ylabel='power output [GW]'>
- 0° is North (minimum output as the sun traverses from East to West via South in the Northern hemisphere)
- 90° is East (sun reaches the panel earlier, departs earlier)
- 270° is West (sun reaches the panel later, departs later)
- 180° is South (maximum output)
kwargs = dict(
panel="CSi",
orientation={"slope": 30.0, "azimuth": 180.0},
matrix=capacity_matrix,
index=shape.index,
)
pv_h = cutout.pv(tracking="horizontal", **kwargs)
pv_v = cutout.pv(tracking="vertical", **kwargs)
pv_d = cutout.pv(tracking="dual", **kwargs)
day = "2013-01-31"
fig, ax = plt.subplots()
pv.loc[day].plot(ax=ax, label="fixed")
pv_v.loc[day].plot(ax=ax, label="vertical")
pv_h.loc[day].plot(ax=ax, label="horizontal")
pv_d.loc[day].plot(ax=ax, label="dual")
plt.legend()
<matplotlib.legend.Legend at 0x7f0859cdb2d0>
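For the cost question, a rough sketch (assuming equipment cost scales per unit of installed capacity and only January generation counts, as stated in the exercise):
# January energy yield for fixed vs. dual-axis tracking panels
fixed_energy = pv.sum().item()
dual_energy = pv_d.sum().item()

# if tracking yields r times the energy of a fixed panel, its equipment may cost
# up to r times more per MW while achieving the same cost per MWh of electricity
ratio = dual_energy / fixed_energy
print(f"Dual-axis tracking equipment could be up to {(ratio - 1) * 100:.1f}% more expensive.")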