Preparation for Group Assignment#

This tutorial contains various small guides for tasks that you will need, or that will come in handy, in the upcoming group assignment.

We’re going to need a couple of packages for this tutorial:

import os

import atlite
import country_converter as coco
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import rasterio as rio
import xarray as xr
from atlite.gis import ExclusionContainer, shape_availability
from rasterio.plot import show

Preparatory Downloads#

For this tutorial, we also need to download a few files, which we can do with the urllib library:

from urllib.request import urlretrieve
fn = "era5-2013-NL.nc"
url = "https://tubcloud.tu-berlin.de/s/bAJj9xmN5ZLZQZJ/download/" + fn
urlretrieve(url, fn)
('era5-2013-NL.nc', <http.client.HTTPMessage at 0x7f1bb62dddd0>)
fn = "PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326-NL.tif"
url = f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fcopernicus-glc&files={fn}"
urlretrieve(url, fn)
('PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326-NL.tif',
 <http.client.HTTPMessage at 0x7f1bb5d38d50>)
fn = "WDPA_Oct2022_Public_shp-NLD.tif"
url = (
    f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fwdpa&files={fn}"
)
urlretrieve(url, fn)
('WDPA_Oct2022_Public_shp-NLD.tif',
 <http.client.HTTPMessage at 0x7f1bb5d3aa10>)
fn = "GEBCO_2014_2D-NL.nc"
url = (
    f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fgebco&files={fn}"
)
urlretrieve(url, fn)
('GEBCO_2014_2D-NL.nc', <http.client.HTTPMessage at 0x7f1bb5d40550>)
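Since these files only need to be fetched once, it can be convenient to skip downloads for files that already exist locally. A minimal sketch (the helper name retrieve_once is our own, not part of any library):

from pathlib import Path


def retrieve_once(url, fn):
    # download only if the file is not already in the working directory
    if not Path(fn).exists():
        urlretrieve(url, fn)
    return fn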

Downloading historical weather data from ERA5 with atlite#

First, let’s load the regions of a small example country. Let’s say, the Netherlands.

fn = "https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fgadm&files=gadm_410-levels-ADM_1-NLD.gpkg"
regions = gpd.read_file(fn)
regions.plot()
<Axes: >
[Figure: map of the GADM level-1 regions of the Netherlands]

In this example, we download historical ERA5 weather data on demand for the cutout we want to create.

Note

For this example to work, you should have

  • installed the Copernicus Climate Data Store cdsapi package (conda list cdsapi or pip install cdsapi) and

  • registered and set up your CDS API key as described on this website. Note that there are different instructions depending on your operating system.

A cutout is the basis for any of your work and calculations in atlite.

The cutout is created in the directory and file specified by the relative path. If a cutout at the given location already exists, this command will simply load the existing cutout. If the cutout does not yet exist, it will specify a new cutout to be created.

For creating the cutout, you need to specify the dataset (e.g. ERA5), a time period and the spatial extent (in latitude and longitude).

minx, miny, maxx, maxy = regions.total_bounds
buffer = 0.25
cutout = atlite.Cutout(
    path="era5-2013-NL.nc",
    module="era5",
    x=slice(minx - buffer, maxx + buffer),
    y=slice(miny - buffer, maxy + buffer),
    time="2013",
)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/atlite/cutout.py:190: UserWarning: Arguments module, x, y, time are ignored, since cutout is already built.
  warn(

Calling the function cutout.prepare() initiates the download and processing of the weather data. Because the data must be served by the CDS servers, this might take a while depending on the amount of data requested.

Note

You can check the status of your request here.

cutout.prepare()
<Cutout "era5-2013-NL">
 x = 3.25 ⟷ 7.25, dx = 0.25
 y = 50.50 ⟷ 53.75, dy = 0.25
 time = 2013-01-01 ⟷ 2013-12-31, dt = h
 module = era5
 prepared_features = ['height', 'wind', 'influx', 'temperature', 'runoff']

The data is accessible in cutout.data. Included weather variables are listed in cutout.prepared_features. Querying the cutout gives us some basic information on which data is contained in it.

cutout.data
<xarray.Dataset> Size: 134MB
Dimensions:           (x: 17, y: 14, time: 8760)
Coordinates:
  * x                 (x) float64 136B 3.25 3.5 3.75 4.0 ... 6.5 6.75 7.0 7.25
  * y                 (y) float64 112B 50.5 50.75 51.0 ... 53.25 53.5 53.75
  * time              (time) datetime64[ns] 70kB 2013-01-01 ... 2013-12-31T23...
    lon               (x) float64 136B dask.array<chunksize=(17,), meta=np.ndarray>
    lat               (y) float64 112B dask.array<chunksize=(14,), meta=np.ndarray>
Data variables: (12/13)
    height            (y, x) float32 952B dask.array<chunksize=(14, 17), meta=np.ndarray>
    wnd100m           (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    wnd_azimuth       (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    roughness         (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    influx_toa        (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    influx_direct     (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    ...                ...
    albedo            (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    solar_altitude    (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    solar_azimuth     (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    temperature       (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    soil temperature  (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    runoff            (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
Attributes:
    module:             era5
    prepared_features:  ['influx', 'wind', 'height', 'temperature', 'runoff']
    chunksize_time:     100
    Conventions:        CF-1.6
    history:            2023-01-15 21:33:09 GMT by grib_to_netcdf-2.25.1: /op...
cutout.prepared_features
module  feature    
era5    height                   height
        wind                    wnd100m
        wind                wnd_azimuth
        wind                  roughness
        influx               influx_toa
        influx            influx_direct
        influx           influx_diffuse
        influx                   albedo
        influx           solar_altitude
        influx            solar_azimuth
        temperature         temperature
        temperature    soil temperature
        runoff                   runoff
dtype: object
cutout
<Cutout "era5-2013-NL">
 x = 3.25 ⟷ 7.25, dx = 0.25
 y = 50.50 ⟷ 53.75, dy = 0.25
 time = 2013-01-01 ⟷ 2013-12-31, dt = h
 module = era5
 prepared_features = ['height', 'wind', 'influx', 'temperature', 'runoff']
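The cutout’s variables can be inspected like any xarray dataset. For instance, a quick sketch plotting the time-averaged 100 m wind speeds over the cutout extent:

cutout.data.wnd100m.mean("time").plot(cmap="viridis")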

Repetition: From Land Eligibility Analysis to Availability Matrix#

We’re going to use the plotting function from previous exercises:

def plot_area(masked, transform, shape):
    # plot the eligible areas in green together with the region outlines
    fig, ax = plt.subplots(figsize=(5, 5))
    ax = show(masked, transform=transform, cmap="Greens", vmin=0, ax=ax)
    shape.plot(ax=ax, edgecolor="k", color="None", linewidth=1)

First, we collect all exclusion and inclusion criteria in an ExclusionContainer.

excluder = ExclusionContainer(crs=3035, res=300)
# exclude built-up areas (land-cover code 50) with a 1 km buffer
fn = "PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326-NL.tif"
excluder.add_raster(fn, codes=[50], buffer=1000, crs=4326)
# whitelist eligible land-cover classes (invert=True includes only these codes)
excluder.add_raster(fn, codes=[20, 30, 40, 60], crs=4326, invert=True)
# exclude protected areas from the WDPA database
fn = "WDPA_Oct2022_Public_shp-NLD.tif"
excluder.add_raster(fn, crs=3035)
# whitelist sites above 10 m elevation based on GEBCO topography
fn = "GEBCO_2014_2D-NL.nc"
excluder.add_raster(fn, codes=lambda x: x > 10, crs=4326, invert=True)

Then, we can calculate the available areas…

masked, transform = shape_availability(regions.to_crs(3035).geometry, excluder)

… and plot them:

plot_area(masked, transform, regions.to_crs(3035).geometry)
[Figure: eligible areas (green) within the Dutch regions]

Availability Matrix#

regions.index = regions.GID_1  # use GADM identifiers as region index
cutout = atlite.Cutout("era5-2013-NL.nc")
# inspect the documentation of the availability matrix function
?cutout.availabilitymatrix
A = cutout.availabilitymatrix(regions, excluder)
Compute availability matrix: 100%|██████████| 14/14 [00:01<00:00,  9.57 gridcells/s]
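As a quick visual check, we can plot the available share of each weather cell for a single region (a sketch assuming the region dimension is named GID_1, after the index set above):

A.sel(GID_1="NLD.1_1").plot(cmap="Greens")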

cap_per_sqkm = 1.7  # assumed deployment density in MW/km²

# grid cell areas in km² (EPSG:3035 is equal-area within Europe)
area = cutout.grid.set_index(["y", "x"]).to_crs(3035).area / 1e6

area = xr.DataArray(area, dims=("spatial"))

# deployable capacity per region and weather grid cell in MW
capacity_matrix = A.stack(spatial=["y", "x"]) * area * cap_per_sqkm
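As a plausibility check, summing the capacity matrix over the spatial dimension gives the total potential capacity per region in MW:

capacity_matrix.sum("spatial")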

Solar PV Profiles#

pv = cutout.pv(
    panel=atlite.solarpanels.CdTe,
    matrix=capacity_matrix,
    orientation="latitude_optimal",
    index=regions.index,
    per_unit=True,
)
pv.to_pandas().head()
GID_1 NLD.1_1 NLD.2_1 NLD.3_1 NLD.4_1 NLD.5_1 NLD.6_1 NLD.7_1 NLD.8_1 NLD.9_1 NLD.10_1 NLD.11_1 NLD.12_1 NLD.13_1 NLD.14_1
time
2013-01-01 00:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 01:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 02:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 03:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 04:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pv.to_pandas().iloc[:, 0].plot()
<Axes: xlabel='time'>
[Figure: hourly solar PV capacity factor profile for region NLD.1_1 in 2013]
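Because we passed per_unit=True, these profiles are capacity factors. For instance, averaging over time gives the annual mean capacity factor per region:

pv.to_pandas().mean()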

Wind Profiles#

wind = cutout.wind(
    atlite.windturbines.Vestas_V112_3MW,
    matrix=capacity_matrix,
    index=regions.index,
    per_unit=True,
)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/atlite/resource.py:72: FutureWarning: 'add_cutout_windspeed' for wind turbine
power curves will default to True in atlite relase v0.2.13.
  warnings.warn(msg, FutureWarning)
wind.to_pandas().head()
GID_1 NLD.1_1 NLD.2_1 NLD.3_1 NLD.4_1 NLD.5_1 NLD.6_1 NLD.7_1 NLD.8_1 NLD.9_1 NLD.10_1 NLD.11_1 NLD.12_1 NLD.13_1 NLD.14_1
time
2013-01-01 00:00:00 0.999056 0.998485 0.998294 0.997185 0.998198 0.998937 0.997705 0.998457 0.996653 0.998057 0.996010 0.998684 0.997938 0.995548
2013-01-01 01:00:00 0.991667 0.993051 0.970360 0.995386 0.953421 0.994112 0.997151 0.997635 0.984541 0.997531 0.985574 0.996537 0.992089 0.983092
2013-01-01 02:00:00 0.981956 0.970941 0.933896 0.982949 0.903412 0.974296 0.995252 0.993278 0.939996 0.989856 0.956938 0.968682 0.976204 0.952877
2013-01-01 03:00:00 0.948241 0.870216 0.846933 0.946690 0.840527 0.877124 0.980648 0.951256 0.773720 0.957361 0.852354 0.853057 0.899308 0.837002
2013-01-01 04:00:00 0.770508 0.647249 0.572062 0.846147 0.593890 0.611827 0.921734 0.850735 0.490069 0.841523 0.681832 0.705750 0.730653 0.623015
wind.to_pandas().iloc[:, 0].plot()
<Axes: xlabel='time'>
[Figure: hourly wind capacity factor profile for region NLD.1_1 in 2013]

Merging Shapes in geopandas#

Spatial data is often more granular than we need. For example, we might have data on sub-national units, but we’re actually interested in studying patterns at the level of countries.

Whereas in pandas we would use the groupby() function to aggregate entries, in geopandas, we aggregate geometric features using the dissolve() function.

Suppose we are interested in studying continents, but we only have country-level data like the country dataset included in geopandas. We can easily convert this to a continent-level dataset.

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
/tmp/ipykernel_2355/3912264495.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.head(3)
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
continents = world.dissolve(by="continent").geometry

continents.plot();
[Figure: world map dissolved into continents]

If we are interested in the population per continent, we can pass different aggregation strategies to the dissolve() function using the aggfunc argument:

https://geopandas.org/en/stable/docs/user_guide/aggregation_with_dissolve.html#dissolve-arguments

continents = world.dissolve(by="continent", aggfunc=dict(pop_est="sum"))
continents.plot(column="pop_est");
[Figure: continents coloured by aggregated population estimate]

For more exotic aggregation strategies (e.g. grouping by the first letter of the country name), you can also pass a pandas.Series to the dissolve() function that describes your mapping:

world.dissolve(by=world.name.str[0], aggfunc=dict(pop_est="sum")).plot(column="pop_est")
<Axes: >
[Figure: countries dissolved by first letter of name, coloured by population]

Representative Points and Crow-Fly Distances#

The following example includes code to retrieve representative points from polygons and to calculate the crow-fly distance between two points.

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
/tmp/ipykernel_2355/3912264495.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
points = world.representative_point()
fig, ax = plt.subplots()
world.plot(ax=ax)
points.plot(ax=ax, color="red", markersize=3);
[Figure: world map with representative points marked in red]
points = points.to_crs(4087)  # project to a world equidistant cylindrical CRS (metres)
points.index = world.iso_a3
# pairwise crow-fly distances between all representative points
distances = pd.concat({k: points.distance(p) for k, p in points.items()}, axis=1).div(
    1e3
)  # km
distances.loc["DEU", "NLD"]
564.4945392494385
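Distances computed in the projected CRS are only approximations of true great-circle distances. As a cross-check, here is a minimal haversine sketch (our own helper, not part of the tutorial’s toolchain) that computes great-circle distances in km from coordinates in degrees:

def haversine(lon1, lat1, lon2, lat2, r=6371):
    # convert degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    # haversine formula on a sphere of radius r (km)
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * r * np.arcsin(np.sqrt(a))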
cutout.data.temperature.sel(time="2013-01-01 00:00").to_pandas().head()
x 3.25 3.50 3.75 4.00 4.25 4.50 4.75 5.00 5.25 5.50 5.75 6.00 6.25 6.50 6.75 7.00 7.25
y
50.50 281.467486 281.302013 281.104233 280.899361 280.669274 280.506953 280.434459 280.381666 280.365906 280.339115 279.755231 279.102793 278.573278 278.823064 279.118552 279.760746 279.755231
50.75 281.641627 281.454091 281.299649 281.216124 281.141267 281.030164 280.960034 280.907241 281.055379 281.098717 280.862326 280.747283 280.543987 281.069562 281.346927 281.491125 281.175150
51.00 281.961542 281.774794 281.608533 281.551799 281.516340 281.518704 281.458031 281.372142 281.328804 281.348503 281.516340 281.625868 281.694421 281.975726 282.004881 281.547859 280.866266
51.25 282.238907 282.078950 281.979666 281.826800 281.736971 281.641627 281.561254 281.487185 281.463546 281.450151 281.440695 281.604593 281.850439 281.854379 281.755095 281.204305 280.781954
51.50 282.494997 282.295641 282.182962 282.086829 281.995425 281.834680 281.755095 281.682602 281.736971 281.740911 281.682602 281.653447 281.748791 281.856743 281.926872 281.743275 281.432816
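Note that the ERA5 temperatures above are given in Kelvin; subtracting 273.15 converts them to degrees Celsius:

temperature_degc = cutout.data.temperature - 273.15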

Global Equal-Area and Equal-Distance CRS#

Previously, we used EPSG:3035 as the projection to calculate the area of regions in km². However, this projection is only accurate within Europe, so we need to pick different, more suitable projections to calculate areas and distances for regions elsewhere in the world.

The unit of measurement for both projections is metres.

AREA_CRS = "ESRI:54009"  # World Mollweide, an equal-area projection
DISTANCE_CRS = "EPSG:4087"  # WGS 84 / World Equidistant Cylindrical
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
/tmp/ipykernel_2355/3912264495.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.to_crs(AREA_CRS).plot()
<Axes: >
[Figure: world map in the Mollweide equal-area projection]
world.to_crs(DISTANCE_CRS).plot()
<Axes: >
[Figure: world map in the world equidistant cylindrical projection]
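With the equal-area projection, we can, for instance, compute country areas in km²:

world.to_crs(AREA_CRS).area.div(1e6)  # km²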

Country Converter#

The country converter (coco) is a Python package to convert and match country names between different classifications and between different naming versions.

import country_converter as coco

Convert various country names to standard names, specifying the source and target classification schemes:

coco.convert(names="NLD", to="name_official")
'Kingdom of the Netherlands'
coco.convert(names="NLD", to="iso2")
'NL'
coco.convert(names="NLD", src="iso3", to="iso2")
'NL'
country_list = ["AE", "AL", "AM", "AO", "AR"]
coco.convert(names=country_list, src="iso2", to="short_name")
['United Arab Emirates', 'Albania', 'Armenia', 'Angola', 'Argentina']
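The converter also matches free-form country names via regular expressions (note the regex scheme in the list below), which is handy for harmonising messy datasets:

coco.convert(names=["Germany", "The Netherlands", "UK"], to="iso3")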

List of included country classification schemes:

cc = coco.CountryConverter()
cc.valid_country_classifications
['DACcode',
 'Eora',
 'FAOcode',
 'GBDcode',
 'GWcode',
 'IOC',
 'ISO2',
 'ISO3',
 'ISOnumeric',
 'UNcode',
 'ccTLD',
 'name_official',
 'name_short',
 'regex']

Gurobi#

Gurobi is one of the most powerful solvers for optimisation problems. It is a commercial solver with free academic licenses.

Note

Using this solver for the group assignment is optional. You can also use open-source alternatives, but they might take a little longer to solve.

To set up Gurobi, you need to: