Preparation for Group Assignment#

This tutorial contains various small guides for tasks that you will need, or that will come in handy, in the upcoming group assignment.

We’re going to need a couple of packages for this tutorial:

import atlite
from atlite.gis import ExclusionContainer, shape_availability
import country_converter as coco
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
from rasterio.plot import show

Preparatory Downloads#

For this tutorial, we also need to download a few files, for which one can use the urllib library:

from urllib.request import urlretrieve
fn = "era5-2013-NL.nc"
url = "https://tubcloud.tu-berlin.de/s/bAJj9xmN5ZLZQZJ/download/" + fn
urlretrieve(url, fn)
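
These downloads can take a while and may occasionally be interrupted. To avoid re-downloading files that are already present, one can wrap the call in a small guard; a minimal sketch using only the standard library (the helper name download_if_missing is our own). The same guard can be applied to the remaining downloads below:

import os
from urllib.request import urlretrieve

def download_if_missing(url, fn):
    # fetch the file only if it does not yet exist in the working directory
    if not os.path.exists(fn):
        urlretrieve(url, fn)
    return fn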
fn = "PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326-NL.tif"
url = f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fcopernicus-glc&files={fn}"
urlretrieve(url, fn)
fn = "WDPA_Oct2022_Public_shp-NLD.tif"
url = (
    f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fwdpa&files={fn}"
)
urlretrieve(url, fn)
fn = "GEBCO_2014_2D-NL.nc"
url = (
    f"https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fgebco&files={fn}"
)
urlretrieve(url, fn)

Downloading historical weather data from ERA5 with atlite#

First, let’s load the administrative regions of a small example country, say, the Netherlands.

fn = "https://tubcloud.tu-berlin.de/s/567ckizz2Y6RLQq/download?path=%2Fgadm&files=gadm_410-levels-ADM_1-NLD.gpkg"
regions = gpd.read_file(fn)
regions.plot()
(Figure: map of the administrative regions of the Netherlands)

In this example, we download ERA5 historical weather data on demand for a cutout we want to create.

Note

For this example to work, you should have

  • installed the Copernicus Climate Data Store cdsapi package (check with conda list cdsapi or install with pip install cdsapi), and

  • registered and set up your CDS API key as described on this website (see the example below). Note that the instructions differ depending on your operating system.
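
For reference, the CDS API credentials are usually stored in a file ~/.cdsapirc in your home directory, roughly of the following form (the key below is a placeholder for your personal access token):

url: https://cds.climate.copernicus.eu/api
key: <your-personal-access-token>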

A cutout is the basis for any of your work and calculations in atlite.

The cutout is created in the directory and file specified by the relative path. If a cutout already exists at the given location, this command simply loads it again. If it does not yet exist, the command only specifies the new cutout to be created; no data is downloaded yet.

For creating the cutout, you need to specify the dataset (e.g. ERA5), a time period and the spatial extent (in latitude and longitude).

minx, miny, maxx, maxy = regions.total_bounds
buffer = 0.25
cutout = atlite.Cutout(
    path="era5-2013-NL.nc",
    module="era5",
    x=slice(minx - buffer, maxx + buffer),
    y=slice(miny - buffer, maxy + buffer),
    time="2013",
)

Calling the function cutout.prepare() initiates the download and processing of the weather data. Because the download needs to be provided by the CDS servers, this might take a while depending on the amount of data requested.

Note

You can check the status of your request here.

cutout.prepare(compression=None)

The data is accessible in cutout.data. Included weather variables are listed in cutout.prepared_features. Querying the cutout gives us some basic information on which data is contained in it.

cutout.data
<xarray.Dataset> Size: 134MB
Dimensions:           (x: 17, y: 14, time: 8760)
Coordinates:
  * x                 (x) float64 136B 3.25 3.5 3.75 4.0 ... 6.5 6.75 7.0 7.25
  * y                 (y) float64 112B 50.5 50.75 51.0 ... 53.25 53.5 53.75
  * time              (time) datetime64[ns] 70kB 2013-01-01 ... 2013-12-31T23...
    lon               (x) float64 136B dask.array<chunksize=(17,), meta=np.ndarray>
    lat               (y) float64 112B dask.array<chunksize=(14,), meta=np.ndarray>
Data variables: (12/13)
    height            (y, x) float32 952B dask.array<chunksize=(14, 17), meta=np.ndarray>
    wnd100m           (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    wnd_azimuth       (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    roughness         (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    influx_toa        (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    influx_direct     (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    ...                ...
    albedo            (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    solar_altitude    (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    solar_azimuth     (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    temperature       (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    soil temperature  (time, y, x) float64 17MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
    runoff            (time, y, x) float32 8MB dask.array<chunksize=(100, 14, 17), meta=np.ndarray>
Attributes:
    module:             era5
    prepared_features:  ['influx', 'wind', 'height', 'temperature', 'runoff']
    chunksize_time:     100
    Conventions:        CF-1.6
    history:            2023-01-15 21:33:09 GMT by grib_to_netcdf-2.25.1: /op...
cutout.prepared_features
module  feature    
era5    height                   height
        wind                    wnd100m
        wind                wnd_azimuth
        wind                  roughness
        influx               influx_toa
        influx            influx_direct
        influx           influx_diffuse
        influx                   albedo
        influx           solar_altitude
        influx            solar_azimuth
        temperature         temperature
        temperature    soil temperature
        runoff                   runoff
dtype: object
cutout
<Cutout "era5-2013-NL">
 x = 3.25 ⟷ 7.25, dx = 0.25
 y = 50.50 ⟷ 53.75, dy = 0.25
 time = 2013-01-01 ⟷ 2013-12-31, dt = h
 module = era5
 prepared_features = ['height', 'wind', 'influx', 'temperature', 'runoff']

Repetition: From Land Eligibility Analysis to Availability Matrix#

We’re going to reuse the plotting function from previous exercises:

def plot_area(masked, transform, shape):
    """Plot the eligible raster cells in green together with the region outline."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax = show(masked, transform=transform, cmap="Greens", vmin=0, ax=ax)
    shape.plot(ax=ax, edgecolor="k", color="None", linewidth=1)

First, we collect all exclusion and inclusion criteria in an ExclusionContainer.

# collect exclusion criteria in a container with CRS EPSG:3035 at 300 m resolution
excluder = ExclusionContainer(crs=3035, res=300)
fn = "PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326-NL.tif"
# exclude built-up areas (land cover class 50) including a 1000 m buffer
excluder.add_raster(fn, codes=[50], buffer=1000, crs=4326)
# allow only shrubland, herbaceous vegetation, cropland and bare land (classes 20, 30, 40, 60)
excluder.add_raster(fn, codes=[20, 30, 40, 60], crs=4326, invert=True)
fn = "WDPA_Oct2022_Public_shp-NLD.tif"
# exclude protected areas from the WDPA database
excluder.add_raster(fn, crs=3035)
fn = "GEBCO_2014_2D-NL.nc"
# allow only cells with an elevation above 10 m
excluder.add_raster(fn, codes=lambda x: x > 10, crs=4326, invert=True)

Then, we can calculate the available areas…

masked, transform = shape_availability(regions.to_crs(3035).geometry, excluder)

… and plot them:

plot_area(masked, transform, regions.to_crs(3035).geometry)
(Figure: eligible areas in the Netherlands shown in green)

Availability Matrix#

regions.index = regions.GID_1
cutout = atlite.Cutout("era5-2013-NL.nc")
?cutout.availabilitymatrix
Signature:
cutout.availabilitymatrix(
    shapes,
    excluder,
    nprocesses=None,
    disable_progressbar=True,
)
Docstring:
Compute the eligible share within cutout cells in the overlap with shapes.

For parallel calculation (nprocesses not None) the excluder must not be
initialized and all raster references must be strings. Otherwise processes
are colliding when reading from one common rasterio.DatasetReader.

Parameters
----------
cutout : atlite.Cutout
    Cutout which the availability matrix is aligned to.
shapes : geopandas.Series/geopandas.DataFrame
    Geometries for which the availabilities are calculated.
excluder : atlite.gis.ExclusionContainer
    Container of all meta data or objects which to exclude, i.e.
    rasters and geometries.
nprocesses : int, optional
    Number of processes to use for calculating the matrix. The
    parallelization can heavily boost the calculation speed. The default is None.
disable_progressbar: bool, optional
    Disable the progressbar if nprocesses is not None. Then the `map`
    function instead of the `imap` function is used for the multiprocessing
    pool. This speeds up the calculation.

Returns
-------
availabilities : xr.DataArray
    DataArray of shape (|shapes|, |y|, |x|) containing all the eligible
    share of cutout cell (x,y) in the overlap with shape i.

Notes
-----
The rasterio (or GDAL) average downsampling returns different results
dependent on how the target raster (the cutout raster) is spanned.
Either it is spanned from the top left going downwards,
e.g. Affine(0.25, 0, 0, 0, -0.25, 50), or starting in the
lower left corner and going up, e.g. Affine(0.25, 0, 0, 0, 0.25, 50).
Here we stick to the top down version which is why we use
`cutout.transform_r` and flipping the y-axis in the end.
File:      ~/miniconda3/envs/esm-ws-24-25/lib/python3.12/site-packages/atlite/gis.py
Type:      method
# share of each cutout grid cell that remains available within each region
A = cutout.availabilitymatrix(regions, excluder)
# assumed deployment density
cap_per_sqkm = 1.7  # MW/km2

# area of each cutout grid cell in km²
area = cutout.grid.set_index(["y", "x"]).to_crs(3035).area / 1e6

area = xr.DataArray(area, dims=("spatial"))

# maximum installable capacity per region and grid cell in MW
capacity_matrix = A.stack(spatial=["y", "x"]) * area * cap_per_sqkm
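
As a quick sanity check, summing the capacity matrix over the spatial dimension yields the total installable potential per region in MW (a sketch; the name of the region dimension follows the index of the shapes, here GID_1):

# total installable potential per region in MW
capacity_matrix.sum("spatial").to_pandas()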

Solar PV Profiles#

pv = cutout.pv(
    panel=atlite.solarpanels.CdTe,
    matrix=capacity_matrix,
    orientation="latitude_optimal",
    index=regions.index,
    per_unit=True,
)
pv.to_pandas().head()
GID_1 NLD.1_1 NLD.2_1 NLD.3_1 NLD.4_1 NLD.5_1 NLD.6_1 NLD.7_1 NLD.8_1 NLD.9_1 NLD.10_1 NLD.11_1 NLD.12_1 NLD.13_1 NLD.14_1
time
2013-01-01 00:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 01:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 02:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 03:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2013-01-01 04:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pv.to_pandas().iloc[:, 0].plot()
(Figure: hourly solar PV capacity factor profile of the first region)

Wind Profiles#

wind = cutout.wind(
    atlite.windturbines.Vestas_V112_3MW,
    matrix=capacity_matrix,
    index=regions.index,
    per_unit=True,
)
wind.to_pandas().head()
GID_1 NLD.1_1 NLD.2_1 NLD.3_1 NLD.4_1 NLD.5_1 NLD.6_1 NLD.7_1 NLD.8_1 NLD.9_1 NLD.10_1 NLD.11_1 NLD.12_1 NLD.13_1 NLD.14_1
time
2013-01-01 00:00:00 0.999056 0.998485 0.998294 0.997185 0.998198 0.998937 0.997705 0.998457 0.996653 0.998057 0.996010 0.998684 0.997938 0.995548
2013-01-01 01:00:00 0.991667 0.993051 0.970360 0.995386 0.953421 0.994112 0.997151 0.997635 0.984541 0.997531 0.985574 0.996537 0.992089 0.983092
2013-01-01 02:00:00 0.981956 0.970941 0.933896 0.982949 0.903412 0.974296 0.995252 0.993278 0.939996 0.989856 0.956938 0.968682 0.976204 0.952877
2013-01-01 03:00:00 0.948241 0.870216 0.846933 0.946690 0.840527 0.877124 0.980648 0.951256 0.773720 0.957361 0.852354 0.853057 0.899308 0.837002
2013-01-01 04:00:00 0.770508 0.647249 0.572062 0.846147 0.593890 0.611827 0.921734 0.850735 0.490069 0.841523 0.681832 0.705750 0.730653 0.623015
wind.to_pandas().iloc[:, 0].plot()
(Figure: hourly wind capacity factor profile of the first region)

Merging Shapes in geopandas#

Spatial data is often more granular than we need. For example, we might have data on sub-national units, but we’re actually interested in studying patterns at the level of countries.

Whereas in pandas we would use the groupby() function to aggregate entries, in geopandas, we aggregate geometric features using the dissolve() function.

Suppose we are interested in studying continents, but we only have country-level data, like the Natural Earth countries dataset loaded below. We can easily convert this to a continent-level dataset.

world = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip")
world.head(3)
featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ... FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry
0 Admin-0 country 1 6 Fiji FJI 0 2 Sovereign country 1 Fiji ... None None None None None None None None None MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...
1 Admin-0 country 1 3 United Republic of Tanzania TZA 0 2 Sovereign country 1 United Republic of Tanzania ... None None None None None None None None None POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...
2 Admin-0 country 1 7 Western Sahara SAH 0 2 Indeterminate 1 Western Sahara ... Unrecognized Unrecognized Unrecognized None None Unrecognized None None None POLYGON ((-8.66559 27.65643, -8.66512 27.58948...

3 rows × 169 columns

world.iloc[0]  # inspect a single row as a Series
featurecla                                      Admin-0 country
scalerank                                                     1
LABELRANK                                                     6
SOVEREIGNT                                                 Fiji
SOV_A3                                                      FJI
                                    ...                        
FCLASS_NL                                                  None
FCLASS_SE                                                  None
FCLASS_BD                                                  None
FCLASS_UA                                                  None
geometry      MULTIPOLYGON (((180 -16.067132663642447, 180 -...
Name: 0, Length: 169, dtype: object
continents = world.dissolve(by="CONTINENT").geometry

continents.plot();
(Figure: map of the dissolved continent geometries)

If we are interested in the population per continent, we can pass different aggregation strategies to the dissolve() function using the aggfunc argument:

https://geopandas.org/en/stable/docs/user_guide/aggregation_with_dissolve.html#dissolve-arguments

continents = world.dissolve(by="CONTINENT", aggfunc=dict(POP_EST="sum"))
continents.plot(column="POP_EST");
(Figure: continents coloured by total population)

You can also pass a pandas.Series to the dissolve() function that describes your mapping for more exotic aggregation strategies (e.g. by first letter of the country name):

world.dissolve(by=world.NAME.str[0], aggfunc=dict(POP_EST="sum")).plot(column="POP_EST")
(Figure: shapes dissolved by first letter of the country name, coloured by population)

Representative Points and Crow-Fly Distances#

The following example includes code to retrieve representative points from polygons and to calculate approximate crow-fly distances between two points.

world = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip")
points = world.representative_point()
fig, ax = plt.subplots()
world.plot(ax=ax)
points.plot(ax=ax, color="red", markersize=3);
(Figure: world map with representative points marked in red)
points = points.to_crs(4087)
points.index = world.ISO_A3
distances = pd.concat({k: points.distance(p) for k, p in points.items()}, axis=1).div(
    1e3
)  # km
distances.loc["DEU", "NLD"]
564.4945392494385
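
Note that distances computed in the projected EPSG:4087 CRS are only approximations of true great-circle distances. For comparison, one can compute great-circle ("haversine") distances directly from longitudes and latitudes; a minimal sketch (the helper function haversine is our own):

import numpy as np

def haversine(lon1, lat1, lon2, lat2, r=6371.0):
    # great-circle distance in km between two points given in degrees
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * r * np.arcsin(np.sqrt(a))

ll = points.to_crs(4326)  # representative points back in longitude/latitude
haversine(ll["DEU"].x, ll["DEU"].y, ll["NLD"].x, ll["NLD"].y)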

Global Equal-Area and Equal-Distance CRS#

Previously, we used EPSG:3035 as the projection to calculate the area of regions in km². However, this projection is only accurate within Europe, so for other parts of the world we need to pick different, more suitable projections for calculating areas and distances between regions.

The unit of measurement for both projections is metres.

AREA_CRS = "ESRI:54009"
DISTANCE_CRS = "EPSG:4087"
world = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip")
world.to_crs(AREA_CRS).plot()
(Figure: world map in the ESRI:54009 equal-area projection)
world.to_crs(DISTANCE_CRS).plot()
(Figure: world map in the EPSG:4087 equidistant projection)
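
With the equal-area projection we can, for instance, compute country areas in km² (a sketch; ESRI:54009 is the Mollweide projection):

# project to an equal-area CRS and convert m² to km²
areas = world.to_crs(AREA_CRS).area / 1e6
areas.index = world.ISO_A3
areas.loc["NLD"]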

Country Converter#

The country converter (coco) is a Python package to convert and match country names between different classifications and between different naming versions.

import country_converter as coco

Convert various country names to some standard names, specifying source and target classification scheme:

coco.convert(names="NLD", to="name_official")
'Kingdom of the Netherlands'
coco.convert(names="NLD", to="iso2")
'NL'
coco.convert(names="NLD", src="iso3", to="iso2")
'NL'
country_list = ["AE", "AL", "AM", "AO", "AR"]
coco.convert(names=country_list, src="iso2", to="short_name")
['United Arab Emirates', 'Albania', 'Armenia', 'Angola', 'Argentina']

List of included country classification schemes:

cc = coco.CountryConverter()
cc.valid_country_classifications
['DACcode',
 'Eora',
 'FAOcode',
 'GBDcode',
 'GWcode',
 'IOC',
 'ISO2',
 'ISO3',
 'ISOnumeric',
 'UNcode',
 'ccTLD',
 'name_official',
 'name_short',
 'regex']
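
This also makes it easy to harmonise country identifiers across datasets, e.g. adding ISO3 codes to the Natural Earth countries from above (a sketch; some disputed territories may not be matched):

# convert the Natural Earth country names to ISO3 codes
world["iso3"] = coco.convert(names=world.NAME.tolist(), to="iso3")
world[["NAME", "iso3"]].head(3)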

Gurobi#

Gurobi is one of the most powerful solvers for optimisation problems. It is a commercial solver but offers free academic licenses.

Note

Using this solver for the group assignment is optional. You can also use other open-source alternatives, but they might just take a little longer to solve.

To set up Gurobi, you need to: