Introduction to geopandas and cartopy#
Basic Setup#
Again, we will be using pandas and matplotlib in this tutorial.
import pandas as pd
import matplotlib.pyplot as plt
Note
If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work, download the .ipynb file and import it in Google Colab.
Then install the required packages by running the following command in a Jupyter cell at the top of the notebook.
!pip install pandas geopandas matplotlib cartopy mapclassify
Why do we need something other than pandas?#
Let’s reload our example dataset of conventional power plants in Europe as a pd.DataFrame.
fn = "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
ppl = pd.read_csv(fn, index_col=0)
This dataset includes coordinates (latitude and longitude), which allows us to plot the location and capacity of all power plants in a scatter plot:
ppl.plot.scatter("lon", "lat", s=ppl.Capacity / 1e3)
<Axes: xlabel='lon', ylabel='lat'>
However, this plot lacks the geographic reference points we would normally expect on a map, such as shorelines and country borders.
Geopandas - a Pandas extension for geospatial data#
Geopandas extends pandas by adding support for geospatial data.
The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations.
Note
Documentation for this package is available at https://geopandas.org/en/stable/.
Typical geometries are points, lines, and polygons. They come from another library called shapely, which helps you create, analyze, and manipulate two-dimensional shapes and their properties.
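To make the three geometry types more concrete, here is a minimal sketch that builds one of each directly with shapely. The coordinates of the line and the square are made-up illustration values; only the point reuses a coordinate pair from the power plant dataset.

```python
from shapely.geometry import LineString, Point, Polygon

# A point is a single (x, y) pair; here x is the longitude and y the latitude.
point = Point(3.716, 51.4332)

# A line connects two or more coordinate pairs (illustrative values).
line = LineString([(0, 0), (3, 4)])

# A polygon is a closed shape defined by its corner coordinates (illustrative values).
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

print(point.x, point.y)              # coordinates of the point
print(line.length)                   # 5.0
print(square.area)                   # 4.0
print(square.contains(Point(1, 1)))  # True
```

Note that lengths and areas are computed in the units of the coordinates, so they are only meaningful in metres if the coordinates are given in a metric coordinate system.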
First, we need to import the geopandas package. The conventional alias is gpd:
import geopandas as gpd
We can convert the latitude and longitude values given in the dataset to formal geometries (to be exact, shapely.Point objects, but we won’t go into detail here) using the gpd.points_from_xy() function, and use these to create a gpd.GeoDataFrame. We should also specify a so-called coordinate reference system (CRS). The EPSG code 4326 refers to WGS 84, in which coordinates are given as latitude and longitude in degrees.
geometry = gpd.points_from_xy(ppl["lon"], ppl["lat"])
gdf = gpd.GeoDataFrame(ppl, geometry=geometry, crs=4326)
Now, the gdf looks like this:
gdf.head(3)
id | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | DateIn | DateRetrofit | DateOut | lat | lon | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | EIC | projectID | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Borssele | Hard Coal | Steam Turbine | PP | Netherlands | 485.0 | NaN | 1973.0 | NaN | 2034.0 | 51.4332 | 3.7160 | NaN | 0.0 | 0.0 | 0.0 | {'49W000000000054X'} | {'BEYONDCOAL': {'BEYOND-NL-2'}, 'ENTSOE': {'49... | POINT (3.716 51.4332) |
1 | Flamanville | Nuclear | Steam Turbine | PP | France | 2660.0 | NaN | 1985.0 | NaN | 2051.0 | 49.5366 | -1.8823 | NaN | 0.0 | 0.0 | 0.0 | {'17W100P100P0210Y', '17W100P100P0209J'} | {'ENTSOE': {'17W100P100P0210Y', '17W100P100P02... | POINT (-1.8823 49.5366) |
2 | Emsland | Nuclear | Steam Turbine | PP | Germany | 1336.0 | 0.33 | 1988.0 | 1988.0 | 2023.0 | 52.4716 | 7.3204 | NaN | 0.0 | 0.0 | 0.0 | {'11WD7KKE-1K--KW5'} | {'ENTSOE': {'11WD7KKE-1K--KW5'}, 'GEM': {'G100... | POINT (7.3204 52.4716) |
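As a quick sanity check on the conversion, the following sketch rebuilds a tiny GeoDataFrame from the three rows shown in the table and inspects its CRS and geometry. The names sample and sample_gdf are introduced here for illustration only. The main point is that points_from_xy() expects x (longitude) first and y (latitude) second.

```python
import geopandas as gpd
import pandas as pd

# Three rows taken from the table above (name and coordinates only).
sample = pd.DataFrame(
    {
        "Name": ["Borssele", "Flamanville", "Emsland"],
        "lat": [51.4332, 49.5366, 52.4716],
        "lon": [3.7160, -1.8823, 7.3204],
    }
)

# x is the longitude, y is the latitude.
sample_gdf = gpd.GeoDataFrame(
    sample,
    geometry=gpd.points_from_xy(sample["lon"], sample["lat"]),
    crs=4326,
)

print(sample_gdf.crs)                  # the CRS attached to the geometry column
print(sample_gdf.geometry.x.tolist())  # longitudes
print(sample_gdf.geometry.y.tolist())  # latitudes
print(sample_gdf.total_bounds)         # [min lon, min lat, max lon, max lat]
```

If the longitudes and latitudes were swapped by mistake, the bounding box would place the plants far outside Europe, which makes total_bounds a cheap first check.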
With the additional geometry column, it is now even easier to plot the geographic data:
gdf.plot(
    column="Fueltype",
    markersize=gdf.Capacity / 1e2,
)
<Axes: >
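The plot can be made easier to read with a legend and axis labels. The sketch below shows one possible way to do this on a small stand-in dataset built from the three rows in the table above. The names demo, fig and ax, the figure size, and the axis labels are illustrative choices, not part of the original tutorial.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the full dataset, using the three rows from the table above.
demo = pd.DataFrame(
    {
        "Fueltype": ["Hard Coal", "Nuclear", "Nuclear"],
        "Capacity": [485.0, 2660.0, 1336.0],
        "lat": [51.4332, 49.5366, 52.4716],
        "lon": [3.7160, -1.8823, 7.3204],
    }
)
demo = gpd.GeoDataFrame(
    demo,
    geometry=gpd.points_from_xy(demo["lon"], demo["lat"]),
    crs=4326,
)

# Create the axes first so that labels can be added afterwards.
fig, ax = plt.subplots(figsize=(6, 6))
demo.plot(
    ax=ax,
    column="Fueltype",                 # color by fuel type
    markersize=demo["Capacity"] / 1e2,  # scale marker area by capacity
    legend=True,                       # one legend entry per fuel type
)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

The same keyword arguments should work on the full gdf; only the stand-in data is specific to this sketch.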
We can also start up an interactive map to explore the geodata in more detail:
gdf.explore(column="Fueltype")