Introduction to geopandas & cartopy
Note
If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work, download the .ipynb file and import it into Google Colab.
Then install the required packages by executing the following command in a Jupyter cell at the top of the notebook.
!pip install pandas geopandas matplotlib cartopy mapclassify
Why not pandas?
Let’s reload our example dataset of conventional power plants in Europe as a pd.DataFrame.
import pandas as pd
import matplotlib.pyplot as plt
fn = "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
ppl = pd.read_csv(fn, index_col=0)
This dataset includes coordinates (latitude and longitude), which allows us to plot the location and capacity of all power plants in a scatter plot:
ppl.plot.scatter("lon", "lat", s=ppl.Capacity / 1e3)
<Axes: xlabel='lon', ylabel='lat'>
However, this graph is missing the geographic reference points we would normally expect on a map, such as shorelines and country borders.
Why geopandas?

GeoPandas extends pandas by adding support for geospatial data.
The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations.
Note
Documentation for this package is available at https://geopandas.org/en/stable/.
Typical geometries are points, lines, and polygons. They come from another library called shapely, which helps you create, analyze, and manipulate two-dimensional shapes and their properties.
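To make these geometry types more concrete, here is a minimal sketch that builds one of each directly with shapely and queries a few properties. The coordinates are made up for illustration and do not come from the power plant dataset.

```python
from shapely.geometry import LineString, Point, Polygon

# Illustrative (x, y) coordinates, not taken from the dataset.
point = Point(1.0, 1.0)
line = LineString([(0.0, 0.0), (3.0, 4.0)])
square = Polygon([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])

print(line.length)                      # length of the line: 5.0
print(square.area)                      # area of the 2 x 2 square: 4.0
print(square.contains(point))           # True, the point lies inside the square
print(point.distance(Point(4.0, 5.0)))  # planar distance between two points: 5.0
```

Note that shapely treats coordinates as plain planar numbers; it does not know about degrees or metres.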
First, we need to import the geopandas package. The conventional alias is gpd:
import geopandas as gpd
We can convert the latitude and longitude values given in the dataset to formal geometries (to be exact, shapely.Point objects) using the gpd.points_from_xy() function and use them to build a gpd.GeoDataFrame. Note that points_from_xy() expects x (longitude) first and y (latitude) second. We should also specify a so-called coordinate reference system (CRS). The EPSG code 4326 denotes the WGS84 system, in which coordinates are given as latitude and longitude in degrees.
geometry = gpd.points_from_xy(ppl["lon"], ppl["lat"])
geometry[:5]
<GeometryArray>
[ <POINT (7.324 52.473)>, <POINT (9.345 53.851)>, <POINT (3.716 51.433)>,
<POINT (9.176 49.04)>, <POINT (12.293 48.606)>]
Length: 5, dtype: geometry
gdf = gpd.GeoDataFrame(ppl, geometry=geometry, crs=4326)
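The CRS of a GeoDataFrame can be inspected through its .crs attribute and changed with .to_crs(). The following sketch illustrates this on two made-up points (roughly Berlin and Paris). The target EPSG:3035, a projection for Europe with coordinates in metres, is an assumption chosen for this illustration; the tutorial itself only uses EPSG:4326.

```python
import geopandas as gpd

# Two illustrative points given as longitude/latitude (roughly Berlin and Paris).
toy = gpd.GeoDataFrame(
    {"Name": ["A", "B"]},
    geometry=gpd.points_from_xy([13.40, 2.35], [52.52, 48.86]),
    crs=4326,
)
print(toy.crs.to_epsg())  # 4326

# Reproject to EPSG:3035 (an assumption of this sketch), which uses metres.
toy_m = toy.to_crs(3035)
print(toy_m.crs.to_epsg())  # 3035

# Distances computed in a projected CRS are in its units (here metres).
dist_km = toy_m.geometry.iloc[0].distance(toy_m.geometry.iloc[1]) / 1e3
print(round(dist_km))
```

Computing the same distance directly on the EPSG:4326 geometries would return a value in degrees, which is rarely meaningful.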
Now, the resulting gdf looks like this:
gdf.head(3)
| id | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | DateIn | DateRetrofit | DateOut | lat | lon | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | EIC | projectID | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kernkraftwerk Emsland | Nuclear | Steam Turbine | PP | Germany | 1336.0 | 0.33 | 1988.0 | 1988.0 | 2023.0 | 52.472897 | 7.32414 | NaN | 0.0 | 0.0 | 0.0 | {nan} | {'MASTR': {'MASTR-SEE944567587799'}, 'ENTSOE':... | POINT (7.32414 52.4729) |
| 1 | Brokdorf | Nuclear | Steam Turbine | PP | Germany | 1410.0 | 0.33 | 1986.0 | 1986.0 | 2021.0 | 53.850830 | 9.34472 | NaN | 0.0 | 0.0 | 0.0 | {nan} | {'MASTR': {'MASTR-SEE951462745445'}, 'ENTSOE':... | POINT (9.34472 53.85083) |
| 2 | Borssele | Hard Coal | Steam Turbine | PP | Netherlands | 485.0 | NaN | 1973.0 | NaN | 2034.0 | 51.433200 | 3.71600 | NaN | 0.0 | 0.0 | 0.0 | {'49W000000000054X'} | {'BEYONDCOAL': {'BEYOND-NL-2'}, 'ENTSOE': {'49... | POINT (3.716 51.4332) |
gdf.geometry.head()
id
0 POINT (7.32414 52.4729)
1 POINT (9.34472 53.85083)
2 POINT (3.716 51.4332)
3 POINT (9.17641 49.04002)
4 POINT (12.29315 48.6056)
Name: geometry, dtype: geometry
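Since a GeoDataFrame is a subclass of pandas.DataFrame, the usual pandas operations should continue to work on it. The sketch below demonstrates this on a small made-up table; the plant names, capacities, and coordinates are hypothetical and only serve as an illustration.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame(
    {
        "Name": ["Plant A", "Plant B", "Plant C"],
        "Capacity": [500.0, 1200.0, 80.0],
        "lon": [7.0, 9.5, 3.7],
        "lat": [52.5, 53.9, 51.4],
    }
)
toy = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs=4326
)

# A GeoDataFrame is still a pandas DataFrame ...
print(isinstance(toy, pd.DataFrame))  # True

# ... so boolean filtering works and the result keeps its geometry column.
large = toy[toy["Capacity"] > 400]
print(type(large).__name__, len(large))  # GeoDataFrame 2

# Accessing the geometry column yields a GeoSeries.
print(type(toy.geometry).__name__)  # GeoSeries
```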
With the additional geometry column, it is now even easier to plot the geographic data:
gdf.plot(
    column="Fueltype",
    markersize=gdf.Capacity / 1e2,
)
<Axes: >
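The plot call accepts further keyword arguments. As a minimal sketch, the example below adds a legend for the categories and sets a figure size, using a small made-up GeoDataFrame so that it runs without downloading the dataset. The values and the scaling factor for the marker size are illustrative assumptions.

```python
import geopandas as gpd

# Hypothetical toy data, only for illustration.
toy = gpd.GeoDataFrame(
    {
        "Fueltype": ["Nuclear", "Hard Coal", "Nuclear"],
        "Capacity": [1336.0, 485.0, 1410.0],
    },
    geometry=gpd.points_from_xy([7.3, 3.7, 9.3], [52.5, 51.4, 53.9]),
    crs=4326,
)

ax = toy.plot(
    column="Fueltype",                  # colour points by category
    markersize=toy["Capacity"] / 1e1,   # scale marker area with capacity
    legend=True,                        # add one legend entry per category
    figsize=(6, 4),
)

# The returned matplotlib Axes can be customised as usual.
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```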
We can also open an interactive map to explore the geodata in more detail (this requires the folium package, which is not part of the install command above):
gdf.explore(column="Fueltype")