Solution pandas#

Power Plants Data#

In this exercise, we will use the powerplants.csv dataset from the powerplantmatching project. This dataset contains information about various power plants, including their names, countries, fuel types, capacities, and more.

URL: https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv

Task 1: Load the dataset into a pandas DataFrame.

Hide code cell content

import pandas as pd

url = (
    "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
)
df = pd.read_csv(url, index_col=0)

Task 2: Run the function .describe() on the DataFrame.

Hide code cell content

df.describe()
Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh
count 29565.000000 541.000000 23772.000000 2436.000000 932.000000 29565.000000 29565.000000 614.000000 29565.000000 29565.000000 29565.000000
mean 46.318929 0.480259 2005.777764 1988.509442 2020.609442 49.636627 9.010803 1290.093123 2.889983 6.463436 355.629403
std 198.176762 0.179551 41.415357 25.484325 10.372226 5.600155 8.090038 1541.197008 78.807802 46.451613 6634.715089
min 0.000000 0.140228 0.000000 1899.000000 1969.000000 32.647300 -27.069900 0.007907 0.000000 0.000000 0.000000
25% 2.800000 0.356400 2004.000000 1971.000000 2018.000000 46.861100 5.029700 90.338261 0.000000 0.000000 0.000000
50% 8.200000 0.385600 2012.000000 1997.000000 2021.000000 50.207635 9.255472 792.753846 0.000000 0.000000 0.000000
75% 25.000000 0.580939 2017.000000 2009.000000 2023.000000 52.630700 12.850190 2021.408119 0.000000 0.000000 0.000000
max 6000.000000 0.917460 2030.000000 2020.000000 2051.000000 71.012300 39.655350 16840.000000 9500.000000 1800.000000 421000.000000

Task 3: Provide a list of unique fuel types and technologies included in the dataset.

Note

Look in the pandas documentation for functions that might be useful to solve these tasks.

Hide code cell content

df.Fueltype.unique()
array(['Nuclear', 'Hard Coal', 'Hydro', 'Lignite', 'Natural Gas', 'Oil',
       'Solid Biomass', 'Wind', 'Solar', 'Other', 'Biogas', 'Waste',
       'Geothermal'], dtype=object)

Hide code cell content

df.Technology.unique()
array(['Steam Turbine', 'Reservoir', 'Pumped Storage', 'Run-Of-River',
       'CCGT', nan, 'Offshore', 'Onshore', 'Unknown', 'Pv', 'Marine',
       'Not Found', 'Combustion Engine', 'PV', 'CSP', 'unknown',
       'not found'], dtype=object)

Task 4: Filter the dataset by power plants with the fuel type “Hard Coal”.

Hide code cell content

coal = df.loc[df.Fueltype == "Hard Coal"]
coal
Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
id
2 Borssele Hard Coal Steam Turbine PP Netherlands 485.000000 NaN 1973.0 NaN 2034.0 51.433200 3.716000 NaN 0.0 0.0 0.0 {'49W000000000054X'} {'BEYONDCOAL': {'BEYOND-NL-2'}, 'ENTSOE': {'49...
98 Didcot Hard Coal CCGT PP United Kingdom 1490.000000 0.550000 1970.0 1998.0 2013.0 51.622300 -1.260800 NaN 0.0 0.0 0.0 {'48WSTN0000DIDCBC'} {'BEYONDCOAL': {'BEYOND-UK-22'}, 'ENTSOE': {'4...
129 Mellach Hard Coal Steam Turbine CHP Austria 200.000000 NaN 1986.0 1986.0 2020.0 46.911700 15.488300 NaN 0.0 0.0 0.0 {'14W-WML-KW-----0'} {'BEYONDCOAL': {'BEYOND-AT-11'}, 'ENTSOE': {'1...
150 Emile Huchet Hard Coal CCGT PP France 596.493211 NaN 1958.0 2010.0 2022.0 49.152500 6.698100 NaN 0.0 0.0 0.0 {'17W100P100P0344D', '17W100P100P0345B'} {'BEYONDCOAL': {'BEYOND-FR-67'}, 'ENTSOE': {'1...
151 Amercoeur Hard Coal CCGT PP Belgium 451.000000 0.187765 1968.0 NaN 2009.0 50.431000 4.395500 NaN 0.0 0.0 0.0 {'22WAMERCO000010Y'} {'BEYONDCOAL': {'BEYOND-BE-27'}, 'ENTSOE': {'2...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29779 St Hard Coal NaN CHP Germany 21.645000 NaN 1982.0 NaN NaN 49.976593 9.068953 NaN 0.0 0.0 0.0 {nan} {'MASTR': {'MASTR-SEE971943692655'}}
29804 Uer Hard Coal NaN CHP Germany 15.200000 NaN 1964.0 NaN NaN 51.368132 6.662350 NaN 0.0 0.0 0.0 {nan} {'MASTR': {'MASTR-SEE988421065542'}}
29813 Walheim Hard Coal NaN PP Germany 244.000000 NaN 1964.0 NaN NaN 49.017585 9.157690 NaN 0.0 0.0 0.0 {nan, nan} {'MASTR': {'MASTR-SEE937157344278', 'MASTR-SEE...
29830 Wd Ffw Hard Coal NaN CHP Germany 123.000000 NaN 1990.0 NaN NaN 50.099000 8.653000 NaN 0.0 0.0 0.0 {nan, nan} {'MASTR': {'MASTR-SEE915289541482', 'MASTR-SEE...
29835 West Hard Coal NaN CHP Germany 277.000000 NaN 1985.0 NaN NaN 52.442456 10.762681 NaN 0.0 0.0 0.0 {nan, nan} {'MASTR': {'MASTR-SEE917432813484', 'MASTR-SEE...

332 rows × 18 columns

Task 5: Identify the 5 largest coal power plants. In which countries are they located? When were they built?

Hide code cell content

selection = coal.Capacity.nlargest(5).index
selection
Index([194, 3652, 767, 3651, 3704], dtype='int64', name='id')

Hide code cell content

coal.loc[selection, ["Name", "Country", "Capacity", "DateIn"]]
Name Country Capacity DateIn
id
194 Kozienice Poland 3682.216205 1972.0
3652 Vuglegirska Ukraine 3600.000000 1972.0
767 Opole Poland 3071.893939 1993.0
3651 Zaporizhia Ukraine 2825.000000 1972.0
3704 Moldavskaya Gres Moldova 2520.000000 1964.0

Task 6: Identify the power plant with the longest name.

Hide code cell content

i = df.Name.apply(lambda x: len(x)).argmax()
df.iloc[i]
Name                   Kesznyeten Hernadviz Hungary Kesznyeten Hernad...
Fueltype                                                           Hydro
Technology                                                  Run-Of-River
Set                                                                   PP
Country                                                          Hungary
Capacity                                                             4.4
Efficiency                                                           NaN
DateIn                                                            1945.0
DateRetrofit                                                      1992.0
DateOut                                                              NaN
lat                                                            47.995972
lon                                                            21.033037
Duration                                                             NaN
Volume_Mm3                                                           0.0
DamHeight_m                                                         14.0
StorageCapacity_MWh                                                  0.0
EIC                                                                {nan}
projectID                   {'JRC': {'JRC-H2138'}, 'GEO': {'GEO-42677'}}
Name: 2595, dtype: object

Task 7: Identify the 10 northernmost powerplants. What type of power plants are they?

Hide code cell content

index = df.lat.nlargest(10).index
df.loc[index]
Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
id
13729 Havoygavlen Wind Farm Wind Onshore PP Norway 78.0 NaN 2003.0 NaN 2021.0 71.012300 24.594200 NaN 0.0 0.0 0.0 {nan, nan} {'GEM': {'G100000917445', 'G100000917626'}}
14629 Kjollefjord Wind Farm Wind Onshore PP Norway 39.0 NaN 2006.0 NaN NaN 70.918500 27.289900 NaN 0.0 0.0 0.0 {nan} {'GEM': {'G100000917523'}}
4385 Repvag Hydro Reservoir Store Norway 4.4 NaN 1953.0 1953.0 NaN 70.773547 25.616486 2810.704545 28.3 172.0 12367.1 {nan} {'JRC': {'JRC-N339'}, 'OPSD': {'OEU-4174'}}
18578 Raggovidda Wind Farm Wind Onshore PP Norway 97.0 NaN 2014.0 NaN NaN 70.765700 29.083300 NaN 0.0 0.0 0.0 {nan, nan} {'GEM': {'G100000918729', 'G100000918029'}}
5363 Maroyfjord Hydro Reservoir Store Norway 4.4 NaN 1956.0 1956.0 NaN 70.751421 27.355047 1533.681818 13.8 225.3 6748.2 {nan} {'JRC': {'JRC-N289'}, 'OPSD': {'OEU-4036'}}
6506 Melkoya Natural Gas CCGT PP Norway 230.0 NaN 2007.0 2007.0 NaN 70.689366 23.600448 NaN 0.0 0.0 0.0 {'50WP00000000456T'} {'ENTSOE': {'50WP00000000456T'}, 'OPSD': {'OEU...
13634 Hammerfest Snohvit Terminal Natural Gas CCGT PP Norway 229.0 NaN NaN NaN NaN 70.685400 23.590000 NaN 0.0 0.0 0.0 {nan} {'GEM': {'L100000407847'}}
13636 Hamnefjell Wind Farm Wind Onshore PP Norway 52.0 NaN 2017.0 NaN NaN 70.667900 29.719000 NaN 0.0 0.0 0.0 {nan} {'GEM': {'G100000918725'}}
5425 Hammerfest Hydro Reservoir Store Norway 1.1 NaN 1947.0 1947.0 NaN 70.657936 23.714418 1638.181818 10.6 88.0 1802.0 {nan} {'JRC': {'JRC-N127'}, 'OPSD': {'OEU-3550'}}
5270 Kongsfjord Hydro Reservoir Store Norway 4.4 NaN 1946.0 1946.0 NaN 70.598729 29.057905 3343.795455 88.1 70.5 14712.7 {nan} {'JRC': {'JRC-N210'}, 'OPSD': {'OEU-3801'}}

Task 8: What is the average start year of each fuel type? Sort the fuel types by their average start year in ascending order and round to the nearest integer.

Hide code cell content

df.groupby("Fueltype").DateIn.mean().round().sort_values()
Fueltype
Hard Coal        1972.0
Hydro            1973.0
Nuclear          1976.0
Lignite          1977.0
Other            1992.0
Waste            1997.0
Geothermal       2000.0
Oil              2001.0
Solid Biomass    2001.0
Natural Gas      2002.0
Wind             2010.0
Biogas           2013.0
Solar            2015.0
Name: DateIn, dtype: float64

Wind and Solar Capacity Factors#

In this exercise, we will work with a time series dataset containing hourly wind and solar capacity factors for Ireland, taken from model.energy.

Task 1: Use pd.read_csv to load the dataset from the following URL into a pandas DataFrame. Ensure that the time stamps are treated as pd.DatetimeIndex.

Hide code cell content

url = "https://model.energy/data/time-series-2b42655fa0b49b73fb15871dba2f7000.csv"
df = pd.read_csv(url, index_col=0, parse_dates=True)

Task 2: Calculate the mean capacity factor for wind and solar over the entire time period.

Hide code cell content

df.mean()
onwind    0.366898
solar     0.099274
dtype: float64

Task 3: Calculate the correlation between wind and solar capacity factors.

Note

Go to the pandas documentation for functions that might be useful to solve these tasks.

Hide code cell content

df.corr()
onwind solar
onwind 1.000000 -0.096682
solar -0.096682 1.000000

Task 4: Plot the wind and solar capacity factors for the month of May.

Hide code cell content

df.loc["05-2011"].plot(ylabel="capacity factor")
<Axes: ylabel='capacity factor'>
_images/f646bbd5c44bff7b02edbb4ed73b4c8614aee3be969cfcc030cb013c6779e78f.png

Task 5: Plot the weekly average capacity factors for wind and solar over the entire time period.

Hide code cell content

df.resample("W").mean().plot()
<Axes: >
_images/2c1033218f3628fd38d216deffdf7fb645a78a34258659e119e5a36a6ba5b63d.png

Task 6: Go to model.energy and retrieve the time series for another region of your choice. Recreate the analysis above and compare the results.

Note

Look for “Download Comma-Separated-Variable (CSV) file of data” in Step 2.