Introduction to pandas#

Note

This material is mostly adapted from the following resources:

Pandas is an open-source library providing Excel-like tables in Python. It offers functionality for efficiently reading, writing, and processing tabular data, such as sorting, filtering, aggregating, and visualizing. Moreover, it provides tools for handling missing data and time series data.

Note

Documentation for this package is available at https://pandas.pydata.org/docs/.

Note

If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work, download the .ipynb file and import it in Google Colab.

Then install pandas and numpy, and import them, by executing the following commands in a Jupyter cell at the top of the notebook.

!pip install pandas numpy
import pandas as pd
import numpy as np

The Series#

A Series represents a one-dimensional array of data. The main difference between a Series and a numpy array is that a Series has an index. The index contains the labels that we use to access the data. It is actually quite similar to a Python dictionary, where each value is associated with a key.

There are many ways to create a Series, but the core constructor is pd.Series() which can process a dictionary to create a Series.

Note

The data used below is from Wikipedia’s List of power stations in Germany.

dictionary = {
    "Neckarwestheim": 1269,
    "Isar 2": 1365,
    "Emsland": 1290,
}
s = pd.Series(dictionary)
s
Neckarwestheim    1269
Isar 2            1365
Emsland           1290
dtype: int64

Arithmetic operations and most numpy functions can be applied to a pd.Series. An important point is that a Series keeps its index during such operations.

np.log(s) / s**0.5
Neckarwestheim    0.200600
Isar 2            0.195391
Emsland           0.199418
dtype: float64
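
Index alignment also applies when combining two Series: values are matched by label rather than by position. A small sketch with a made-up second Series:

other = pd.Series({"Emsland": 100, "Isar 2": 200, "Grafenrheinfeld": 300})
s + other  # aligned on labels; labels missing on either side give NaN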

We can access the underlying index object if we need to:

s.index
Index(['Neckarwestheim', 'Isar 2', 'Emsland'], dtype='object')

We can get values back out using the index via the .loc attribute:

s.loc["Isar 2"]
np.int64(1365)

Or by raw position using .iloc:

s.iloc[2]
np.int64(1290)

We can pass a list or array to loc to get multiple rows back:

s.loc[["Neckarwestheim", "Emsland"]]
Neckarwestheim    1269
Emsland           1290
dtype: int64

And we can even use so-called slicing notation (:) to get ranges of rows. Note that label-based slices with .loc include the endpoint, while positional slices with .iloc exclude it:

s.loc["Neckarwestheim":"Emsland"]
Neckarwestheim    1269
Isar 2            1365
Emsland           1290
dtype: int64
s.iloc[:2]
Neckarwestheim    1269
Isar 2            1365
dtype: int64

If we need to, we can always get the raw data back out as well:

s.values  # a numpy array
array([1269, 1365, 1290])
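
In recent pandas versions, .to_numpy() is the recommended way to do this:

s.to_numpy()  # equivalent to s.values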

The DataFrame#

There is a lot more to a pandas.Series, but it is limited to a single column. A more broadly useful pandas data structure is the DataFrame. A pandas.DataFrame is a collection of Series that share the same index. It’s a lot like a table in a spreadsheet.

The core constructor is pd.DataFrame(), which can, for instance, be called with a dictionary of lists:

data = {
    "capacity": [1269, 1365, 1290],  # MW
    "type": ["PWR", "PWR", "PWR"],
    "start_year": [1989, 1988, 1988],
    "end_year": [np.nan, np.nan, np.nan],
}
df = pd.DataFrame(data, index=["Neckarwestheim", "Isar 2", "Emsland"])
df
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989       NaN
Isar 2              1365  PWR        1988       NaN
Emsland             1290  PWR        1988       NaN

We can also switch columns and rows very easily using the .T (transpose) attribute:

df.T
           Neckarwestheim Isar 2 Emsland
capacity             1269   1365    1290
type                  PWR    PWR     PWR
start_year           1989   1988    1988
end_year              NaN    NaN     NaN

A wide range of statistical functions are available on both Series and DataFrames.

df.min()
capacity      1269
type           PWR
start_year    1988
end_year       NaN
dtype: object
df.mean(numeric_only=True)
capacity      1308.000000
start_year    1988.333333
end_year              NaN
dtype: float64
df.describe()
          capacity   start_year  end_year
count     3.000000     3.000000       0.0
mean   1308.000000  1988.333333       NaN
std      50.467812     0.577350       NaN
min    1269.000000  1988.000000       NaN
25%    1279.500000  1988.000000       NaN
50%    1290.000000  1988.000000       NaN
75%    1327.500000  1988.500000       NaN
max    1365.000000  1989.000000       NaN

We can get a single column as a Series using Python’s getitem syntax (square brackets) on the DataFrame object:

df["capacity"]
Neckarwestheim    1269
Isar 2            1365
Emsland           1290
Name: capacity, dtype: int64

…or using attribute syntax. This shorthand only works if the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method.

df.capacity
Neckarwestheim    1269
Isar 2            1365
Emsland           1290
Name: capacity, dtype: int64

Indexing works very similarly to Series:

df.loc["Emsland"]
capacity      1290
type           PWR
start_year    1988
end_year       NaN
Name: Emsland, dtype: object
df.iloc[2]
capacity      1290
type           PWR
start_year    1988
end_year       NaN
Name: Emsland, dtype: object

But we can also specify the column(s) and row(s) we want to access:

df.loc["Emsland", "start_year"]
np.int64(1988)
df.loc[["Emsland", "Neckarwestheim"], ["start_year", "end_year"]]
                start_year  end_year
Emsland               1988       NaN
Neckarwestheim        1989       NaN

Mathematical operations work as well, either on the whole DataFrame or on specific columns, the result of which can be assigned to a new column:

df.capacity * 0.8
Neckarwestheim    1015.2
Isar 2            1092.0
Emsland           1032.0
Name: capacity, dtype: float64
df["reduced_capacity"] = df.capacity * 0.8
df
                capacity type  start_year  end_year  reduced_capacity
Neckarwestheim      1269  PWR        1989       NaN            1015.2
Isar 2              1365  PWR        1988       NaN            1092.0
Emsland             1290  PWR        1988       NaN            1032.0

Cleaning Data#

We can also remove columns or rows from a DataFrame:

df.drop("reduced_capacity", axis="columns")
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989       NaN
Isar 2              1365  PWR        1988       NaN
Emsland             1290  PWR        1988       NaN

By default, .drop() returns a new object and leaves df untouched. We can update the variable df by either reassigning it or passing the inplace keyword:

df.drop("reduced_capacity", axis="columns", inplace=True)

We can also drop columns containing NaN values. By default, dropna removes a column (axis=1) if it contains any NaN; pass how="all" to remove only columns consisting entirely of NaN:

df.dropna(axis=1)
                capacity type  start_year
Neckarwestheim      1269  PWR        1989
Isar 2              1365  PWR        1988
Emsland             1290  PWR        1988

Or fill them with a default fallback value:

df.fillna(2023)
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989    2023.0
Isar 2              1365  PWR        1988    2023.0
Emsland             1290  PWR        1988    2023.0

Say we already have one value for end_year and want to fill in the missing data. We can use forward fill (ffill) or backward fill (bfill):

df.loc["Emsland", "end_year"] = 2023
df.bfill()
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989    2023.0
Isar 2              1365  PWR        1988    2023.0
Emsland             1290  PWR        1988    2023.0
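
Forward fill works analogously but propagates the last valid value downwards. Since Emsland is the last row here, ffill leaves the earlier missing values untouched:

df.ffill()  # end_year stays NaN for Neckarwestheim and Isar 2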

Sometimes it can be useful to rename columns. Note that df currently has no columns named x and y, so the following call returns the DataFrame unchanged; with matching column names, it would relabel them:

df.rename(columns=dict(x="lat", y="lon"))
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989       NaN
Isar 2              1365  PWR        1988       NaN
Emsland             1290  PWR        1988    2023.0

Sometimes it can be useful to replace values:

df.replace({"PWR": "Pressurized water reactor"})
                capacity                       type  start_year  end_year
Neckarwestheim      1269  Pressurized water reactor        1989       NaN
Isar 2              1365  Pressurized water reactor        1988       NaN
Emsland             1290  Pressurized water reactor        1988    2023.0

In many cases, we want to modify values in a DataFrame based on some rule. To modify values, we need to use .loc or .iloc. They can be used to set a specific value or a set of values based on their index and column labels:

df.loc["Isar 2", "start_year"] = 1999
df.loc["Emsland", "capacity"] += 10
df
                capacity type  start_year  end_year
Neckarwestheim      1269  PWR        1989       NaN
Isar 2              1365  PWR        1999       NaN
Emsland             1300  PWR        1988    2023.0
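
The row selector of .loc can also be a boolean condition, which is handy for rule-based modifications. A small sketch that works on a copy so df itself stays unchanged; the status column and its values are made up for illustration:

tmp = df.copy()
tmp.loc[tmp.start_year < 1989, "status"] = "pre-1989"  # set only where the condition holds
tmp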

It can even be a completely new column:

operational = ["Neckarwestheim", "Isar 2", "Emsland"]
df.loc[operational, "y"] = [49.04, 48.61, 52.47]
df
                capacity type  start_year  end_year      y
Neckarwestheim      1269  PWR        1989       NaN  49.04
Isar 2              1365  PWR        1999       NaN  48.61
Emsland             1300  PWR        1988    2023.0  52.47

Combining Datasets#

Pandas supports a wide range of methods for merging different datasets. These are described extensively in the documentation. Here we just give a few examples.

data = {
    "capacity": [1288, 1360, 1326],  # MW
    "type": ["BWR", "PWR", "PWR"],
    "start_year": [1985, 1985, 1986],
    "end_year": [2021, 2021, 2021],
    "x": [10.40, 9.41, 9.35],
    "y": [48.51, 52.03, 53.85],
}
df2 = pd.DataFrame(data, index=["Gundremmingen", "Grohnde", "Brokdorf"])
df2
               capacity type  start_year  end_year      x      y
Gundremmingen      1288  BWR        1985      2021  10.40  48.51
Grohnde            1360  PWR        1985      2021   9.41  52.03
Brokdorf           1326  PWR        1986      2021   9.35  53.85

We can now add this additional data to the df object:

df = pd.concat([df, df2])
df
                capacity type  start_year  end_year      y      x
Neckarwestheim      1269  PWR        1989       NaN  49.04    NaN
Isar 2              1365  PWR        1999       NaN  48.61    NaN
Emsland             1300  PWR        1988    2023.0  52.47    NaN
Gundremmingen       1288  BWR        1985    2021.0  48.51  10.40
Grohnde             1360  PWR        1985    2021.0  52.03   9.41
Brokdorf            1326  PWR        1986    2021.0  53.85   9.35
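
Besides pd.concat, which stacks objects along an axis, pandas also provides pd.merge and .join for database-style joins on columns or indices. A minimal sketch with a made-up auxiliary table (the operator names are illustrative):

operators = pd.DataFrame(
    {"operator": ["PreussenElektra", "RWE"]},  # illustrative values
    index=["Brokdorf", "Emsland"],
)
df.join(operators)  # left join on the index; unmatched rows get NaN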

Sorting & Filtering Data#

We can also sort the entries in DataFrames, e.g. alphabetically by index or numerically by column values:

df.sort_index()
                capacity type  start_year  end_year      y      x
Brokdorf            1326  PWR        1986    2021.0  53.85   9.35
Emsland             1300  PWR        1988    2023.0  52.47    NaN
Grohnde             1360  PWR        1985    2021.0  52.03   9.41
Gundremmingen       1288  BWR        1985    2021.0  48.51  10.40
Isar 2              1365  PWR        1999       NaN  48.61    NaN
Neckarwestheim      1269  PWR        1989       NaN  49.04    NaN
df.sort_values(by="capacity", ascending=False)
                capacity type  start_year  end_year      y      x
Isar 2              1365  PWR        1999       NaN  48.61    NaN
Grohnde             1360  PWR        1985    2021.0  52.03   9.41
Brokdorf            1326  PWR        1986    2021.0  53.85   9.35
Emsland             1300  PWR        1988    2023.0  52.47    NaN
Gundremmingen       1288  BWR        1985    2021.0  48.51  10.40
Neckarwestheim      1269  PWR        1989       NaN  49.04    NaN

We can also filter a DataFrame using a boolean series obtained from a condition. This is very useful to build subsets of the DataFrame.

df.capacity > 1300
Neckarwestheim    False
Isar 2             True
Emsland           False
Gundremmingen     False
Grohnde            True
Brokdorf           True
Name: capacity, dtype: bool
df[df.capacity > 1300]
          capacity type  start_year  end_year      y     x
Isar 2        1365  PWR        1999       NaN  48.61   NaN
Grohnde       1360  PWR        1985    2021.0  52.03  9.41
Brokdorf      1326  PWR        1986    2021.0  53.85  9.35
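
For membership tests, the .isin() method builds such a boolean series from a list of allowed values:

df[df["type"].isin(["BWR"])]  # just the boiling water reactors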

We can also combine multiple conditions, but we need to wrap each condition in parentheses!

df[(df.capacity > 1300) & (df.start_year >= 1988)]
        capacity type  start_year  end_year      y   x
Isar 2      1365  PWR        1999       NaN  48.61 NaN

Or we can make SQL-like queries, where the @ prefix lets us reference Python variables:

df.query("start_year == 1988")
         capacity type  start_year  end_year      y   x
Emsland      1300  PWR        1988    2023.0  52.47 NaN
threshold = 1300
df.query("start_year == 1988 and capacity > @threshold")
Empty DataFrame
Columns: [capacity, type, start_year, end_year, y, x]
Index: []
df
                capacity type  start_year  end_year      y      x
Neckarwestheim      1269  PWR        1989       NaN  49.04    NaN
Isar 2              1365  PWR        1999       NaN  48.61    NaN
Emsland             1300  PWR        1988    2023.0  52.47    NaN
Gundremmingen       1288  BWR        1985    2021.0  48.51  10.40
Grohnde             1360  PWR        1985    2021.0  52.03   9.41
Brokdorf            1326  PWR        1986    2021.0  53.85   9.35

Applying Functions#

Sometimes it can be useful to apply a function to all values of a column/row. For instance, we might be interested in normalised capacities relative to the largest nuclear power plant:

def normalise(s):
    return s / df.capacity.max()

df.capacity.apply(normalise)
Neckarwestheim    0.929670
Isar 2            1.000000
Emsland           0.952381
Gundremmingen     0.943590
Grohnde           0.996337
Brokdorf          0.971429
Name: capacity, dtype: float64

For simple functions, there’s often an easier alternative:

df.capacity / df.capacity.max()
Neckarwestheim    0.929670
Isar 2            1.000000
Emsland           0.952381
Gundremmingen     0.943590
Grohnde           0.996337
Brokdorf          0.971429
Name: capacity, dtype: float64

But the .apply() function often gives you more flexibility.
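
The .apply() method can also operate row-wise across columns with axis=1. A sketch with a made-up labelling rule:

def label(row):
    # classify each plant from several columns (hypothetical rule)
    status = "operational" if pd.isna(row["end_year"]) else "closed"
    return f"{row['type']}, {status}"

df.apply(label, axis=1)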

Plotting#

DataFrames have all kinds of useful plotting built in.

Note

Note that we do not even have to import matplotlib for this; pandas uses it under the hood to create the plots.

df.plot(kind="scatter", x="start_year", y="capacity")
<Axes: xlabel='start_year', ylabel='capacity'>
[figure: scatter plot of capacity versus start_year]
df.capacity.plot.barh(color="orange")
<Axes: >
[figure: horizontal bar chart of plant capacities]

Time Indexes#

Indexes are very powerful. They are a big part of why pandas is so useful. There are different index types for different kinds of data, and time indexes are especially useful when handling time-dependent data.

time = pd.date_range(start="2021-01-01", end="2023-01-01", freq="D")
values = np.sin(2 * np.pi * time.dayofyear / 365)
ts = pd.Series(values, index=time)
ts.index
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10',
               ...
               '2022-12-23', '2022-12-24', '2022-12-25', '2022-12-26',
               '2022-12-27', '2022-12-28', '2022-12-29', '2022-12-30',
               '2022-12-31', '2023-01-01'],
              dtype='datetime64[ns]', length=731, freq='D')
ts.plot()
<Axes: >
[figure: line plot of the full two-year time series]

We can use Python’s slicing notation inside .loc to select a date range.

ts.loc["2021-01-01":"2021-07-01"].plot()
<Axes: >
[figure: line plot of the first half of 2021]
ts.loc["2021-05"].plot()
<Axes: >
[figure: line plot of May 2021]

The pd.DatetimeIndex object has lots of useful attributes:

ts.index.month
Index([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
       ...
       12, 12, 12, 12, 12, 12, 12, 12, 12,  1],
      dtype='int32', length=731)
ts.index.day
Index([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       ...
       23, 24, 25, 26, 27, 28, 29, 30, 31,  1],
      dtype='int32', length=731)
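
These attributes can be combined with groupby to aggregate by calendar period, for example the mean per calendar month across both years:

ts.groupby(ts.index.month).mean()  # one value per calendar month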

Another common operation is to change the resolution of a dataset by resampling in time. Pandas exposes this through the .resample() function. The resampling frequency is specified using pandas offset aliases (e.g. "ME" for month end).

Below, we resample the dataset by taking the mean over each month.

ts.resample("ME").mean().head()
2021-01-31    0.268746
2021-02-28    0.698782
2021-03-31    0.949778
2021-04-30    0.959332
2021-05-31    0.709200
Freq: ME, dtype: float64
ts.resample("ME").mean().plot()
<Axes: >
[figure: line plot of monthly mean values]

Reading and Writing Files#

To read data into pandas, we can, for instance, use the pd.read_csv() function. This function is quite powerful, with many different settings. You can use it to extract data from almost any text file.

The pd.read_csv() function can take a path to a local file as an input, or even a hyperlink to an online text file.
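
A few commonly used settings, shown with made-up values purely for illustration:

pd.read_csv(
    "data.csv",         # hypothetical path or URL
    sep=";",            # delimiter, if the file is not comma-separated
    index_col=0,        # use the first column as the index
    parse_dates=True,   # try to parse the index as dates
    skiprows=2,         # skip metadata lines at the top of the file
)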

Let’s import a slightly larger dataset about the power plant fleet in Europe:

fn = "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
df = pd.read_csv(fn, index_col=0)
df.iloc[:5, :10]
                                         Name   Fueltype     Technology Set      Country  Capacity  Efficiency  DateIn  DateRetrofit  DateOut
id
0                       Kernkraftwerk Emsland    Nuclear  Steam Turbine  PP      Germany    1336.0        0.33  1988.0        1988.0   2023.0
1                                    Brokdorf    Nuclear  Steam Turbine  PP      Germany    1410.0        0.33  1986.0        1986.0   2021.0
2                                    Borssele  Hard Coal  Steam Turbine  PP  Netherlands     485.0         NaN  1973.0           NaN   2034.0
3   Gemeinschaftskernkraftwerk Neckarwestheim    Nuclear  Steam Turbine  PP      Germany    1310.0        0.33  1976.0        1989.0   2023.0
4                                        Isar    Nuclear  Steam Turbine  PP      Germany    1410.0        0.33  1979.0        1988.0   2023.0
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 29565 entries, 0 to 29862
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Name                 29565 non-null  object 
 1   Fueltype             29565 non-null  object 
 2   Technology           21661 non-null  object 
 3   Set                  29565 non-null  object 
 4   Country              29565 non-null  object 
 5   Capacity             29565 non-null  float64
 6   Efficiency           541 non-null    float64
 7   DateIn               23772 non-null  float64
 8   DateRetrofit         2436 non-null   float64
 9   DateOut              932 non-null    float64
 10  lat                  29565 non-null  float64
 11  lon                  29565 non-null  float64
 12  Duration             614 non-null    float64
 13  Volume_Mm3           29565 non-null  float64
 14  DamHeight_m          29565 non-null  float64
 15  StorageCapacity_MWh  29565 non-null  float64
 16  EIC                  29565 non-null  object 
 17  projectID            29565 non-null  object 
dtypes: float64(11), object(7)
memory usage: 4.3+ MB
df.describe()
           Capacity  Efficiency        DateIn  DateRetrofit      DateOut           lat           lon      Duration    Volume_Mm3   DamHeight_m  StorageCapacity_MWh
count  29565.000000  541.000000  23772.000000   2436.000000   932.000000  29565.000000  29565.000000    614.000000  29565.000000  29565.000000         29565.000000
mean      46.318929    0.480259   2005.777764   1988.509442  2020.609442     49.636627      9.010803   1290.093123      2.889983      6.463436           355.629403
std      198.176762    0.179551     41.415357     25.484325    10.372226      5.600155      8.090038   1541.197008     78.807802     46.451613          6634.715089
min        0.000000    0.140228      0.000000   1899.000000  1969.000000     32.647300    -27.069900      0.007907      0.000000      0.000000             0.000000
25%        2.800000    0.356400   2004.000000   1971.000000  2018.000000     46.861100      5.029700     90.338261      0.000000      0.000000             0.000000
50%        8.200000    0.385600   2012.000000   1997.000000  2021.000000     50.207635      9.255472    792.753846      0.000000      0.000000             0.000000
75%       25.000000    0.580939   2017.000000   2009.000000  2023.000000     52.630700     12.850190   2021.408119      0.000000      0.000000             0.000000
max     6000.000000    0.917460   2030.000000   2020.000000  2051.000000     71.012300     39.655350  16840.000000   9500.000000   1800.000000        421000.000000

Sometimes, we also want to store a DataFrame for later use. There are many different file formats tabular data can be stored in, including HTML, JSON, Excel, Parquet, Feather, etc. Here, let’s say we want to store the DataFrame as a CSV (comma-separated values) file under the name “tmp.csv”.

df.to_csv("tmp.csv")
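
Reading the file back in works with the matching read function. Other formats follow the same to_*/read_* pattern; Parquet, for instance, additionally requires the pyarrow or fastparquet package:

df_restored = pd.read_csv("tmp.csv", index_col=0)  # restore the index column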

Grouping and Aggregation#

Both Series and DataFrame objects have a groupby method, which allows you to group and aggregate the data based on the values of one or more columns.

It accepts a variety of arguments, but the simplest way to think about it is that you pass another series, whose unique values are used to split the original object into different groups.

Here’s an example which retrieves the total generation capacity per country.

grouped = df.groupby("Country").Capacity.sum()
grouped.head()
Country
Albania                    2370.400000
Austria                   24643.200368
Belgium                   21443.151009
Bosnia and Herzegovina     4827.195964
Bulgaria                  15699.186363
Name: Capacity, dtype: float64

Such chaining of multiple operations is very common with pandas.

Let’s break apart this operation a bit. The workflow with groupby can be divided into three general steps (spelled out in code after the list below):

  1. Split: Partition the data into different groups based on some criterion.

  2. Apply: Do some calculation (e.g. aggregation or transformation) within each group.

  3. Combine: Put the results back together into a single object.
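
The same result can be produced step by step, which makes the three phases explicit. This is a didactic sketch, not how you would normally write it:

totals = {}
for country, group in df.groupby("Country"):  # Split: one sub-DataFrame per country
    totals[country] = group.Capacity.sum()    # Apply: aggregate within each group
pd.Series(totals).head()                      # Combine: collect into a single Series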

Grouping is not only possible on a single column, but also on multiple columns. For instance, we might want to group the capacities by country and fuel type. To achieve this, we pass a list of column names to the groupby function.

capacities = df.groupby(["Country", "Fueltype"]).Capacity.sum()
capacities
Country         Fueltype     
Albania         Hydro             1743.9
                Other               98.0
                Solar              294.5
                Wind               234.0
Austria         Hard Coal         1331.4
                                  ...   
United Kingdom  Other               55.0
                Solar            11668.6
                Solid Biomass     4919.0
                Waste              288.9
                Wind             38670.3
Name: Capacity, Length: 242, dtype: float64

By grouping by multiple attributes, our index becomes a pd.MultiIndex (a hierarchical index with multiple levels).

capacities.index[:5]
MultiIndex([('Albania',     'Hydro'),
            ('Albania',     'Other'),
            ('Albania',     'Solar'),
            ('Albania',      'Wind'),
            ('Austria', 'Hard Coal')],
           names=['Country', 'Fueltype'])
type(capacities.index)
pandas.core.indexes.multi.MultiIndex
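
Selection on a MultiIndex works level by level with .loc:

capacities.loc["Albania"]             # all fuel types for one country
capacities.loc[("Albania", "Hydro")]  # a single country/fuel type entry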

We can use the .unstack function to reshape the multi-indexed pd.Series into a pd.DataFrame, moving the second index level (the fuel type) into the columns:

capacities.unstack().tail().T
Country               Spain        Sweden  Switzerland  Ukraine  United Kingdom
Fueltype
Biogas                  NaN           NaN          NaN      NaN       31.000000
Geothermal              NaN           NaN          NaN      NaN             NaN
Hard Coal      11904.878478    291.000000          NaN  24474.0    33823.617061
Hydro          26069.861248  14273.686625   20115.0408   6590.0     4576.175000
Lignite         1831.400000           NaN          NaN      NaN             NaN
Natural Gas    28394.244000   2358.000000      55.0000   4687.9    36366.400000
Nuclear         7733.200000   9859.000000    3355.0000  17635.0    19181.000000
Oil             1854.371000   1685.000000          NaN      NaN      100.000000
Other                   NaN           NaN          NaN      NaN       55.000000
Solar          36998.200000    281.800000      96.8000   5628.7    11668.600000
Solid Biomass    563.000000   2432.600000          NaN      NaN     4919.000000
Waste            388.054000           NaN          NaN      NaN      288.900000
Wind           34371.650000  16958.800000      55.0000    461.4    38670.300000

In summary, the typical workflow with pandas consists of reading data from files, inspecting and cleaning the data, performing analysis through transformation and aggregation, visualizing the results, and storing the processed data for later use.

Exercises#

Power Plants Data#

In this exercise, we will use the powerplants.csv dataset from the powerplantmatching project. This dataset contains information about various power plants, including their names, countries, fuel types, capacities, and more.

URL: https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv

Task 1: Load the dataset into a pandas DataFrame.

Task 2: Run the function .describe() on the DataFrame.

Task 3: Provide a list of unique fuel types and technologies included in the dataset.

Note

Look in the pandas documentation for functions that might be useful to solve these tasks.

Task 4: Filter the dataset by power plants with the fuel type “Hard Coal”.

Task 5: Identify the 5 largest coal power plants. In which countries are they located? When were they built?

Task 6: Identify the power plant with the longest name.

Task 7: Identify the 10 northernmost power plants. What type of power plants are they?

Task 8: What is the average start year of each fuel type? Sort the fuel types by their average start year in ascending order and round to the nearest integer.

Wind and Solar Capacity Factors#

In this exercise, we will work with a time series dataset containing hourly wind and solar capacity factors for Ireland, taken from model.energy.

Task 1: Use pd.read_csv to load the dataset from the following URL into a pandas DataFrame. Ensure that the time stamps are treated as pd.DatetimeIndex.

Task 2: Calculate the mean capacity factor for wind and solar over the entire time period.

Task 3: Calculate the correlation between wind and solar capacity factors.

Note

Go to the pandas documentation for functions that might be useful to solve these tasks.

Task 4: Plot the wind and solar capacity factors for the month of May.

Task 5: Plot the weekly average capacity factors for wind and solar over the entire time period.

Task 6: Go to model.energy and retrieve the time series for another region of your choice. Recreate the analysis above and compare the results.

Note

Look for “Download Comma-Separated-Variable (CSV) file of data” in Step 2.