Tutorial: CrossRegistry¶
The CrossRegistry allows you to conveniently get, aggregate, and label data stored
on the CROSS data platform.
Packages and data¶
# To manage your .env file, you can use the python-dotenv package.
# Install it with pip if you haven't already: pip install python-dotenv
from dotenv import load_dotenv
import os
# Import the CrossRegistry class from the crosscontract package
from crosscontract import CrossRegistry
Creating the CrossRegistry¶
To create the registry, you simply provide your username and password. Here we assume that your credentials are stored in a .env file and we extract them from there.
Note: Do not commit your credentials to GitHub!
# load the environment variables from the .env file
load_dotenv(".env")
username = os.getenv("CROSSUSER")
password = os.getenv("PASSWORD")
# create the registry using the environment variables
my_registry = CrossRegistry(
    username=username,
    password=password,
)
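Note that os.getenv returns None when a variable is not set, so a missing entry in the .env file would be passed on to CrossRegistry as None. A minimal sketch of an early check, assuming the variable names CROSSUSER and PASSWORD used above; require_env is a hypothetical helper and not part of crosscontract:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing early if any is unset.

    Hypothetical helper for illustration; not part of crosscontract.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage (after load_dotenv has run):
# credentials = require_env("CROSSUSER", "PASSWORD")
# my_registry = CrossRegistry(
#     username=credentials["CROSSUSER"],
#     password=credentials["PASSWORD"],
# )
```

Failing at this point gives a message that names the missing variable instead of an error further down the line.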
Getting a variable¶
To get a variable, you need to know the name of the contract. To get an overview
of your available contracts, you can use the contract_overview property.
my_registry.contract_overview.query("name.str.startswith('result_')")
| | name | title | description |
|---|---|---|---|
| 7 | result_electricity_consumption | Result submission - Electricity consumption | Electricity consumption as submitted from scen... |
| 14 | result_electricity_supply | Result submission - Electricity supply | Electricity supply as submitted from scenario ... |
| 15 | result_h2_fec | Result submission - Hydrogen final energy cons... | Hydrogen final energy consumption as submitted... |
| 19 | result_h2_supply | Result submission - Hydrogen supply | Hydrogen supply as submitted from scenario runs |
| 24 | result_methane_consumption | Result submission - Methane final energy consu... | Methane final energy consumption as submitted ... |
| 27 | result_methane_supply | Result submission - Methane supply | Methane supply as submitted from scenario runs |
| 28 | result_liquids_consumption | Result submission - Liquid fuels final energy ... | Liquid fuels final energy consumption as submi... |
| 31 | result_liquids_supply | Result submission - Liquid fuels supply | Liquid fuels supply as submitted from scenario... |
| 32 | result_process_heat_energy_production | Result submission - Process heat production | Useful energy production of process heat as su... |
| 35 | result_space_heat_energy_supply | Result submission - Space Heat supply | Useful energy supply of space heat as submitte... |
| 37 | result_district_heat_energy_production | Result submission - District Heat production | Useful energy production of distric heat as su... |
| 38 | result_passenger_road_private_fec | Result submission - Passenger road private tra... | Final energy consumption of passenger road pri... |
| 39 | result_passenger_road_public_fec | Result submission - Passenger road public tran... | Final energy consumption of passenger road pub... |
| 40 | result_freight_road_fec | Result submission - Freight road transport fin... | Final energy consumption of freight transport ... |
| 41 | result_storage_installed_volume | Result submission - Installed storage volume | Installed storage size |
| 43 | result_storage_output | Result submission - Storage output | Installed storage size |
| 44 | result_elec_cons_typical_day | Result sumission - Electricity consumption | Electricity consumption as submitted from scen... |
| 46 | result_elec_supply_typical_day | Result submission - Electricity supply | Electricity supply as submitted from scenario ... |
Given the name, you can add the variable to the registry explicitly or simply use dot notation. With dot notation, the registry adds the variable automatically.
res_elec_supply = my_registry.result_electricity_supply
res_elec_supply
CrossDataVariable(name=result_electricity_supply, filters=None)
As the output shows, a variable can carry a filter. Filters are set with the add_variable() method and are applied on the server, so only matching data is fetched from the platform.
To add the variable again, we need to use overwrite=True as it is already in the
registry. Here we filter to only get data for the scenario abroad-res-full. Note that this will
affect all later usage, as the filter is general, i.e., applied when the registry
fetches the data from the CROSS platform.
Note: Currently, server-side filtering is rather restricted:
- Only one value per field is allowed
- Only string columns can be filtered
res_elec_supply = my_registry.add_variable(
"result_electricity_supply",
filters={"scenario_name": "abroad-res-full"},
overwrite=True
)
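The two restrictions in the note can be checked locally before a request is sent. The following is only a sketch: check_server_filters is a hypothetical helper, not part of crosscontract, and it assumes that a server-side filter is a plain dictionary mapping a column name to a single string, as in the call above.

```python
def check_server_filters(filters: dict) -> dict:
    """Validate a filters dict against the restrictions listed in the note.

    Hypothetical helper for illustration; not part of crosscontract.
    """
    for column, value in filters.items():
        # Only one value per field: lists, tuples, and sets are rejected.
        if isinstance(value, (list, tuple, set)):
            raise ValueError(f"'{column}': only one value per field is allowed")
        # Only string columns can be filtered, so the value must be a string.
        if not isinstance(value, str):
            raise TypeError(f"'{column}': only string values are supported")
    return filters


check_server_filters({"scenario_name": "abroad-res-full"})  # passes
# check_server_filters({"year": 2050})               # raises TypeError
# check_server_filters({"model": ["STEM", "EHUB"]})  # raises ValueError
```

Filters that fail this check, such as a year or a list of models, can still be applied later on the client side with get_data.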
Accessing data¶
Now that you have the variable, you can access the data through its data attribute.
The data attribute returns the data stored on the platform as a pandas
DataFrame (with the filter already applied).
res_elec_supply.data.head()
| | model | scenario_group | scenario_name | scenario_variant | technology | country | year | unit | value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | powercheck | cross202506 | abroad-res-full | reference | methane_chp_woccs | CH | 2040 | TWh | 0.0 |
| 1 | powercheck | cross202506 | abroad-res-full | reference | methane_chp_woccs | CH | 2050 | TWh | 0.0 |
| 2 | powercheck | cross202506 | abroad-res-full | reference | methane_chp_ccs | CH | 2040 | TWh | 0.0 |
| 3 | powercheck | cross202506 | abroad-res-full | reference | methane_chp_ccs | CH | 2050 | TWh | 0.0 |
| 4 | powercheck | cross202506 | abroad-res-full | reference | methane_oc_woccs | CH | 2040 | TWh | 0.0 |
While the .data property provides access to the full dataset, the get_data
method allows you to specify additional filters, aggregate the data, and to label
items based on the information in the contract (and the references to the Cross Dimensions).
- Filtering is based on a dictionary with the column name as key and a list of allowed values as value.
- Aggregation is also dictionary based. The key is the name of the column over which to aggregate and the value is an integer specifying the aggregation level. 0 is the highest aggregation level, i.e., the level with the least detail.
- Labeling is based on the use_titles parameter. If set to True, all columns are relabelled automatically.
- Columns lets you narrow the list of columns in the returned dataframe. Note that filtering does not drop any columns; the column selection is always applied at the very end of the transformation.
res_elec_supply.get_data(
filters={"year": [2050], "scenario_variant": ["reference"]},
aggregation={"technology": 0},
use_titles=True,
columns=["model", "technology", "value"]
).pivot_table(index="model", columns="technology", values="value").round(1)
| technology | Electricity storage | Electrochemical | Imports of electricity | Renewables | Thermal power plants |
|---|---|---|---|---|---|
| model | |||||
| EHUB | 22.6 | 4.7 | 15.0 | 79.9 | 1.2 |
| PowerCheck | 6.0 | 0.2 | 10.7 | 64.9 | 3.8 |
| SES | 0.0 | NaN | 18.0 | 220.2 | 5.2 |
| SES-ETH | 7.0 | 0.0 | 16.4 | 78.5 | 5.0 |
| STEM | 6.1 | 0.2 | 13.6 | 79.9 | 6.2 |
| SecMOD | 15.9 | 0.0 | 47.1 | 80.0 | 4.7 |
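The order of the steps matters: because the column selection comes last, you can filter on a column such as year even if it is not listed in columns. The sketch below mimics this order on a small list of made-up records in plain Python. filter_then_select is a hypothetical stand-in and not the crosscontract implementation, and the aggregation and labelling steps are left out.

```python
def filter_then_select(records, filters=None, columns=None):
    """Keep records whose values are in the allowed lists, then narrow the columns."""
    filters = filters or {}
    kept = [
        rec for rec in records
        if all(rec[col] in allowed for col, allowed in filters.items())
    ]
    if columns is None:
        return kept
    # Column selection happens last, so filter columns may be dropped here.
    return [{col: rec[col] for col in columns} for rec in kept]


# Made-up records for illustration only.
records = [
    {"model": "A", "year": 2040, "technology": "pv", "value": 1.0},
    {"model": "A", "year": 2050, "technology": "pv", "value": 2.0},
    {"model": "B", "year": 2050, "technology": "wind", "value": 3.0},
]

result = filter_then_select(
    records,
    filters={"year": [2050]},
    columns=["model", "value"],
)
print(result)
```

The result contains only the two records for 2050, reduced to the model and value columns.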
Aggregation¶
Aggregation is more flexible than a single aggregation level. In principle, there are three ways to aggregate:
- Provide a single aggregation level (as above): aggregation={"technology": 0}
- Aggregate to a given set of identifiers, e.g., aggregation={"technology": ["renewable", "thermal"]}
- Aggregate everything to a given level except some identifiers that should be kept: aggregation={"technology": {"level": 0, "keep": ["hydro_dam", "hydro_run"]}}
Note that the list of identifiers has to include the original identifiers and not the
labels or titles of the column items as they appear after use_titles=True.
For aggregation to a set of identifiers, consider the example aggregation={"technology": ["renewable", "thermal"]}. This
aggregates all sub-categories of renewable and thermal but leaves the remaining items
untouched:
res_elec_supply.get_data(
filters={"year": [2050], "scenario_variant": ["reference"]},
aggregation={"technology": ["renewable", "thermal"]},
use_titles=True,
columns=["model", "technology", "value"]
).pivot_table(index="model", columns="technology", values="value").round(1)
| technology | Discharge of batteries | Discharge of pumped hydro storage | Electricity storage | Fuel cell using hydrogen | Fuel cell using methane | Imports of electricity | Renewables | Thermal power plants |
|---|---|---|---|---|---|---|---|---|
| model | ||||||||
| EHUB | 18.0 | 4.6 | NaN | 4.7 | 0.0 | 15.0 | 79.9 | 1.2 |
| PowerCheck | 3.8 | 2.2 | NaN | 0.2 | 0.0 | 10.7 | 64.9 | 3.8 |
| SES | 0.0 | 0.0 | 0.0 | NaN | NaN | 18.0 | 220.2 | 5.2 |
| SES-ETH | 1.6 | 5.4 | NaN | 0.0 | 0.0 | 16.4 | 78.5 | 5.0 |
| STEM | 2.3 | 3.8 | NaN | 0.2 | 0.0 | 13.6 | 79.9 | 6.2 |
| SecMOD | 8.4 | 7.5 | NaN | 0.0 | 0.0 | 47.1 | 80.0 | 4.7 |
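Conceptually, aggregating to a set of identifiers means that every item climbs the hierarchy until it meets one of the targets; items with no target among their ancestors stay as they are. The following plain-Python sketch illustrates that idea. It is not the crosscontract implementation: the hierarchy is simplified and partly made up, roll_up_to and aggregate_to_identifiers are hypothetical names, and summing the values is an assumption.

```python
from collections import defaultdict

# Simplified, partly made-up hierarchy: identifier -> parent (None for top level).
PARENT = {
    "renewable": None,
    "thermal": None,
    "storage_elec": None,
    "hydro_dam": "renewable",
    "pv": "renewable",
    "wood_pp": "thermal",
    "wood_chp": "wood_pp",
    "battery_out": "storage_elec",
}


def roll_up_to(identifier, targets):
    """Return the first ancestor (or the identifier itself) found in targets.

    Identifiers with no ancestor in targets are returned unchanged.
    """
    current = identifier
    while current is not None:
        if current in targets:
            return current
        current = PARENT[current]
    return identifier


def aggregate_to_identifiers(values, targets):
    """Sum values by the identifier each item rolls up to (summing is an assumption)."""
    target_set = set(targets)
    totals = defaultdict(float)
    for identifier, value in values.items():
        totals[roll_up_to(identifier, target_set)] += value
    return dict(totals)


# Made-up values for illustration only.
values = {"hydro_dam": 18.0, "pv": 40.0, "wood_chp": 1.5, "battery_out": 3.0}
print(aggregate_to_identifiers(values, ["renewable", "thermal"]))
```

Here hydro_dam and pv end up under renewable, wood_chp under thermal, and battery_out stays untouched, which mirrors the separate battery and pumped hydro columns in the table above.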
Now suppose you want to aggregate everything to level 0 but keep the hydro technologies disaggregated: aggregation={"technology": {"level": 0, "keep": ["hydro_dam", "hydro_run"]}}
res_elec_supply.get_data(
filters={"year": [2050], "scenario_variant": ["reference"]},
aggregation={"technology": {"level": 0, "keep": ["hydro_dam", "hydro_run"]}},
use_titles=True,
columns=["model", "technology", "value"]
).pivot_table(index="model", columns="technology", values="value").round(1)
| technology | Electricity storage | Electrochemical | Hydro Dams | Imports of electricity | Renewables | Thermal power plants |
|---|---|---|---|---|---|---|
| model | ||||||
| EHUB | 22.6 | 4.7 | 18.4 | 15.0 | 61.5 | 1.2 |
| PowerCheck | 6.0 | 0.2 | 18.1 | 10.7 | 46.8 | 3.8 |
| SES | 0.0 | NaN | 20.0 | 18.0 | 200.2 | 5.2 |
| SES-ETH | 7.0 | 0.0 | 19.5 | 16.4 | 59.0 | 5.0 |
| STEM | 6.1 | 0.2 | 20.8 | 13.6 | 59.1 | 6.2 |
| SecMOD | 15.9 | 0.0 | 16.4 | 47.1 | 63.6 | 4.7 |
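The level and keep combination can be pictured as the same climb up the hierarchy, except that an item stops as soon as it reaches the target level or is one of the kept identifiers. A minimal sketch in plain Python, again with a simplified, partly made-up hierarchy, hypothetical function names, and summation as an assumption; it is not the crosscontract implementation.

```python
from collections import defaultdict

# Simplified, partly made-up hierarchy: identifier -> (parent, level).
HIERARCHY = {
    "renewable": (None, 0),
    "thermal": (None, 0),
    "hydro_dam": ("renewable", 1),
    "hydro_run": ("renewable", 1),
    "pv": ("renewable", 1),
    "wood_pp": ("thermal", 1),
    "wood_chp": ("wood_pp", 2),
}


def roll_up(identifier, level, keep=()):
    """Climb the hierarchy until the target level or a kept identifier is reached."""
    current = identifier
    while current not in keep and HIERARCHY[current][1] > level:
        current = HIERARCHY[current][0]
    return current


def aggregate(values, level, keep=()):
    """Sum values by rolled-up identifier (summing is an assumption)."""
    totals = defaultdict(float)
    for identifier, value in values.items():
        totals[roll_up(identifier, level, keep)] += value
    return dict(totals)


# Made-up values for illustration only.
values = {"hydro_dam": 18.0, "hydro_run": 17.0, "pv": 40.0, "wood_chp": 1.5}
print(aggregate(values, level=0, keep=["hydro_dam", "hydro_run"]))
```

Kept items no longer count towards their parent. The table above shows the same effect: for EHUB, Renewables drops from 79.9 to 61.5 once Hydro Dams (18.4) is listed separately.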
Examine dimensions¶
To use the flexible aggregation, you must know the identifiers and the hierarchy within the dimensions. One way is to look them up on the CROSS webpage.
Alternatively, you can inspect the dimension associated with a column from the given variable:
(
res_elec_supply
.dimensions["technology"]
.data
[["id", "level", "id_parent"]]
.pivot(index="id", values="id_parent", columns="level")
.sort_index()
.fillna("")
)
| level | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| id | ||||
| battery_out | storage_elec | |||
| coal_cc | coal_pp | |||
| coal_cc_ccs | coal_cc | |||
| coal_cc_woccs | coal_cc | |||
| coal_chp | coal_pp | |||
| ... | ... | ... | ... | ... |
| wood_cc_woccs | wood_cc | |||
| wood_chp | wood_pp | |||
| wood_chp_ccs | wood_chp | |||
| wood_chp_woccs | wood_chp | |||
| wood_pp | thermal |
69 rows × 4 columns
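To see which identifiers sit below a given one, the id and id_parent pairs can be turned into a children lookup. The sketch below uses a few pairs copied by hand from the table above; with the real data, the pairs could presumably be built from the id and id_parent columns of the dimension data. descendants is a hypothetical helper, not part of crosscontract.

```python
from collections import defaultdict

# A few (id, id_parent) pairs copied by hand from the table above.
pairs = [
    ("wood_pp", "thermal"),
    ("wood_chp", "wood_pp"),
    ("wood_chp_ccs", "wood_chp"),
    ("wood_chp_woccs", "wood_chp"),
    ("coal_cc", "coal_pp"),
    ("coal_cc_ccs", "coal_cc"),
]

# Map each parent to the list of its direct children.
children = defaultdict(list)
for identifier, parent in pairs:
    children[parent].append(identifier)


def descendants(identifier):
    """Return all identifiers below `identifier`, depth first."""
    found = []
    for child in children.get(identifier, []):
        found.append(child)
        found.extend(descendants(child))
    return found


print(descendants("wood_pp"))
```

Any identifier returned this way is a candidate for the keep list or for a target set in the aggregation argument.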