Tutorial: CrossClient¶
In this notebook you learn how to connect to the CrossPlatform using the CrossClient, get a contract, and validate your data against that contract.
Packages and data¶
from crosscontract import CrossClient, CrossContract
import pandas as pd
Determine user¶
Here we assume that you have a .env file that stores your credentials, and we load them from there.
Note: Do not commit your credentials to GitHub!
from dotenv import load_dotenv
import os
load_dotenv(".env")
username = os.getenv("CROSSUSER")
password = os.getenv("PASSWORD")
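os.getenv returns None when a variable is not set, so a missing entry in the .env file would only show up later as a failed login. A minimal sketch of an early check, assuming the variable names CROSSUSER and PASSWORD used above; the helper read_credentials is hypothetical and not part of crosscontract:

```python
import os


def read_credentials(env=os.environ):
    """Return (username, password) or raise if either variable is missing."""
    missing = [key for key in ("CROSSUSER", "PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["CROSSUSER"], env["PASSWORD"]


# Demonstration with a plain dict standing in for the real environment
example_env = {"CROSSUSER": "alice", "PASSWORD": "not-a-real-password"}
demo_user, demo_password = read_credentials(example_env)
```

In the notebook itself, calling read_credentials() without arguments after load_dotenv(".env") would read from the real environment.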
Connect to the CrossPlatform¶
To connect to the platform using CrossClient you need a registered user. To create the client, pass it the username and password.
my_client = CrossClient(username=username, password=password)
my_client.contracts.overview()
| | name | title | description | tags | status |
|---|---|---|---|---|---|
| 0 | scenass_cost_generation_technologies | Investment cost of generation technologies | Investment cost data for electricity generatio... | [] | Active |
| 1 | scenass_cost_heating_technologies | Investment cost of heating technologies | Investment cost data for heating technologies ... | [] | Active |
| 2 | entsoe_tyndp_ntc | Net Transfer Capacities (NTC) from ENTSO-E TYNDP | Net transfer capacities (NTC) based on the ENT... | [] | Active |
| 3 | scenass_aviation_fuel_demand | Aviation Fuel Demand | Aviation fuel demand data used as assumptions ... | [] | Active |
| 4 | scenass_biomass_potential | Biomass Potential | Biomass potentials by biomass type used as ass... | [] | Active |
| 5 | scenass_cost_storage_technologies | Investment cost of storage technologies | Investment cost data for storage technologies\... | [] | Active |
| 6 | scenass_electric_appliances_useful_energy_demand | Useful energy demand of electric appliances | Electricity demand from electric appliances an... | [] | Active |
| 7 | scenass_households | Number of Households | Household data used as assumptions for scenari... | [] | Active |
| 8 | scenass_energy_reference_area | Energy Reference Area | Energy reference area by sector, year, buildin... | [] | Active |
| 9 | scenass_freight_transport_useful_energy_demand | Useful energy demand in freight transport | Useful energy demand in freight transport by t... | [] | Active |
| 10 | scenass_gdp | Gross Domestic Product (GDP) | GDP data used as assumptions for scenario runs. | [] | Active |
| 11 | scenass_hdd | Heating Degree Days (HDD) | Heating Degree Days (HDD) by climate scenario ... | [] | Active |
| 12 | scenass_import_prices | Fuel import price | Import prices by fuel type, year, and country ... | [] | Active |
| 13 | scenass_passenger_transport_useful_energy_demand | Useful energy demand in passenger transport | Useful energy demand in passenger transport by... | [] | Active |
| 14 | scenass_population | Population | Population projections used as assumptions for... | [] | Active |
| 15 | scenass_process_heat_useful_energy_demand | Useful energy demand for process heat | Useful energy demand for process heat by end-u... | [] | Active |
| 16 | scenass_space_heating_useful_energy_demand | Useful energy demand for space heating | Useful energy demand for space heating by end-... | [] | Active |
| 17 | scenass_warm_water_useful_energy_demand | Useful energy demand for warm water | Useful energy demand for warm water by end-use... | [] | Active |
| 18 | scenass_working_population | Working Population | Working population projections used as assumpt... | [] | Active |
| 19 | result_district_heat_energy_production | Result submission - District Heat production | Useful energy production of distric heat as su... | [] | Active |
| 20 | result_electricity_consumption | Result submission - Electricity consumption | Electricity consumption as submitted from scen... | [] | Active |
| 21 | result_elec_cons_typical_day | Result sumission - Electricity consumption | Electricity consumption as submitted from scen... | [] | Active |
| 22 | result_electricity_supply | Result submission - Electricity supply | Electricity supply as submitted from scenario ... | [] | Active |
| 23 | result_elec_supply_typical_day | Result submission - Electricity supply | Electricity supply as submitted from scenario ... | [] | Active |
| 24 | result_freight_road_fec | Result submission - Freight road transport fin... | Final energy consumption of freight transport ... | [] | Active |
| 25 | result_h2_fec | Result submission - Hydrogen final energy cons... | Hydrogen final energy consumption as submitted... | [] | Active |
| 26 | result_h2_supply | Result submission - Hydrogen supply | Hydrogen supply as submitted from scenario runs | [] | Active |
| 27 | result_liquids_consumption | Result submission - Liquid fuels final energy ... | Liquid fuels final energy consumption as submi... | [] | Active |
| 28 | result_liquids_supply | Result submission - Liquid fuels supply | Liquid fuels supply as submitted from scenario... | [] | Active |
| 29 | result_methane_consumption | Result submission - Methane final energy consu... | Methane final energy consumption as submitted ... | [] | Active |
| 30 | result_methane_supply | Result submission - Methane supply | Methane supply as submitted from scenario runs | [] | Active |
| 31 | result_passenger_road_private_fec | Result submission - Passenger road private tra... | Final energy consumption of passenger road pri... | [] | Active |
| 32 | result_passenger_road_public_fec | Result submission - Passenger road public tran... | Final energy consumption of passenger road pub... | [] | Active |
| 33 | result_process_heat_energy_production | Result submission - Process heat production | Useful energy production of process heat as su... | [] | Active |
| 34 | result_space_heat_energy_supply | Result submission - Space Heat supply | Useful energy supply of space heat as submitte... | [] | Active |
| 35 | result_storage_installed_volume | Result submission - Installed storage volume | Installed storage size | [] | Active |
| 36 | result_storage_output | Result submission - Storage output | Installed storage size | [] | Active |
| 37 | dim_building | Building Type | Different building types, e.g., single or mult... | [] | Active |
| 38 | dim_endusesector | End-use sector | Energy use sector in the sense of final energy... | [] | Active |
| 39 | dim_fuel | Fuel | Secondary energy carriers, e.g. hydrogen, elec... | [] | Active |
| 40 | dim_iso_region | Region | Region codes according to ISO 3166-1 alpha-2. ... | [] | Active |
| 41 | dim_model | Model | List of available models | [] | Active |
| 42 | dim_resource | Resource | Energy resource in the sense of primary energy... | [] | Active |
| 43 | dim_scenario | Scenario | List of available scenarios | [] | Active |
| 44 | dim_tech_generation | Generation technology | List of electricity generation technologies | [] | Active |
| 45 | dim_tech_heat | Heating technology | List of technologies to produce heat supply | [] | Active |
| 46 | dim_tech_hydrogen | Hydrogen technology | List of technologies used to produce hydrogen | [] | Active |
| 47 | dim_tech_liquids | Liquid fuels technology | List of technologies used to produce liquid fuels | [] | Active |
| 48 | dim_tech_methane | Methane technology | List of technologies used to produce methane | [] | Active |
| 49 | dim_tech_storage | Storage technology | List of technologies used to store energy | [] | Active |
| 50 | dim_trn_mode_freight | Freight transport mode | List of transport modes used for freight trans... | [] | Active |
| 51 | dim_trn_mode_private | Private transport mode | List of transport modes used for private, i.e.... | [] | Active |
| 52 | dim_use_elec | Electricity end-uses | List of end-uses of electricity | [] | Active |
| 53 | dim_use_hydrogen | Hydrogen end-uses | List of end-uses of hydrogen | [] | Active |
| 54 | dim_use_liquids | Hydrogen end-uses | List of end-uses of liquid fuels | [] | Active |
| 55 | dim_use_methane | Methane end-uses | List of end-uses of methane | [] | Active |
That's it. The platform knows who you are and how you want to log in. So let's get a contract and use it for data validation.
Getting an overview¶
First we want an overview of which contracts are on the CrossPlatform and
what they contain. For this we use the method client.contracts.overview, which
returns a pandas DataFrame with the metadata of each contract as well as its
status.
We reuse the my_client instance created above and close it explicitly at the end of the notebook.
df_overview = my_client.contracts.overview()
df_overview[["name", "description"]].head(10)
| | name | description |
|---|---|---|
| 0 | scenass_cost_generation_technologies | Investment cost data for electricity generatio... |
| 1 | scenass_cost_heating_technologies | Investment cost data for heating technologies ... |
| 2 | entsoe_tyndp_ntc | Net transfer capacities (NTC) based on the ENT... |
| 3 | scenass_aviation_fuel_demand | Aviation fuel demand data used as assumptions ... |
| 4 | scenass_biomass_potential | Biomass potentials by biomass type used as ass... |
| 5 | scenass_cost_storage_technologies | Investment cost data for storage technologies\... |
| 6 | scenass_electric_appliances_useful_energy_demand | Electricity demand from electric appliances an... |
| 7 | scenass_households | Household data used as assumptions for scenari... |
| 8 | scenass_energy_reference_area | Energy reference area by sector, year, buildin... |
| 9 | scenass_freight_transport_useful_energy_demand | Useful energy demand in freight transport by t... |
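Because the overview is an ordinary pandas DataFrame, the usual pandas tools apply. The names in the output above share prefixes such as scenass_, result_, and dim_, so a prefix filter is a quick way to narrow the list. A small sketch, using a hand-built stand-in for df_overview with a few rows copied from the output above (the name df_overview_demo is only for illustration):

```python
import pandas as pd

# Stand-in for my_client.contracts.overview(); rows copied from the output above
df_overview_demo = pd.DataFrame(
    {
        "name": ["scenass_gdp", "result_h2_supply", "dim_fuel", "dim_model"],
        "title": [
            "Gross Domestic Product (GDP)",
            "Result submission - Hydrogen supply",
            "Fuel",
            "Model",
        ],
        "status": ["Active", "Active", "Active", "Active"],
    }
)

# Keep only the contracts whose name starts with "dim_"
has_dim_prefix = df_overview_demo["name"].str.startswith("dim_")
dim_contracts = df_overview_demo.loc[has_dim_prefix, ["name", "title"]]
print(dim_contracts)
```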
Contract creation¶
Suppose we want to add our test contract given as:
test_contract = {
    "name": "test_contract",
    "title": "Test Contract",
    "description": "A simple test contract",
    "tableschema": {
        "primaryKey": ["year", "country"],
        "fields": [
            {
                "name": "value",
                "type": "number",
                "constraints": {
                    "required": True,
                    "minimum": 0.0,
                    "maximum": 100.0,
                    "unique": True,
                },
            },
            {
                "name": "year",
                "type": "integer",
                "constraints": {"required": True, "minimum": 2000, "maximum": 2025},
            },
            {
                "name": "country",
                "type": "string",
                "constraints": {"required": False, "maxLength": 6, "minLength": 2},
            },
        ],
    },
}
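Before sending a contract definition to the platform, it can help to sanity-check the dictionary locally. One simple check is that every column named in primaryKey is also declared under fields. The sketch below is a plain-Python illustration; the helper undeclared_primary_key_fields is hypothetical and not part of crosscontract, which may well perform its own checks when the CrossContract is created:

```python
def undeclared_primary_key_fields(contract: dict) -> list[str]:
    """Return primary-key columns that are not declared in the field list."""
    schema = contract["tableschema"]
    declared = {field["name"] for field in schema["fields"]}
    return [key for key in schema["primaryKey"] if key not in declared]


# Trimmed copy of the contract above, keeping only what the check needs
contract_demo = {
    "name": "test_contract",
    "tableschema": {
        "primaryKey": ["year", "country"],
        "fields": [
            {"name": "value", "type": "number"},
            {"name": "year", "type": "integer"},
            {"name": "country", "type": "string"},
        ],
    },
}

print(undeclared_primary_key_fields(contract_demo))  # prints []
```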
To add the contract to the platform, we create the CrossContract and use the client.contracts.create
method. By default this creates the contract in Draft mode, in which we are not allowed to
submit data. We therefore pass activate=True to put the contract directly into the Active state,
which allows data submission.
If you run that line, you will most likely get a ConflictError because the contract
already exists. Alternatively, you may get a PermissionDeniedError if you are not allowed to
create contracts. We can catch these errors using the usual try/except logic:
from crosscontract.crossclient.exceptions import ConflictError, PermissionDeniedError

contract = CrossContract(**test_contract)
try:
    created_contract = my_client.contracts.create(contract, activate=True)
except (ConflictError, PermissionDeniedError) as e:
    # catch the expected errors here
    print(f"Expected error creating contract: {e}")
except Exception:
    # but re-raise any unexpected errors unchanged
    raise
Expected error creating contract: ConflictError (409): Contract with name 'test_contract' already exists.
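A common follow-up to a name conflict is to fall back to the contract that already exists. The sketch below shows this "create or get" pattern in isolation. It runs without a platform connection, so ConflictError and FakeContracts are local stand-ins defined here, not the real crosscontract classes; only the call shapes create(contract, activate=True) and get(name=...) are taken from this tutorial. The real create and get calls may return different object types, which this sketch does not model.

```python
class ConflictError(Exception):
    """Local stand-in for crosscontract's ConflictError."""


class FakeContracts:
    """Hypothetical in-memory stand-in for my_client.contracts."""

    def __init__(self):
        self._store = {}

    def create(self, contract, activate=False):
        name = contract["name"]
        if name in self._store:
            raise ConflictError(f"Contract with name '{name}' already exists.")
        self._store[name] = contract
        return contract

    def get(self, name):
        return self._store[name]


def create_or_get(contracts, contract, name):
    """Create the contract, or fetch the existing one on a name conflict."""
    try:
        return contracts.create(contract, activate=True)
    except ConflictError:
        return contracts.get(name=name)


contracts = FakeContracts()
first = create_or_get(contracts, {"name": "test_contract"}, "test_contract")
second = create_or_get(contracts, {"name": "test_contract"}, "test_contract")
print(first is second)  # prints True
```

The name is passed explicitly so the helper does not have to assume how a contract object exposes its name.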
Getting a contract¶
Let's now get our test_contract.
To get the contract we use client.contracts.get. If the contract is found, we
get back a ContractResource. A ContractResource is a CrossContract that
lives on the CrossPlatform. Because the contract is stored on the CrossPlatform, it
is read-only. It also provides additional information, such as the status of
the contract on the platform.
The ContractResource is the central object for working with remote contracts and
allows you to get, add, and delete data for a contract.
contract_name = test_contract["name"]
my_contract_resource = my_client.contracts.get(name=contract_name)
print(f"Retrieved contract resource: {my_contract_resource}")
Retrieved contract resource: ContractResource(name=test_contract, status=Active)
The ContractResource contains the contract, but usually we do not work with
the contract directly. Instead, we use the resource to validate our local data, or to
add data to and get data from the platform.
Validate local data¶
Validation of data follows exactly the same steps as in the CrossContract case.
We simply call validate_dataframe with our data given as a pandas DataFrame.
df_test = pd.DataFrame({
"year": [2020, 2021, 2022],
"country": ["US", "CA", "MX"],
"value": [50.5, 60.0, 70.2]
})
my_contract_resource.validate_dataframe(df_test)
If nothing happens, the data is locally valid. In the case of validation errors, however,
validate_dataframe raises a ValidationError. To find out
which data violate the contract, we can catch the error and use its to_pandas method
to get a DataFrame with detailed error messages by row:
from crosscontract.crossclient.exceptions import ValidationError

df_fail = pd.DataFrame({
    "year": [1820, 2021, 2022],
    "country": ["US", "CA", "ThisCountryNameIsWayTooLong"],
    "value": [50.5, 100000, 70.2]
})
try:
    my_contract_resource.validate_dataframe(df_fail)
except ValidationError as e:
    df_errors = e.to_pandas()
df_errors
| | schema_context | column | check | check_number | failure_case | index |
|---|---|---|---|---|---|---|
| 0 | Column | year | greater_than_or_equal_to(2000) | 0 | 1820 | 0 |
| 1 | Column | value | less_than_or_equal_to(100.0) | 1 | 100000.0 | 1 |
| 2 | Column | country | str_length(2, 6) | 0 | ThisCountryNameIsWayTooLong | 2 |
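Each row of the error table ties one cell of the data to one constraint of the contract. To make that mapping concrete, the following plain-Python sketch applies the minimum, maximum, minLength, and maxLength constraints of test_contract to the rows of df_fail. It is an illustration only and not how crosscontract validates internally; find_violations, FIELDS, and ROWS are hypothetical names introduced here.

```python
# Constraints copied from test_contract (range and length checks only)
FIELDS = [
    {"name": "value", "constraints": {"minimum": 0.0, "maximum": 100.0}},
    {"name": "year", "constraints": {"minimum": 2000, "maximum": 2025}},
    {"name": "country", "constraints": {"minLength": 2, "maxLength": 6}},
]

# Rows copied from df_fail
ROWS = [
    {"year": 1820, "country": "US", "value": 50.5},
    {"year": 2021, "country": "CA", "value": 100000},
    {"year": 2022, "country": "ThisCountryNameIsWayTooLong", "value": 70.2},
]


def find_violations(rows, fields):
    """Return (row index, column, violated constraint) for each failed check."""
    violations = []
    for index, row in enumerate(rows):
        for field in fields:
            value = row[field["name"]]
            rules = field["constraints"]
            if "minimum" in rules and value < rules["minimum"]:
                violations.append((index, field["name"], "minimum"))
            if "maximum" in rules and value > rules["maximum"]:
                violations.append((index, field["name"], "maximum"))
            if "minLength" in rules and len(value) < rules["minLength"]:
                violations.append((index, field["name"], "minLength"))
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                violations.append((index, field["name"], "maxLength"))
    return violations


violations = find_violations(ROWS, FIELDS)
for violation in violations:
    print(violation)
```

The three tuples it reports correspond to the three rows of the error table: the year in row 0, the value in row 1, and the country in row 2.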
There is one difference between validation using the CrossContract and the ContractResource: CrossContract raises a SchemaValidationError, whereas ContractResource raises a ValidationError. The two behave the same in terms of error details, but the ValidationError unifies validation errors that occur locally with those that occur on the CrossPlatform. More on this below.
Adding data¶
To add data, we again use our ContractResource and its add_data method, which does two things:
- Validate the data locally
- Submit the data to the server
df_test = pd.DataFrame({
"year": [2020, 2021, 2022],
"country": ["US", "CA", "MX"],
"value": [50.5, 60.0, 70.2]
})
my_contract_resource.add_data(df_test)
What happens if we submit the data again? The contract has a primary key constraint that restricts the combination of year and country to be unique:
# Local validation succeeds
my_contract_resource.validate_dataframe(df_test)
# but adding the same data again raises a ValidationError due to unique constraint violation
try:
    my_contract_resource.add_data(df_test)
except ValidationError as e:
    df_errors = e.to_pandas()
    print(e)
df_errors
ValidationError (422): Data validation against contract 'test_contract' failed. To get detailed error information, catch the ValidationError and use its .to_list() or .to_pandas() methods.
| | schema_context | column | check | check_number | failure_case | index |
|---|---|---|---|---|---|---|
| 0 | DataFrameSchema | year, country | PrimaryKeyError: Primary key ['year', 'country... | 0 | [2020, US] | 0 |
| 1 | DataFrameSchema | year, country | PrimaryKeyError: Primary key ['year', 'country... | 0 | [2021, CA] | 1 |
| 2 | DataFrameSchema | year, country | PrimaryKeyError: Primary key ['year', 'country... | 0 | [2022, MX] | 2 |
So the local validation passes but the server raises a validation error. This illustrates the two concepts of validity in the context of the CrossClient:
- Local validity: The data are consistent on their own. We do not check whether our local data are consistent with the data on the server.
- Remote validity: When we submit data, the CrossPlatform checks the new data together with the data already stored on the platform. In the case of foreign key references, the platform also tries to resolve and check these references.
The difference between local and remote validity mostly matters in two cases: (a) uniqueness constraints, when a local data point already exists on the server, and (b) foreign key constraints, when the data reference a value in another contract and that value is not found there.
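The uniqueness case can be sketched in plain Python: a batch may have no duplicate keys on its own and still collide with rows that are already stored. The helper primary_key_conflicts and the variable names below are hypothetical and only mimic the idea; the actual check happens on the CrossPlatform.

```python
def primary_key_conflicts(stored_rows, new_rows, key):
    """Return key tuples in new_rows that already exist in stored_rows."""
    stored_keys = {tuple(row[col] for col in key) for row in stored_rows}
    return [
        tuple(row[col] for col in key)
        for row in new_rows
        if tuple(row[col] for col in key) in stored_keys
    ]


# Rows copied from df_test
rows = [
    {"year": 2020, "country": "US", "value": 50.5},
    {"year": 2021, "country": "CA", "value": 60.0},
    {"year": 2022, "country": "MX", "value": 70.2},
]
key = ["year", "country"]

# Local view: the batch on its own contains no duplicate keys
local_keys = [tuple(row[col] for col in key) for row in rows]
locally_unique = len(local_keys) == len(set(local_keys))

# Remote view: the same rows were already stored by the first add_data call
conflicts = primary_key_conflicts(stored_rows=rows, new_rows=rows, key=key)

print(locally_unique)  # prints True
print(conflicts)
```

The conflicting keys match the failure_case column of the error table above.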
Getting data¶
To get data back from the platform, we use the ContractResource and its get_data
method. The method lets us apply a simple filter to the data.
Currently, however, filtering is restricted to scalar string values, i.e., filtering on numerical data or with lists of values is not possible at the moment:
df_data = my_contract_resource.get_data()
# or for illustration purposes with a filter
my_contract_resource.get_data(filters={"country": "US"})
| | value | year | country |
|---|---|---|---|
| 0 | 50.5 | 2020 | US |
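Conditions that the server-side filter does not cover, such as numeric comparisons or membership in a list, can be applied locally once the data are in a pandas DataFrame. The sketch below uses a hand-built stand-in for the result of get_data() with the values submitted earlier; df_data_demo is a name introduced only for this illustration. Note that this approach fetches the unfiltered data first, which may be costly for large contracts.

```python
import pandas as pd

# Stand-in for my_contract_resource.get_data(); values copied from df_test
df_data_demo = pd.DataFrame(
    {
        "value": [50.5, 60.0, 70.2],
        "year": [2020, 2021, 2022],
        "country": ["US", "CA", "MX"],
    }
)

# Numeric comparison, applied locally
recent = df_data_demo[df_data_demo["year"] >= 2021]

# Membership in a list of values, applied locally
selected = df_data_demo[df_data_demo["country"].isin(["US", "CA"])]

print(recent)
print(selected)
```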
Deleting the contract¶
Deleting the contract requires multiple steps:
- Change the contract status to "retired"
- Drop the data table associated with the contract. This deletes all data and is only possible for contracts in the retired state.
- Use the client to delete the contract.
my_contract_resource.change_status("retired")
my_contract_resource.drop_data()
my_client.contracts.delete(name=contract_name)
Close the client¶
After using the client, you should close the connection properly:
my_client.close()
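If an exception is raised somewhere between creating the client and the final close() call, the connection stays open. For any object that has a close() method, the standard library's contextlib.closing guarantees the call even when an error occurs. The sketch below demonstrates this with FakeClient, a hypothetical stand-in that exposes only close(); whether CrossClient can also be used directly in a with statement is not shown in this notebook.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in exposing only close(), like my_client above."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeClient()
try:
    # closing() calls client.close() when the block exits, even on errors
    with closing(client):
        raise RuntimeError("something went wrong mid-session")
except RuntimeError:
    pass

print(client.closed)  # prints True
```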