Data Version Guide
Climatiq implements two versioning systems: API Versioning and Data Versioning. API versioning focuses on which parameters the API accepts and returns, and how these may change over time. Data Versioning, which we’ll focus on here, deals with changes to the underlying emission factor data.
We also have a shorter reference page if you need to select a version to use and don't need the level of understanding offered by this guide.
We have simplified how we handle data versioning based on user feedback. The previous format used a two-number system, major.minor, and was usually seen with the same data release number for both major and minor, e.g. 16.16.
For backwards compatibility reasons, this format still works, but we recommend using the new version format described in this guide.
Some deprecated fields in the API still return the old version format.
How Climatiq Changes Data and Why
The Climatiq emission factor database is continuously updated for many reasons, such as when:
- New emission factors are added.
- Existing emission factors are changed, such as when a source publishes corrections.
- Existing emission factors are changed when Climatiq performs internal restructuring by changing metadata, such as activity_id, to provide additional context about the activity scope.
- Emission factors are deleted when a source deprecates the dataset, or when our science team deprecates factors because of their poor quality. This happens rarely.
Climatiq bundles these changes up into data versions. A new data version is generally released each month and has an ascending release number, such as 1, 2, 3, etc.
With each new data release, we provide a changelog of which factors were added, changed or removed. You can find our data release changelogs here.
Climatiq uses the concept of Data Versioning to deliver these monthly updates and new emission factors without disrupting existing applications.
Data Versioning
When you use most of the Climatiq endpoints, such as the freight, energy, travel and Autopilot endpoints, data versioning is handled automatically for you.
Some endpoints allow you to query and estimate directly against the Climatiq Emission Factor Database - such as the “Search” and “Basic Estimate” endpoints. To use these endpoints you must specify a data version.
Climatiq allows you to select two different types of data version depending on your requirements: A fixed data version, or a dynamic data version.
Fixed data version
A fixed data version gives you a fixed, unchanging view of the underlying emission factor data.
When new emission factors are added, or existing emission factors are updated or removed, nothing changes for you, as your view of the data is fixed.
The advantage of using a fixed data version is stability: you know the emission factors you are using will not change unless you specifically take action to change them. You should select this type of data version when you want stability in the emission factors you use and are okay with using older, potentially inaccurate emission factors.
You use a fixed data version by supplying a data version number, such as 1, 2, 13, etc.
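As a sketch of how a fixed data version is supplied in practice, the Python snippet below builds, but does not send, a Basic Estimate request whose selector pins data version 8. The endpoint URL, the Bearer-token header, the body layout (an emission_factor selector next to a parameters object) and the money/money_unit parameter values are assumptions for illustration; check the Climatiq API reference for the exact request format.

```python
import json
import urllib.request

# Assumed Basic Estimate endpoint; verify against the Climatiq API reference.
ESTIMATE_URL = "https://api.climatiq.io/data/v1/estimate"


def build_estimate_request(
    api_key: str, selector: dict, parameters: dict
) -> urllib.request.Request:
    """Build (but do not send) a Basic Estimate request.

    The body layout {"emission_factor": ..., "parameters": ...} is an
    assumption made for this sketch.
    """
    if "data_version" not in selector:
        # Selectors for the Basic Estimate endpoint must specify a data version.
        raise ValueError("selector must include a data_version")
    body = {"emission_factor": selector, "parameters": parameters}
    return urllib.request.Request(
        ESTIMATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Fixed data version: "8" pins the view of the data to data release 8.
fixed_selector = {
    "activity_id": "consumer_goods-type_dairy_products",
    "region": "DE",
    "data_version": "8",
}
# Hypothetical activity parameters; the right ones depend on the factor's unit type.
request = build_estimate_request(
    "YOUR_API_KEY", fixed_selector, {"money": 100, "money_unit": "eur"}
)
```

Sending it would be a matter of passing the request to urllib.request.urlopen. Because the version is fixed, repeating the same request later should keep resolving to the same emission factor.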
Dynamic data version
A dynamic data version gives you a dynamic view of the data, where both new and old emission factors are available.
When an emission factor is updated and you are using a dynamic data version, the newest version of the emission factor will be used if it still satisfies your query; otherwise, you will continue to use the existing older version. See examples of both cases below.
Let’s see an example where a dynamic data version automatically delivers an updated emission factor to you. You have provided a selector to the basic estimate endpoint which looks like this:
{ "activity_id": "consumer_goods-type_dairy_products", "region": "DE", "data_version": "^8"}
This finds the following emission factor:
{
  "name": "Dairy products",
  "activity_id": "consumer_goods-type_dairy_products",
  "id": "2e2545eb-93b3-44cf-bc6c-11469619247b",
  "year": 2019,
  // ... more fields here
}
In a later data version, the value for "year" was corrected from '2019' to '2018'.
As your query does not include any filters on year, the updated emission factor still satisfies your query and is used instead. You will receive the following emission factor:
{
  "name": "Dairy products",
  "activity_id": "consumer_goods-type_dairy_products",
  "id": "3676a91b-ddcb-4a84-8b6f-ef18cb29a926",
  "year": 2018,
  // ... more fields here
}
Let’s see an example where the activity ID has changed. Perhaps it has changed to clarify that the above emission factor covers dairy products, but not cheese. The activity ID would then have changed from consumer_goods-type_dairy_products to consumer_goods-type_dairy_products_excluding_cheese.
The updated emission factor does not satisfy your query anymore, as the activity ID does not match what you specified. In a situation like this, you will continue to use the existing non-updated version of the emission factor.
In this case, you would end up with the following (unchanged) emission factor.
{
  "name": "Dairy products",
  "activity_id": "consumer_goods-type_dairy_products",
  "id": "3676a91b-ddcb-4a84-8b6f-ef18cb29a926",
  "year": 2018,
  // ... more fields here
}
We refer to older versions of more recent emission factors as stale data.
In these examples we are only considering one emission factor. In reality, there might be multiple other emission factors with the same activity ID, the same year and the same region. In those cases, another emission factor might be used instead.
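To make the two examples concrete, here is a toy Python model of how fixed and dynamic selectors might resolve. It only illustrates the behaviour described above and is not Climatiq's implementation: the release numbers 8, 9 and 10, the added_in and superseded_in fields, the tie-breaking rule and the id of the "excluding cheese" factor are all hypothetical, and real selection among several matching factors can differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FactorVersion:
    id: str
    activity_id: str
    region: str
    year: int
    added_in: int  # hypothetical: data release that introduced this version
    superseded_in: Optional[int] = None  # hypothetical: release that replaced or deleted it


def is_visible(factor: FactorVersion, release: int, dynamic: bool) -> bool:
    not_yet_superseded = factor.superseded_in is None or factor.superseded_in > release
    if dynamic:
        # ^N: every version not superseded by release N, including all later additions.
        return not_yet_superseded
    # N: exactly the snapshot that existed at release N.
    return factor.added_in <= release and not_yet_superseded


def resolve(
    catalog: List[FactorVersion], query: Dict[str, object], data_version: str
) -> Optional[FactorVersion]:
    dynamic = data_version.startswith("^")
    release = int(data_version[1:] if dynamic else data_version)
    matches = [
        f
        for f in catalog
        if is_visible(f, release, dynamic)
        and all(getattr(f, key) == value for key, value in query.items())
    ]
    if not matches:
        return None
    # Prefer versions that are still current, then the most recently added one.
    return max(matches, key=lambda f: (f.superseded_in is None, f.added_in))


CATALOG = [
    # Original factor; its year was later corrected.
    FactorVersion("2e2545eb-93b3-44cf-bc6c-11469619247b",
                  "consumer_goods-type_dairy_products", "DE", 2019,
                  added_in=8, superseded_in=9),
    # Year corrected to 2018; later replaced by a version with a new activity ID.
    FactorVersion("3676a91b-ddcb-4a84-8b6f-ef18cb29a926",
                  "consumer_goods-type_dairy_products", "DE", 2018,
                  added_in=9, superseded_in=10),
    FactorVersion("hypothetical-id-excluding-cheese",
                  "consumer_goods-type_dairy_products_excluding_cheese", "DE", 2018,
                  added_in=10),
]
QUERY = {"activity_id": "consumer_goods-type_dairy_products", "region": "DE"}

for version in ["8", "^8", "^10"]:
    match = resolve(CATALOG, QUERY, version)
    print(version, match.id if match else None)
```

In this model, the fixed version "8" keeps returning the original 2019 factor, "^8" returns the corrected 2018 factor (the outcome of both examples), and "^10" returns nothing: upgrading drops the two stale versions, and the old activity ID no longer matches any current factor.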
The advantage of using a dynamic data version is that you automatically use the latest emission factors while remaining compatible with your application. You should select this type of data version if you don’t need the emission factors you use to stay fixed and want to take advantage of updates.
You use a dynamic data version by supplying a caret in front of a data version number, such as ^1, ^2, ^13, etc. Each dynamic data version has access to the newest emission factors, and updating data versions only removes stale data.
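The caret rule can be captured in a few lines of Python. This is a hypothetical helper, not part of any Climatiq SDK: it splits a data_version string into a release number and a fixed/dynamic flag, and it deliberately rejects the legacy major.minor format rather than guessing how that maps onto the current one.

```python
from typing import NamedTuple


class DataVersion(NamedTuple):
    release: int   # data release number, e.g. 13
    dynamic: bool  # True for "^13", False for "13"


def parse_data_version(text: str) -> DataVersion:
    """Parse a data_version in the current format: "13" (fixed) or "^13" (dynamic)."""
    dynamic = text.startswith("^")
    number = text[1:] if dynamic else text
    # Legacy "major.minor" strings such as "16.16" are rejected here on purpose.
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"unsupported data_version: {text!r}")
    return DataVersion(int(number), dynamic)


def format_data_version(version: DataVersion) -> str:
    """Turn a DataVersion back into the string form used in selectors."""
    return ("^" if version.dynamic else "") + str(version.release)
```

For example, parse_data_version("^13") yields DataVersion(release=13, dynamic=True), and format_data_version turns it back into "^13".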
Which data version type should I use?
We recommend using the latest dynamic version, unless you have strong requirements that the underlying emission factors remain unchanged. Such a requirement might arise, for example, because you need to keep data identical across a reporting period, even at the cost of that data being out of date or wrong.
Here is a more in-depth comparison of how the different data versions handle different scenarios.
| | New emission factors are added | Existing emission factors are updated | Existing emission factors are deleted | Possibility of using stale data? |
|---|---|---|---|---|
| Fixed Data Version | No effect | No effect | No effect | Yes, the data you see does not change, so it might be out of date. |
| Dynamic Data Version | Your queries can return the new emission factors. | Your queries can return the new emission factors. If your query has to select between an older and an updated emission factor, it will select the newer one. | If a non-deleted emission factor satisfies your query equally well, it is used; otherwise the deleted factor continues to work. | There is less risk of using stale data, but some risk remains, as new data updates might be incompatible with your queries. |
Stale Data
When an emission factor has been updated, but you continue to use an older version, we call this stale data, as the data is outdated.
In most cases, using stale data is not a problem, as the majority of emission factor updates do not impact how they can be used, but are minor things like:
- Updating the link to the source
- Clarifying the scope in the description
- Fixing typos
The risk of using stale data exists no matter what data versioning type is used.
- For fixed data versions, stale data happens naturally, as your view of the data does not change, even if emission factors are updated.
- For dynamic data versions, this happens when new versions of emission factors are released that do not satisfy your query conditions and thus are not used (see the second example above, where the activity ID changed).
The only way to ensure you’re not using stale data is to manually update your data version.
Updating data versions
To avoid stale data and to make sure that your emissions calculations are up to date with the latest scientific data, we recommend you update your data version periodically, and at least yearly. How you upgrade depends on which type of data version you use:
When using a fixed version
If you are on a fixed data version such as 8, you are using a fixed view of the data, where no modifications, deletions or updates have happened since that data version was released.
When you upgrade your fixed data version, you will see an entirely new view of the data: emission factors will have been added, changed or removed. See our Recommended upgrade process below.
When using a dynamic version
If you are on a dynamic data version such as ^8, you already have access to the latest emission factors, but you might also be using stale data.
If you want to upgrade your dynamic data version, some stale emission factors will be removed from your responses. See our Recommended upgrade process below.
Recommended upgrade process
- Refer to the data changelog for all data releases between the release you currently use and the release you are upgrading to, and check whether any of the changes will impact you.
- Test your application to see if it still works as expected.
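One way to carry out both steps is to record which emission factor id each of your queries resolves to under your current data version and under the candidate version, then compare them. Because an emission factor's id changes whenever the factor changes, a differing id flags a query worth checking against the changelog. The sketch below is hypothetical: it assumes you have already collected the ids (for example from estimate responses) into dictionaries keyed by names you choose, with None where a query found no factor.

```python
from typing import Dict, List, Optional


def diff_factor_ids(
    current: Dict[str, Optional[str]],
    candidate: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Group query names by how their resolved emission factor id changed.

    Both arguments map a query name (your own label) to the id returned
    under that data version, or None if the query found no factor.
    """
    report: Dict[str, List[str]] = {"unchanged": [], "changed": [], "no_match": []}
    for name, old_id in current.items():
        new_id = candidate.get(name)
        if new_id is None:
            report["no_match"].append(name)  # query breaks after the upgrade
        elif new_id == old_id:
            report["unchanged"].append(name)  # same factor version as before
        else:
            report["changed"].append(name)  # different factor: review the changelog
    return report


# Hypothetical ids collected under ^8 (current) and ^10 (candidate).
current_ids = {
    "dairy_de": "3676a91b-ddcb-4a84-8b6f-ef18cb29a926",
    "electricity_uk": "hypothetical-id-1",
}
candidate_ids = {
    "dairy_de": None,
    "electricity_uk": "hypothetical-id-1",
}
report = diff_factor_ids(current_ids, candidate_ids)
```

Here report lists "dairy_de" under no_match and "electricity_uk" under unchanged, so only the dairy query needs attention before switching versions.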
Uniquely Identifying Emission Factors
When selecting an emission factor, endpoints will generally accept either a data_version and additional filtering, or an id. If you are providing an id, you do not need to specify a data_version.
This is because each version of an emission factor in the Climatiq database has a unique id (id). Whenever an emission factor is changed in any way, its id changes. This means that looking up an emission factor by its id will find one specific emission factor in history, even if newer emission factors are available.
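The two ways of selecting a factor can be expressed as a small helper. This is a hypothetical Python function: treating an id and a data_version plus filters as mutually exclusive is a choice made for this sketch rather than a documented API rule, and the selector field names follow the examples earlier in this guide.

```python
from typing import Any, Dict, Optional


def build_selector(
    *, factor_id: Optional[str] = None, data_version: Optional[str] = None, **filters: Any
) -> Dict[str, Any]:
    """Build an emission factor selector, either by id or by data_version plus filters.

    Hypothetical helper; mixing the two styles is rejected by design.
    """
    if factor_id is not None:
        if data_version is not None or filters:
            raise ValueError("an id already pins one factor version; drop the other fields")
        # No data_version needed: the id identifies one version of one factor.
        return {"id": factor_id}
    if data_version is None:
        raise ValueError("without an id, a data_version is required")
    return {"data_version": data_version, **filters}
```

For example, build_selector(factor_id="3676a91b-ddcb-4a84-8b6f-ef18cb29a926") always points at the corrected 2018 dairy factor, whereas build_selector(data_version="^8", activity_id="consumer_goods-type_dairy_products", region="DE") may resolve to a newer factor as data releases arrive.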
Climatiq has both the concept of an id that uniquely identifies an emission factor, regardless of data version, and an activity_id that describes a particular type of activity. One emission factor will always have a unique id, but many emission factors, from different sources and covering different lifecycle segments, often share the same activity_id.
For emission factors that have not changed, the id will not change between data releases. This means that across a set of Climatiq data releases:
- Some emission factors (say, the UK BEIS emission factor for electric cars for 2019) might exist in three versions (perhaps due to methodology updates from the source) and will therefore have three different ids.
- Many emission factors will not have needed any changes, and thus have the same id in each data version.
Whenever Climatiq returns an id, you can use it to uniquely identify the emission factor used.