Energy Performance Certificate Ratings (Domestic) - England and Wales

An EPC provides an indication of how much it will cost to heat and power a property, as well as how much CO2 it emits. It also includes recommendations of energy-efficient improvements, the cost of carrying them out, and the potential savings in pounds and pence that each one could generate.
Product Details
Visibility: Public
Created: 3 Oct 2023
Last Updated: 3 Apr 2025
README

Energy Performance Certificates (England and Wales)

Description

This dataset is a copy of the Energy Performance Certificates for England and Wales, made available in the cloud-native geospatial geoparquet format. The original dataset is distributed as a zipped collection of CSVs and is available for download from here.

Accessing the Data

The data has been made available as admin-partitioned geoparquet. You can query the entire dataset using DuckDB like so:

duckdb -c "Select count(*) from read_parquet('s3://us-west-2.opendata.source.coop/addresscloud/epc/geoparquet-local-authority/*.parquet');"

Alternatively, you can use the aws cli to access the data directly from the S3 bucket:

aws s3 --no-sign-request ls s3://us-west-2.opendata.source.coop/addresscloud/epc/

Or if you want the data for just one local authority then choose one from the browse section of this page.

The geoparquet format is supported by GDAL 3.5 onwards and is readable in QGIS and many other platforms. See the geoparquet website for more information.

Data extraction process

These steps should help you reproduce the process of extracting the EPC data. The source data exists here. You will need to sign up. Once you have done this, you should find the data available to download as a ZIP. This ZIP contains a directory for each local authority in England and Wales. I've created this script to extract just the certificates.csv from each directory.

#!/bin/bash
# Zip file name in the current directory
zip_file="all-domestic-certificates.zip"
# Create the output directory if it does not already exist
mkdir -p epc_csv
# Loop through the zip file contents and extract each certificates.csv
unzip -l "$zip_file" | grep 'certificates\.csv$' | while read -r line; do
    # The file path is the last field of each listing line
    file_name=$(echo "$line" | awk '{print $NF}')
    folder_name=$(dirname "$file_name")
    # Extract the specific file and save it with the folder name as filename
    unzip -p "$zip_file" "$file_name" > "epc_csv/${folder_name//\//-}.csv"
done
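To make the naming step in the script above easier to follow, here is a minimal sketch of how the output filename is derived from a path inside the ZIP. The function name csv_name_for and the example paths are hypothetical and only for illustration; the real directory names in the download may differ.

```shell
#!/bin/bash
# Hypothetical helper: derive the output CSV name from a path inside the ZIP,
# using the same dirname + substitution steps as the extraction script.
csv_name_for() {
    local file_name="$1"
    local folder_name
    folder_name=$(dirname "$file_name")
    # Replace every "/" in the folder path with "-"
    echo "${folder_name//\//-}.csv"
}

# Example paths below are made up for illustration
csv_name_for "domestic-E06000001-Hartlepool/certificates.csv"
csv_name_for "all/domestic-E06000001-Hartlepool/certificates.csv"
```

A nested folder path is flattened into a single filename, so every extracted CSV ends up directly inside epc_csv.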

Data conversion

The next steps involve using the DuckDB client to load the data, geocoding each row by joining to the OS Open UPRN dataset, and then exporting the entire dataset to admin-partitioned parquet files. We'll then need to convert these to geoparquet using the gpq tool. There are 347 CSVs totalling 23.4 GB in size. You can convert the data to parquet using various tools. Once you've done that, the total combined size of the Parquet files is only 3.7 GB. The steps below assume the converted EPC files are in an epc directory and the OS Open UPRN data is in opuprn.parquet.

$ duckdb
create or replace view epc_certs as select * from read_parquet('epc/*.parquet');
create or replace view opuprn as select * from read_parquet('opuprn.parquet');
create or replace view epc as
select epc_certs.*,
    opuprn.LATITUDE as lat,
    opuprn.LONGITUDE as lon
from epc_certs
join opuprn ON epc_certs.uprn = opuprn.uprn::text;

Once all the views have been created, we can export to partitioned parquet files. The partition used here is the local authority name.

install spatial;
load spatial;
COPY (Select UPRN as uprn,
    LOCAL_AUTHORITY as local_authority,
    LOCAL_AUTHORITY_LABEL as local_authority_label,
    CURRENT_ENERGY_RATING as current_energy_rating,
    POTENTIAL_ENERGY_RATING as potential_rating,
    INSPECTION_DATE as inspection_date,
    ST_AsText(ST_Point(lon::double,lat::double)) as geometry
from epc)
TO 'epc_partitioned' (FORMAT PARQUET, PARTITION_BY (LOCAL_AUTHORITY_LABEL));

Converting to Geoparquet

At the time of writing it was not possible to export to geoparquet using DuckDB, so I created this script to convert all the output parquet files to geoparquet and remove the original parquet files:

#!/bin/bash
shopt -s globstar
for d in ./**/*.parquet
do
    dir="${d%/*}" # Strip the *.parquet pathname back to the containing directory
    parquet="${d##*/}" # Strip the *.parquet pathname back to just the filename
    # Take the partition value (the text after '=') and replace spaces with underscores
    new_name=$(echo "$dir" | tr ' ' _ | cut -d'=' -f2)
    (
        cd "$dir" || exit
        # Delete parquet files smaller than 100 blocks of 512 bytes
        find . -type f -name "*.parquet" -size -100b -delete
        if [ -f "$parquet" ]
        then
            gpq convert "$parquet" "$new_name.parquet"
        fi
    )
done
# Remove the original DuckDB output files (named data*), leaving only the converted files
find . -type f -name "data*" -delete
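As a rough illustration of the renaming step in the script above, the sketch below shows how a partition directory name is turned into the output filename stem. The function name partition_name and the example directory are hypothetical; the exact directory names written by DuckDB are assumed to follow the column=value pattern.

```shell
#!/bin/bash
# Hypothetical helper: turn a partition directory path into an output name,
# using the same tr + cut steps as the conversion script.
partition_name() {
    local dir="$1"
    # Replace spaces with underscores, then keep only the text after the first "="
    echo "$dir" | tr ' ' _ | cut -d'=' -f2
}

# Example directory below is assumed for illustration
partition_name "./epc_partitioned/local_authority_label=City of London"
```

Note that cut keeps only the second "="-separated field, so this approach assumes the directory path contains exactly one "=" character.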

Licence

Data is licensed under the Open Government Licence v3.0. More details can be found here.

Contact

This is an experimental dataset and I hope to add more attributes in the future. If you have any questions about the process, you can contact me at matt@addresscloud.com.
