Derived data products from GBIF snapshots

Derived data products from GBIF snapshots. Currently, this product focuses on providing H3 indices for GBIF occurrence records.
Product Details
Visibility: Public
Created: 14 Jul 2024
Last Updated: 3 Apr 2025
README

H3 Georeferenced GBIF Snapshots

This repository contains snapshots from the Global Biodiversity Information Facility (GBIF), the world's largest registry of biodiversity occurrence data, with over 3 billion occurrence records.

Using source.coop makes it possible to share modifications or enhancements based on this data that would otherwise be challenging to redistribute. This repository demonstrates one such enhancement by extending the data with H3's hierarchical hexagonal spatial index keys.

These snapshots are converted to GeoParquet and indexed with H3 cells at hex resolutions 0 through 11 using the h3 extension for DuckDB (script here). Pre-processing can take over 24 hours of compute time. The resulting snapshot is over 430 GB of partitioned Parquet.
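As a rough illustration of the indexing step, the sketch below builds (but does not run) a DuckDB query that adds one H3 cell column per resolution. It is not the repository's script. The input column names `decimallatitude` and `decimallongitude`, the output column names `h0` through `h11`, and the input path are all assumptions; `h3_latlng_to_cell` comes from the DuckDB h3 community extension, which has to be installed and loaded (`INSTALL h3 FROM community; LOAD h3;`) before the query is executed.

```python
# Sketch only: builds a DuckDB SQL statement as a string; nothing is executed here.
# Assumed names (not confirmed by the repository): decimallatitude,
# decimallongitude, and output columns h0..h11.

def h3_index_sql(source: str, min_res: int = 0, max_res: int = 11) -> str:
    """Return SQL selecting all columns plus one H3 cell column per resolution."""
    h3_columns = ",\n  ".join(
        f"h3_latlng_to_cell(decimallatitude, decimallongitude, {res}) AS h{res}"
        for res in range(min_res, max_res + 1)
    )
    return (
        f"SELECT *,\n  {h3_columns}\n"
        f"FROM read_parquet('{source}')\n"
        "WHERE decimallatitude IS NOT NULL AND decimallongitude IS NOT NULL;"
    )


# Hypothetical input path, for illustration only.
index_sql = h3_index_sql("gbif/occurrence/*.parquet")
print(index_sql)
```

The printed statement can be run from the DuckDB CLI or a DuckDB client once the h3 extension is loaded. Records without coordinates are filtered out here because they cannot be assigned a cell; the actual script may handle them differently.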

Using DuckDB and these additional H3 index columns, we can rapidly compute spatial aggregations of GBIF occurrences at any of these 12 resolutions. Using the source.coop S3 interface, we can also stream the aggregated data back to source.coop to serve custom maps computed on the fly.
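To make the aggregation idea concrete, here is a minimal sketch that builds a DuckDB query counting occurrences per H3 cell at a chosen resolution. Because the cell IDs are precomputed, the aggregation reduces to a plain GROUP BY with no geometry operations. The column naming scheme (`h0` through `h11`) and the Parquet path are assumptions, not details confirmed by the repository.

```python
# Sketch only: builds a DuckDB SQL statement as a string; nothing is executed here.
# Assumes the indexed snapshot has one column per resolution named h0..h11.

def occurrence_count_sql(source: str, resolution: int) -> str:
    """Return SQL counting occurrence records per H3 cell at one resolution."""
    if not 0 <= resolution <= 11:
        raise ValueError("resolution must be between 0 and 11")
    cell = f"h{resolution}"
    return (
        f"SELECT {cell} AS cell, COUNT(*) AS n_occurrences\n"
        f"FROM read_parquet('{source}')\n"
        f"GROUP BY {cell}\n"
        "ORDER BY n_occurrences DESC;"
    )


# Hypothetical path to the indexed snapshot, for illustration only.
count_sql = occurrence_count_sql("gbif-h3/*.parquet", 4)
print(count_sql)
```

Swapping `COUNT(*)` for another aggregate, such as a distinct count over a species identifier column, would give a richness-style map from the same query shape.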

Example maps

Citation

Global Biodiversity Information Facility (GBIF) Species Occurrences was accessed on 2024-10-01. For more information on how to cite GBIF datasets, please refer to the GBIF citation guidelines.
