Derived data products from GBIF snapshots
Currently focuses on providing H3 indexes of GBIF occurrence records.
H3 Georeferenced GBIF Snapshots
This repository contains snapshots from the Global Biodiversity Information Facility, the world's largest registry of biodiversity occurrence data with over 3 billion occurrence records.
Using source.coop makes it possible to share modifications or enhancements based on this data that would otherwise be challenging to redistribute. This repository demonstrates one such enhancement by extending the data with H3's hierarchical hexagonal spatial index keys.
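To illustrate what "hierarchical" means for these keys, here is a minimal sketch that derives a coarser parent cell from a finer one using only bit operations. It assumes the publicly documented H3 64-bit layout (a 4-bit resolution field at bit offset 52, followed by fifteen 3-bit digits, with unused digits set to 7). The helper names get_resolution and parent_cell are hypothetical and are not part of this repository or of the H3 library, which ships its own parent function.

```python
# Sketch of the H3 parent/child hierarchy, assuming the documented
# 64-bit index layout. Helper names are illustrative only.
RES_OFFSET = 52                 # resolution field starts at bit 52
RES_MASK = 0xF << RES_OFFSET    # 4 bits of resolution
MAX_RES = 15                    # H3 defines resolutions 0..15


def get_resolution(cell: int) -> int:
    """Read the resolution stored in the index itself."""
    return (cell & RES_MASK) >> RES_OFFSET


def parent_cell(cell: int, parent_res: int) -> int:
    """Return the ancestor of `cell` at the coarser `parent_res`."""
    res = get_resolution(cell)
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")
    # Overwrite the resolution field with the parent's resolution.
    parent = (cell & ~RES_MASK) | (parent_res << RES_OFFSET)
    # Mark every digit finer than the parent as unused (all ones).
    for r in range(parent_res + 1, res + 1):
        parent |= 0b111 << ((MAX_RES - r) * 3)
    return parent


cell = int("8928308280fffff", 16)          # a resolution-9 cell
print(get_resolution(cell))                # 9
print(format(parent_cell(cell, 8), "x"))   # 8828308281fffff
print(format(parent_cell(cell, 0), "x"))   # 8029fffffffffff
```

Because a parent is recoverable from the child key alone, a column at one resolution is enough to roll records up to any coarser resolution; storing one column per resolution, as done here, trades disk space for not having to recompute them.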
These snapshots are georeferenced into GeoParquet format with H3 spatial indexes for hex resolutions 0 to 11 using the h3 extension for DuckDB (script here). Pre-processing can take over 24 hours of compute time. The resulting snapshot is over 430 GB of partitioned Parquet.
Using DuckDB and these additional H3 indexes, we can rapidly compute spatial aggregations of the GBIF data at any of these 12 resolutions. Using the source.coop S3 interface, we can also stream the aggregated data back to source.coop to serve custom maps computed on the fly.
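The shape of such an aggregation can be sketched as a plain GROUP BY on one of the H3 columns. The example below uses Python's built-in sqlite3 purely as a stand-in for DuckDB so that it runs without extra dependencies. The table name occurrences, the column names h0 and species, and the four rows are made-up toy values, not the actual schema or contents of the snapshot.

```python
import sqlite3

# Toy rows: (resolution-0 H3 cell, species name). Illustrative values only.
rows = [
    ("8029fffffffffff", "Puma concolor"),
    ("8029fffffffffff", "Puma concolor"),
    ("8029fffffffffff", "Ursus americanus"),
    ("8027fffffffffff", "Ursus americanus"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE occurrences (h0 TEXT, species TEXT)")
con.executemany("INSERT INTO occurrences VALUES (?, ?)", rows)

# One output row per hexagon: record count and number of distinct species.
summary = con.execute(
    """
    SELECT h0,
           COUNT(*)                AS n_records,
           COUNT(DISTINCT species) AS n_species
    FROM occurrences
    GROUP BY h0
    ORDER BY h0
    """
).fetchall()

for cell_id, n_records, n_species in summary:
    print(cell_id, n_records, n_species)
```

Swapping the grouping column for a finer resolution changes the hexagon size of the resulting map without touching the rest of the query, since the cell keys are already stored with each record.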
Example maps
Citation
Global Biodiversity Information Facility (GBIF) Species Occurrences was accessed on 2024-10-01. For more information on how to cite GBIF datasets, please refer to the GBIF citation guidelines.