Forecasts submitted to the Ecological Forecasting Initiative's NEON Challenge
In development, experimental
At this time, canonical sources are still hosted on data.ecoforecast.org.
Resources
Quickstart
Arrow provides an easy way to access remote parquet files from most languages widely used in data science. Here we access all forecasts submitted to a particular theme. (Users who want only a single model should specify that model on the path for faster access. The STAC catalog can be used to explore available models.)
The examples below show 'cloud-native' connections to the data -- 'lazy' connections that do not download the entire asset, but allow us to filter, subset, and operate directly on the remote data product.
R Access
library(arrow)

# Bucket, repository, and forecast theme to read
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = glue::glue("{base}/{repo}/parquet/{theme}?region=us-west-2")

# Opens a lazy connection; the full dataset is not downloaded
open_dataset(uri)
Python Access
import pyarrow.dataset as ds

# Bucket, repository, and forecast theme to read
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = f"{base}/{repo}/parquet/{theme}?region=us-west-2"

# Opens a lazy connection; the full dataset is not downloaded
ds.dataset(uri, format="parquet")
duckdb
At this time, duckdb access is substantially faster than arrow.
R + duckdb
R users can get a dplyr-compatible lazy remote tibble as follows:
# remotes::install_github("cboettig/duckdbfs")
library(duckdbfs)

# Bucket, repository, and forecast theme to read
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = glue::glue("{base}/{repo}/parquet/{theme}?region=us-west-2")

# Lazy remote table that can be used with dplyr verbs
df = open_dataset(uri)
Python + duckdb
ibis provides a more Pythonic interface to SQL:
import ibis
con = ibis.duckdb.connect()

# Bucket, repository, and forecast theme; the trailing /** matches all files under the theme
base = "s3://us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = f"{base}/{repo}/parquet/{theme}/**"

# Enable remote (S3) reads in duckdb and set the bucket's region
con.raw_sql("""
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
""")

db = con.read_parquet(uri)
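The snippets in this document assemble their URIs from the same bucket, repository, and theme, but in two slightly different styles: the arrow and duckdbfs examples use an `anonymous@` prefix with the region as a query parameter, while the ibis example uses a bare bucket name with a trailing `/**` glob and sets the region separately. The sketch below simply collects those two patterns in one place. The function names `arrow_style_uri` and `duckdb_glob_uri` are hypothetical helpers introduced for illustration, not part of any library.

```python
# Values copied from the snippets in this document.
BUCKET = "us-west-2.opendata.source.coop"
REPO = "eco4cast/neon4cast-forecasts"
REGION = "us-west-2"


def arrow_style_uri(theme: str) -> str:
    """URI style used in the arrow and duckdbfs examples (hypothetical helper)."""
    return f"s3://anonymous@{BUCKET}/{REPO}/parquet/{theme}?region={REGION}"


def duckdb_glob_uri(theme: str) -> str:
    """URI style used in the ibis + duckdb example (hypothetical helper).

    The region is not part of this URI; the ibis example sets it with
    SET s3_region instead.
    """
    return f"s3://{BUCKET}/{REPO}/parquet/{theme}/**"


if __name__ == "__main__":
    # Only builds and prints strings; no network access happens here.
    print(arrow_style_uri("aquatics"))
    print(duckdb_glob_uri("aquatics"))
```

Keeping both forms behind small functions makes it easier to switch the theme without mixing the two conventions.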