Fields of The World (FTW)

Fields of The World (FTW) is a comprehensive benchmark dataset designed to enhance the development of machine learning models for instance segmentation of agricultural field boundaries. This dataset aims to meet the growing need for accurate and scalable field boundary data for global agricultural monitoring and assessments.

Product Details

Visibility: Public
Created: 27 Aug 2024
Last Updated: 3 Apr 2025

README

Version : v1.0.0

Description

Key Features:

  1. Near-Global Coverage: FTW spans four continents—Europe, Africa, Asia, and South America—covering diverse agricultural landscapes across 24 countries. This extensive geographic coverage allows for the development of models that can generalize well to different agricultural practices and field types.

  2. Large-Scale Dataset: With approximately 1.6 million parcel boundaries and over 70,000 samples, FTW is significantly larger than previously available datasets. Each sample includes instance and semantic segmentation masks paired with multi-date, multi-spectral Sentinel-2 satellite images, enabling detailed temporal and spectral analysis.

  3. Multi-class Segmentation: The dataset provides masks for both instance segmentation and semantic segmentation with different classes, including:

    • Instance Segmentation Masks: To identify individual fields.
    • Semantic Segmentation Masks:
      • Two-class masks: Background and polygon (field).
      • Three-class masks: Background, polygon (field), and boundaries.

  4. Spectral Richness: The dataset includes RGB (Red, Green, Blue) and NIR (Near-Infrared) spectral bands from Sentinel-2 images.

  5. Temporal Richness: The dataset includes multi-date imagery to capture different stages of the growing season. Two images with clearly different contrast were selected to represent these stages. The date ranges for these images were first taken from the USDA Crop Calendar and then refined by selecting periods with minimal cloud cover and the strongest contrast between the two images.

  6. Comprehensive Data Splits: The dataset is divided into training, validation, and test sets so that model performance can be evaluated fairly. For each country, larger tiles are divided into smaller chips, each covering 1536 m × 1536 m. To prevent data leakage due to spatial autocorrelation, a blocked random splitting strategy is used: chips are grouped into 3×3 blocks, and whole blocks are assigned to the splits, with 80% allocated to training, 10% to validation, and 10% to testing.

  7. Metadata and Documentation: The metadata and documentation help users interpret and use the dataset. They record the country of focus, the crop types used for filtering, the temporal data collection windows, the year of collection, and the grid structure:

    1. Country: The geographic region the dataset focuses on.
    2. Crop Types: Crop types used to filter the polygons in the dataset, with specific keywords used for filtering.
    3. Seasons: Date ranges defining the temporal windows for data collection.
    4. Year of Collection: The year when the polygon boundaries were captured.
    5. Grids: Larger grids from which smaller chips are derived, with some grids spatially separated to optimize area coverage.
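
The blocked random splitting described under Comprehensive Data Splits can be sketched as follows. This is a minimal illustration, not the dataset's actual splitting code: the `(row, col)` chip grid positions, the function name `blocked_random_split`, and the seed are all assumptions made for the example. The idea is that whole 3×3 blocks, rather than individual chips, are shuffled and assigned, so neighboring chips do not end up on both sides of a split.

```python
import random
from collections import defaultdict


def blocked_random_split(chips, block_size=3, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign chips to train/val/test by shuffling whole blocks of chips.

    `chips` is an iterable of (row, col) grid positions. This layout is a
    hypothetical stand-in for however chips are actually indexed.
    """
    # Group chips by the block_size x block_size block they fall into.
    blocks = defaultdict(list)
    for row, col in chips:
        blocks[(row // block_size, col // block_size)].append((row, col))

    # Shuffle the block keys reproducibly, then cut the list by fraction.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_train = round(fractions[0] * len(keys))
    n_val = round(fractions[1] * len(keys))

    split = {}
    for i, key in enumerate(keys):
        if i < n_train:
            name = "train"
        elif i < n_train + n_val:
            name = "val"
        else:
            name = "test"
        # Every chip in a block inherits the block's split.
        for chip in blocks[key]:
            split[chip] = name
    return split


# A 30 x 30 grid of chips gives 100 blocks of 9 chips each.
chips = [(r, c) for r in range(30) for c in range(30)]
split = blocked_random_split(chips)
counts = {name: sum(v == name for v in split.values()) for name in ("train", "val", "test")}
print(counts)  # {'train': 720, 'val': 90, 'test': 90}
```

In real data the blocks are rarely all full, so the per-chip proportions will only approximate 80/10/10, which is consistent with the per-country counts in the Dataset Information table.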

Dataset Directory Structure

Fields of The World
├── README.md                      -> This File
├── ftw-sources.pmtiles            -> File to visualize the source field boundaries on https://fieldsofthe.world/map/
├── austria                        -> Country Folder
│   ├── label_masks                -> Labels Folder
│   │   ├── instance               -> Instance Segmented Masks (Label) (Masks in .tif Format)
│   │   ├── semantic_2class        -> Semantic Segmented Masks (Label) (Masks in .tif Format) -> Contains 2 Classes (0-Background, 1-Polygon)
│   │   └── semantic_3class        -> Semantic Segmented Masks (Label) (Masks in .tif Format) -> Contains 3 Classes (0-Background, 1-Polygon, 2-Boundaries)
│   ├── s2_images                  -> Images Folder (Contains image chips)
│   │   ├── window_a               -> Window A images (Images in .tif Format)
│   │   └── window_b               -> Window B images (Images in .tif Format)
│   ├── chips_austria.parquet      -> Chips file in GeoParquet format, contains split details (each chip belongs to one of the Train/Val/Test splits)
│   └── data_config_austria.json   -> Contains metadata about the larger grids for the dataset, crop types, and dates for the temporal windows.
├── austria.zip                    -> Country Zip Folder, this contains all the files in the country directory.
└── checksum.md5                   -> Checksum MD5 file containing all the individual files checksum hashes. 
..... Continues for all the countries in the same format.
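
The class codes listed for semantic_2class (0-Background, 1-Polygon) and semantic_3class (0-Background, 1-Polygon, 2-Boundaries) can be illustrated with a small NumPy sketch that derives both from an instance mask. The dataset already ships these masks, so this is only meant to show how the encodings relate. The boundary rule used here (a field pixel whose 4-neighbor carries a different label) and the function name `semantic_masks` are assumptions for illustration; the masks in the dataset may have been rasterized differently.

```python
import numpy as np


def semantic_masks(instance):
    """Derive 2-class and 3-class masks from an instance mask (0 = background).

    Assumed boundary rule: a field pixel is a boundary pixel if any of its
    four neighbors has a different instance label.
    """
    instance = np.asarray(instance)
    two_class = (instance > 0).astype(np.uint8)

    # Pad with edge values so the image border itself is not a boundary.
    padded = np.pad(instance, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    differs = np.zeros(instance.shape, dtype=bool)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        neighbor = padded[
            1 + dr : padded.shape[0] - 1 + dr,
            1 + dc : padded.shape[1] - 1 + dc,
        ]
        differs |= neighbor != center

    # Start from the 2-class mask and relabel boundary field pixels as 2.
    three_class = two_class.copy()
    three_class[(instance > 0) & differs] = 2
    return two_class, three_class


# Two fields (labels 1 and 2) side by side in a single row.
two_class, three_class = semantic_masks(np.array([[1, 1, 2, 2]]))
print(two_class.tolist())    # [[1, 1, 1, 1]]
print(three_class.tolist())  # [[1, 2, 2, 1]]
```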

Dataset Information

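| Country | Year of Validity | Parcel Counts | Chips | Train Split | Validation Split | Test Split | Source Polygons | Source Data License |
|---|---|---|---|---|---|---|---|---|
| Austria | 2021 | 196101 | 6686 | 5304 | 637 | 745 | Link | CC-BY-4.0 |
| Belgium | 2021 | 63431 | 1941 | 1554 | 189 | 198 | Link | No restrictions on public access |
| Brazil | 2020 | 1854 | 1607 | 1289 | 130 | 188 | Link | CC-BY-4.0 |
| Cambodia | 2021 | 318088 | 344 | 274 | 36 | 34 | Link | CC-BY-4.0 |
| Corsica | 2021 | 5360 | 2472 | 1974 | 240 | 258 | Link | CC-BY-2.0 |
| Croatia | 2023 | 157481 | 3482 | 2778 | 351 | 353 | Link | Open Data |
| Denmark | 2021 | 37677 | 3560 | 2868 | 360 | 332 | Link | CC0-1.0 |
| Estonia | 2021 | 26695 | 6713 | 5348 | 681 | 684 | Link | CC-3.0 |
| Finland | 2021 | 57323 | 5665 | 4527 | 550 | 588 | Link | CC-BY-4.0 |
| France | 2020 | 55342 | 3744 | 2988 | 360 | 396 | Link | Open Licence |
| Germany | 2018/2019 | 4598 | 686 | 306 | 30 | 350 | Link | DL-DE/BY-2-0 |
| India | 2016 | 10013 | 2002* | 1281 | 300 | 399 | Link | CC-BY-4.0 |
| Kenya | 2022 | 8743 | 391 | 316 | 20 | 55 | Link | GPL-2.0-or-later |
| Latvia | 2021 | 44964 | 6938 | 5529 | 668 | 741 | Link | CC-BY-NC-4.0 |
| Lithuania | 2021 | 61424 | 5258 | 4208 | 522 | 528 | Link | Non-commercial use only |
| Luxembourg | 2022 | 29018 | 808 | 643 | 81 | 84 | Link | CC0-1.0 |
| Netherlands | 2022 | 43169 | 3879 | 3110 | 381 | 388 | Link | CC0-1.0 |
| Portugal | 2021 | 5040 | 86 | 64 | 12 | 10 | Link | CC-BY-NC-4.0 |
| Rwanda | 2021 | 1532 | 70 | 57 | 6 | 7 | Link | CC-BY-4.0 |
| Slovakia | 2021 | 14242 | 4073 | 3275 | 390 | 408 | Link | CC0-1.0 |
| Slovenia | 2021 | 67488 | 2177 | 1733 | 216 | 228 | Link | CC-BY-4.0 |
| South Africa | 2018 | 6568 | 747 | 590 | 72 | 85 | Link | CC-BY-NC-SA-4.0 |
| Spain | 2020 | 258465 | 2440 | 2019 | 202 | 219 | Link | CC-BY-4.0 |
| Sweden | 2021 | 39718 | 4760 | 3802 | 442 | 516 | Link | No restrictions on public access |
| Vietnam | 2021 | 120913 | 288 | 229 | 36 | 23 | Link | CC-BY-4.0 |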

*India has a total of 2,002 chips available. Of these, 22 chips are marked as 'none' for the split column as per the original data curator. Thus, 1,980 chips have been used in the train/validation/test splits in India.
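
The footnote's arithmetic can be checked with a few lines of Python. The figures below are copied from the Dataset Information table for four countries; the helper name `unassigned` is made up for this example. For every country except India the three splits should add up to the chip count, and for India the difference should be the 22 chips marked 'none'.

```python
# (chips, train, val, test) copied from the Dataset Information table.
rows = {
    "Austria": (6686, 5304, 637, 745),
    "India": (2002, 1281, 300, 399),
    "Kenya": (391, 316, 20, 55),
    "Rwanda": (70, 57, 6, 7),
}


def unassigned(chips, train, val, test):
    """Number of chips that belong to none of the three splits."""
    return chips - (train + val + test)


for country, row in rows.items():
    print(country, unassigned(*row))
# Austria 0
# India 22
# Kenya 0
# Rwanda 0
```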
