Bibliographic Details
Main Authors: Stasinos, Stylianos, Mensio, Martino, Lazovik, Elena, Trantas, Athanasios
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2505.11568
_version_ 1866908608197820416
author Stasinos, Stylianos
Mensio, Martino
Lazovik, Elena
Trantas, Athanasios
author_facet Stasinos, Stylianos
Mensio, Martino
Lazovik, Elena
Trantas, Athanasios
contents Biodiversity research requires complete and detailed information to study ecosystem dynamics at different scales. Data-driven methods such as Machine Learning are gaining traction in ecology, and more specifically in biodiversity research, offering alternative modelling pathways. To deliver accurate results, these methods need large, curated, multimodal datasets with granular spatial and temporal resolution. In this work, we introduce BioCube, a multimodal, fine-grained global dataset for ecology and biodiversity research. BioCube incorporates species observations through images, audio recordings and descriptions, environmental DNA, vegetation indices, agricultural, forest and land indicators, and high-resolution climate variables. All observations are geospatially aligned under the WGS84 geodetic system and span the years 2000 to 2020. The dataset is available at https://huggingface.co/datasets/BioDT/BioCube, and the acquisition and processing code base at https://github.com/BioDT/bfm-data.
format Preprint
id arxiv_https___arxiv_org_abs_2505_11568
institution arXiv
publishDate 2025
record_format arxiv
title BioCube: A Multimodal Dataset for Biodiversity Research
topic Quantitative Methods
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2505.11568