| Main Authors: | Owens, Hannah; Singer, Randal A. |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.17284002 |
| _version_ | 1866901862005866496 |
|---|---|
| author | Owens, Hannah; Singer, Randal A. |
| author_facet | Owens, Hannah; Singer, Randal A. |
| contents | <h2>Overview</h2> <p dir="ltr">Our raw data were sourced from the Integrated Digitized Biocollections portal (iDigBio), a U.S. National Science Foundation-funded repository for the Advancing Digitization of Biodiversity Collections (ADBC) program. iDigBio houses millions of specimen records and digitized images from hundreds of U.S.-based natural history collections representing taxa from throughout the tree of life. iDigBio searches return both verbatim records and records that have been harmonized and simplified to include only commonly used text fields. For our purposes, we needed geographic coordinates (i.e., latitude and longitude), as well as the verbatim depth field so that we could fill in missing depths for as many records as possible. We therefore used the raw results of the query below as the basis for custom harmonization.</p> <h2>Data extraction from iDigBio</h2> <h3>Database query</h3> <p dir="ltr">The query below was constructed in the <a href="https://www.idigbio.org/portal" target="_blank" rel="noopener">iDigBio portal</a> using the filter Class: Actinopterygii and excluding all freshwater results. The search returned 2,984,839 raw records, which were downloaded as .csv files. 
</p> <p dir="ltr">{"core_type": "records", "rq": {"family": {"type": "exists"}, "locality": {"type": "exists"}, "geopoint": {"type": "exists"}, "basisofrecord": "PreservedSpecimen", "scientificname": {"type": "exists"}, "catalognumber": {"type": "exists"}, "class": "Actinopterygii"}, "form": "dwca-csv", "core_source": "indexterms", "mediarecord_fields": null, "record_fields": null, "mq": null}</p> <h3>Museums providing data</h3> <p dir="ltr">University of Florida, Florida Museum of Natural History (UF); Natural History Museum of Los Angeles County (LACM); Museum of Comparative Zoology, Harvard University (MCZ); Oregon State University (OS); Biodiversity Research and Teaching Collections, Department of Wildlife and Fisheries Sciences, Texas A&M University (TCWC); Yale University, Peabody Museum of Natural History (YPM); Smithsonian Institution, National Museum of Natural History (USNM); Colección Ictiológica del Centro Interdisciplinario de Ciencias Marinas, Instituto Politécnico Nacional (CICIMAR-IPN); Western Australian Museum (WAM); University of Kansas Biodiversity Institute (KU); South Australian Museum (SAMA); Colección Ictiológica del Instituto de Ciencias del Mar y Limnología, Unidad Mazatlán, Universidad Nacional Autónoma de México (ICMyL-UNAM); Commonwealth Scientific & Industrial Research Organization, National Research Collections Australia, Australian National Fish Collection (CSIRO); Museums Victoria (NMV); Colección Nacional de Peces, Instituto de Biología, Universidad Nacional Autónoma de México (IBUNAM); Scripps Institution of Oceanography, Marine Vertebrate Collection, University of California San Diego (SIO); Queensland Museum, Centre for Biodiversity (QM); Muséum national d'Histoire naturelle (MNHN); Dubrovnik Natural History Museum (ZMUC/NHMD); Field Museum of Natural History (FMNH); Université d'Antananarivo, Département de Biologie Animale (UA); Zoological Museum, I. 
Mechnikov Odessa National University (ZMO); The Bernice Pauahi Bishop Museum, Department of Zoology (BPBM); Alabama Museum of Natural History, The University of Alabama (UAM); Georgian National Museum (GNM); University of Colorado Museum of Natural History (UCM).</p> <h2>Data Processing</h2> <p dir="ltr">To render the data fit for use in our analysis, we first processed the raw data in R version 4.3.1 with the dplyr, stringr, and terra packages. After removing records lacking any maximum, minimum, or verbatim depth information, we rectified taxonomic names by filling in missing genus information from binomial names and correcting commonly misspelled family names. We then cleaned the maximum and minimum depth data for each record, converting non-metric measurements to meters; for entries without units, we assumed the depth was in meters. If the maximum and/or minimum depth was missing, we attempted to extract it from the “Verbatim Depth” field using regular expressions, again converting non-metric measurements to meters. Next, we calculated mean depth from the maximum and minimum depth fields and removed all records with a mean depth of 0 m or no mean depth. We document these steps in an R Markdown file <a href="https://hannahlowens.github.io/deep-fish/RawDataCleaning">on GitHub</a>. Records with missing families were visually inspected by RAS (Randal A. Singer), who corrected errors in taxonomy and added missing taxonomic information following Eschmeyer’s Catalog of Fishes. Finally, we filtered out overland records (based on the small-scale country boundaries returned by ne_countries() in the rnaturalearth package), leaving 354,902 records in the dataset.</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_17284002 |
| institution | Zenodo |
| language | eng |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Depth Curated Open-Ocean Fish Records from iDigBio |
| url | https://doi.org/10.5281/zenodo.17284002 |
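
The database query in this record uses iDigBio's JSON record-query ("rq") format. The Python below is a minimal sketch that rebuilds that `rq` object and encodes it as a search request URL. It assumes iDigBio's public search API (v2) at `https://search.idigbio.org/v2/search/records/` accepts the same object as a JSON-encoded `rq` parameter alongside a `limit` parameter.

This is not the authors' workflow, since they queried and downloaded through the portal. The function `build_search_url` and its `limit` default are illustrative. The download-specific keys (`core_type`, `form`, `core_source`, and the null field lists) are deliberately omitted. The freshwater exclusion mentioned in the description does not appear in the recorded `rq`, so the sketch does not apply it either.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Record query ("rq") copied from the query recorded in this dataset description.
RQ = {
    "family": {"type": "exists"},
    "locality": {"type": "exists"},
    "geopoint": {"type": "exists"},
    "basisofrecord": "PreservedSpecimen",
    "scientificname": {"type": "exists"},
    "catalognumber": {"type": "exists"},
    "class": "Actinopterygii",
}

# Assumption: iDigBio's public search API (v2) record-search endpoint.
SEARCH_ENDPOINT = "https://search.idigbio.org/v2/search/records/"


def build_search_url(rq: dict, limit: int = 10) -> str:
    """Return a GET URL that submits `rq` as a compact JSON query parameter."""
    params = {"rq": json.dumps(rq, separators=(",", ":")), "limit": limit}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(RQ))
```

Opening the resulting URL (for example with `urllib.request.urlopen`) should return JSON whose `itemCount` field reports the number of matching records. Collections are continually updated, so that count will likely differ from the 2,984,839 records reported here.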
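
The depth-cleaning steps described under Data Processing can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' R code, which is in the linked R Markdown file. The assumptions are:

- The field names `max_depth`, `min_depth`, and `verbatim_depth` are hypothetical stand-ins for the raw iDigBio columns.
- The recognized unit spellings are an illustrative subset.
- A record is dropped only when all three depth fields are empty.
- When only one of minimum or maximum depth can be recovered, the mean depth is taken to be that single value.

Conversions use 1 ft = 0.3048 m and 1 fathom = 1.8288 m. Values without a recognizable unit are treated as meters, as described above.

```python
import re
from typing import Optional, Tuple

# Exact conversion factors to meters: 1 ft = 0.3048 m, 1 fathom = 6 ft = 1.8288 m.
TO_METERS = {"m": 1.0, "ft": 0.3048, "fm": 1.8288}

# Illustrative (not exhaustive) unit spellings, checked in order with fathoms first.
# The lookarounds reject units embedded in longer words but still match "300m".
UNIT_PATTERNS = [
    ("fm", re.compile(r"(?<![a-z])(fathoms?|fms?|fath)(?![a-z])", re.I)),
    ("ft", re.compile(r"(?<![a-z])(feet|foot|ft)(?![a-z])", re.I)),
    ("m", re.compile(r"(?<![a-z])(meters?|metres?|m)(?![a-z])", re.I)),
]
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def detect_unit(text: str) -> str:
    """Return a unit code for `text`; text with no recognized unit defaults to meters."""
    for code, pattern in UNIT_PATTERNS:
        if pattern.search(text):
            return code
    return "m"


def parse_depth(value) -> Optional[float]:
    """Convert one maximum/minimum depth value to meters (None if unparseable)."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)  # bare numbers are assumed to be meters
    text = str(value)
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group()) * TO_METERS[detect_unit(text)]


def parse_verbatim(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Extract (min, max) depth in meters from a free-text verbatim depth string."""
    factor = TO_METERS[detect_unit(text)]
    values = [float(n) * factor for n in NUMBER.findall(text)]
    if not values:
        return None, None
    return min(values), max(values)


def clean_record(rec: dict) -> Optional[dict]:
    """Fill and convert depths for one record; return None if it should be dropped."""
    raw_max = rec.get("max_depth")
    raw_min = rec.get("min_depth")
    verbatim = rec.get("verbatim_depth")
    if all(v in (None, "") for v in (raw_max, raw_min, verbatim)):
        return None  # no depth information of any kind
    d_max, d_min = parse_depth(raw_max), parse_depth(raw_min)
    if (d_max is None or d_min is None) and verbatim:
        # Fall back to the verbatim field only for the missing bound(s).
        v_min, v_max = parse_verbatim(str(verbatim))
        d_max = d_max if d_max is not None else v_max
        d_min = d_min if d_min is not None else v_min
    known = [d for d in (d_min, d_max) if d is not None]
    if not known:
        return None  # mean depth cannot be computed
    mean_depth = sum(known) / len(known)
    if mean_depth == 0:
        return None  # records at 0 m are removed
    return {**rec, "min_depth_m": d_min, "max_depth_m": d_max, "mean_depth_m": mean_depth}
```

A number-matching approach like this will misread compound strings such as "3 ft 6 in", or notes containing unrelated numbers. Such values would need extra patterns or manual checking.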