| Main Authors: | Gan, Lisa-Yao, Das, Arunav, Walker, Johanna, Diepold, Klaus, Simperl, Elena |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.02334 |
Similar Items
Keywords are not always the key: A metadata field analysis for natural language search on open data portals
by: Gan, Lisa-Yao, et al.
Published: (2025)
Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version
by: Miao, Hao, et al.
Published: (2024)
Prompting Datasets: Data Discovery with Conversational Agents
by: Walker, Johanna, et al.
Published: (2023)
AutoDDG: Automated Dataset Description Generation using Large Language Models
by: Zhang, Haoxiang, et al.
Published: (2025)
AI data transparency: an exploration through the lens of AI incidents
by: Worth, Sophia, et al.
Published: (2024)
Smaller and More Flexible Cuckoo Filters
by: Schmitz, Johanna Elena, et al.
Published: (2025)
LAKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes
by: Dai, Zhenwei, et al.
Published: (2025)
Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL
by: Chung, Yeounoh, et al.
Published: (2025)
CogPic: A Multimodal Dataset for Early Cognitive Impairment Assessment via Picture Description Tasks
by: Wu, Liuyu, et al.
Published: (2026)
Filling in the Blanks? A Systematic Review and Theoretical Conceptualisation for Measuring WikiData Content Gaps
by: Ripoll, Marisa, et al.
Published: (2025)
A Standardized Machine-readable Dataset Documentation Format for Responsible AI
by: Jain, Nitisha, et al.
Published: (2024)
FlexiDataGen: An Adaptive LLM Framework for Dynamic Semantic Dataset Generation in Sensitive Domains
by: Jelodar, Hamed, et al.
Published: (2025)
Generating Skyline Datasets for Data Science Models
by: Wang, Mengying, et al.
Published: (2025)
Croissant: A Metadata Format for ML-Ready Datasets
by: Akhtar, Mubashara, et al.
Published: (2024)
A Survey on Open Dataset Search in the LLM Era: Retrospectives and Perspectives
by: Li, Pengyue, et al.
Published: (2025)
Distinctiveness Maximization in Datasets Assemblage
by: Wang, Tingting, et al.
Published: (2024)
TableNet A Large-Scale Table Dataset with LLM-Powered Autonomous
by: Zhang, Ruilin, et al.
Published: (2026)
ChatPD: An LLM-driven Paper-Dataset Networking System
by: Xu, Anjie, et al.
Published: (2025)
Dataset Discovery via Line Charts
by: Ji, Daomin, et al.
Published: (2024)
User Experience In Dataset Search
by: Zhao, Yihang, et al.
Published: (2024)
LLMClean: Context-Aware Tabular Data Cleaning via LLM-Generated OFDs
by: Biester, Fabian, et al.
Published: (2024)
Query Based Construction of Chronic Disease Datasets
by: Ngo, Vuong M., et al.
Published: (2024)
Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets
by: Attrach, Rafi Al, et al.
Published: (2026)
SchemaDB: Structures in Relational Datasets
by: Christopher, Cody James, et al.
Published: (2021)
DataLens: Enhancing Dataset Discovery via Network Topologies
by: Ollagnier, Anaïs, et al.
Published: (2025)
The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification
by: Tihanyi, Norbert, et al.
Published: (2023)
Context-Enriched Natural Language Descriptions of Vessel Trajectories
by: Patroumpas, Kostas, et al.
Published: (2026)
Conceptual Schema Inference for Tabular Datasets using Large Language Models
by: Wu, Zhenyu, et al.
Published: (2026)
An Intelligent Innovation Dataset on Scientific Research Outcomes
by: Wu, Xinran, et al.
Published: (2024)
Within-Dataset Disclosure Risk for Differential Privacy
by: Zhu, Zhiru, et al.
Published: (2023)
A Unified Approach for Multi-Granularity Search over Spatial Datasets
by: Yang, Wenzhe, et al.
Published: (2024)
Jelly-Patch: a Fast Format for Recording Changes in RDF Datasets
by: Sowinski, Piotr, et al.
Published: (2025)
From RDF Graph Validation to RDF Dataset Validation with SHACL-DS
by: Dao, Davan Chiem, et al.
Published: (2025)
jXBW: Fast Substructure Search for Large-Scale JSONL Datasets with LLM Applications
by: Tabei, Yasuo
Published: (2025)
PBE Meets LLM: When Few Examples Aren't Few-Shot Enough
by: Zhang, Shuning, et al.
Published: (2025)
When Less is More: The LLM Scaling Paradox in Context Compression
by: Guo, Ruishan, et al.
Published: (2026)
Global Dataset of Solar Power Plants: Multidimensional Integration and Analysis
by: Mantilla-Guerra, Anibal, et al.
Published: (2026)
Blocked Bloom Filters with Choices
by: Schmitz, Johanna Elena, et al.
Published: (2025)
Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets
by: Bohacek, Matyas, et al.
Published: (2025)
OSM+: Billion-Level OpenStreetMap Dataset for City-wide Experiments
by: Zheng, Guanjie, et al.
Published: (2025)