
Bibliographic Details
Main Authors:	Gan, Lisa-Yao; Das, Arunav; Walker, Johanna; Diepold, Klaus; Simperl, Elena
Format:	Preprint
Published:	2026
Subjects:	Databases
Online Access:	https://arxiv.org/abs/2606.02334

Similar Items

Keywords are not always the key: A metadata field analysis for natural language search on open data portals
by: Gan, Lisa-Yao, et al.
Published: (2025)

Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version
by: Miao, Hao, et al.
Published: (2024)

Prompting Datasets: Data Discovery with Conversational Agents
by: Walker, Johanna, et al.
Published: (2023)

AutoDDG: Automated Dataset Description Generation using Large Language Models
by: Zhang, Haoxiang, et al.
Published: (2025)

AI data transparency: an exploration through the lens of AI incidents
by: Worth, Sophia, et al.
Published: (2024)

Smaller and More Flexible Cuckoo Filters
by: Schmitz, Johanna Elena, et al.
Published: (2025)

LAKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes
by: Dai, Zhenwei, et al.
Published: (2025)

Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL
by: Chung, Yeounoh, et al.
Published: (2025)

CogPic: A Multimodal Dataset for Early Cognitive Impairment Assessment via Picture Description Tasks
by: Wu, Liuyu, et al.
Published: (2026)

Filling in the Blanks? A Systematic Review and Theoretical Conceptualisation for Measuring WikiData Content Gaps
by: Ripoll, Marisa, et al.
Published: (2025)

A Standardized Machine-readable Dataset Documentation Format for Responsible AI
by: Jain, Nitisha, et al.
Published: (2024)

FlexiDataGen: An Adaptive LLM Framework for Dynamic Semantic Dataset Generation in Sensitive Domains
by: Jelodar, Hamed, et al.
Published: (2025)

Generating Skyline Datasets for Data Science Models
by: Wang, Mengying, et al.
Published: (2025)

Croissant: A Metadata Format for ML-Ready Datasets
by: Akhtar, Mubashara, et al.
Published: (2024)

A Survey on Open Dataset Search in the LLM Era: Retrospectives and Perspectives
by: Li, Pengyue, et al.
Published: (2025)

Distinctiveness Maximization in Datasets Assemblage
by: Wang, Tingting, et al.
Published: (2024)

TableNet A Large-Scale Table Dataset with LLM-Powered Autonomous
by: Zhang, Ruilin, et al.
Published: (2026)

ChatPD: An LLM-driven Paper-Dataset Networking System
by: Xu, Anjie, et al.
Published: (2025)

Dataset Discovery via Line Charts
by: Ji, Daomin, et al.
Published: (2024)

User Experience In Dataset Search
by: Zhao, Yihang, et al.
Published: (2024)

LLMClean: Context-Aware Tabular Data Cleaning via LLM-Generated OFDs
by: Biester, Fabian, et al.
Published: (2024)

Query Based Construction of Chronic Disease Datasets
by: Ngo, Vuong M., et al.
Published: (2024)

Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets
by: Attrach, Rafi Al, et al.
Published: (2026)

SchemaDB: Structures in Relational Datasets
by: Christopher, Cody James, et al.
Published: (2021)

DataLens: Enhancing Dataset Discovery via Network Topologies
by: Ollagnier, Anaïs, et al.
Published: (2025)

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification
by: Tihanyi, Norbert, et al.
Published: (2023)

Context-Enriched Natural Language Descriptions of Vessel Trajectories
by: Patroumpas, Kostas, et al.
Published: (2026)

Conceptual Schema Inference for Tabular Datasets using Large Language Models
by: Wu, Zhenyu, et al.
Published: (2026)

An Intelligent Innovation Dataset on Scientific Research Outcomes
by: Wu, Xinran, et al.
Published: (2024)

Within-Dataset Disclosure Risk for Differential Privacy
by: Zhu, Zhiru, et al.
Published: (2023)

A Unified Approach for Multi-Granularity Search over Spatial Datasets
by: Yang, Wenzhe, et al.
Published: (2024)

Jelly-Patch: a Fast Format for Recording Changes in RDF Datasets
by: Sowinski, Piotr, et al.
Published: (2025)

From RDF Graph Validation to RDF Dataset Validation with SHACL-DS
by: Dao, Davan Chiem, et al.
Published: (2025)

jXBW: Fast Substructure Search for Large-Scale JSONL Datasets with LLM Applications
by: Tabei, Yasuo
Published: (2025)

PBE Meets LLM: When Few Examples Aren't Few-Shot Enough
by: Zhang, Shuning, et al.
Published: (2025)

When Less is More: The LLM Scaling Paradox in Context Compression
by: Guo, Ruishan, et al.
Published: (2026)

Global Dataset of Solar Power Plants: Multidimensional Integration and Analysis
by: Mantilla-Guerra, Anibal, et al.
Published: (2026)

Blocked Bloom Filters with Choices
by: Schmitz, Johanna Elena, et al.
Published: (2025)

Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets
by: Bohacek, Matyas, et al.
Published: (2025)

OSM+: Billion-Level OpenStreetMap Dataset for City-wide Experiments
by: Zheng, Guanjie, et al.
Published: (2025)