| Main Authors: | Mukherjee, Sumit, Shu, Juan, Mazumder, Nairwita, Kernell, Tate, Wheeler, Celena, Hastings, Shannon, Sidey-Gibbons, Chris |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.14616 |
Similar Items
Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications
by: Kim, Sean, et al.
Published: (2024)
Identifying noise transients in gravitational-wave data arising from nonlinear couplings
by: Hall, Bernard, et al.
Published: (2024)
LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval
by: Ahmadi, Narges Baba, et al.
Published: (2026)
ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering
by: Wu, Ruofan, et al.
Published: (2025)
Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge
by: Bacellar, Andre
Published: (2026)
Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)
CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions
by: Jourdan, Leane, et al.
Published: (2024)
Multilingual TinyStories: A Synthetic Combinatorial Corpus of Indic Children's Stories for Training Small Language Models
by: Halder, Deepon, et al.
Published: (2026)
GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians
by: Chen, Xiuyuan, et al.
Published: (2025)
Automated Adversarial Discovery for Safety Classifiers
by: Lal, Yash Kumar, et al.
Published: (2024)
Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis
by: Klemen, Matej, et al.
Published: (2025)
ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs
by: Wang, Zhipin, et al.
Published: (2026)
Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology
by: Gupta, Shashi Kant, et al.
Published: (2024)
Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
by: Liu, Fenglin, et al.
Published: (2022)
Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
by: Kim, Yunsung, et al.
Published: (2025)
Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
by: Hamna, Hamna, et al.
Published: (2025)
Hierarchical Indexing with Knowledge Enrichment for Multilingual Video Corpus Retrieval
by: Wang, Yu, et al.
Published: (2025)
A Topic-aware Comparable Corpus of Chinese Variations
by: Lian, Da-Chen, et al.
Published: (2024)
MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
by: Yadav, Sumit, et al.
Published: (2025)
From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
by: Lee, Sunkyung, et al.
Published: (2026)
Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval
by: Li, Yongkang, et al.
Published: (2025)
Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
by: Li, Yongkang, et al.
Published: (2025)
C-VARC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models
by: Wu, Ping, et al.
Published: (2025)
Low-resource Information Extraction with the European Clinical Case Corpus
by: Ghosh, Soumitra, et al.
Published: (2025)
Determinants of Training Corpus Size for Clinical Text Classification
by: Chaturvedi, Jaya, et al.
Published: (2026)
Challenges in Explaining Pretrained Clinical Text Classifiers
by: Miok, Kristian, et al.
Published: (2026)
MONOVAB : An Annotated Corpus for Bangla Multi-label Emotion Detection
by: Banshal, Sumit Kumar, et al.
Published: (2023)
Towards Automated Verification of LLM-Synthesized C Programs
by: Mukherjee, Prasita, et al.
Published: (2024)
RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
by: Niu, Cheng, et al.
Published: (2023)
Could the Road to Grounded, Neuro-symbolic AI be Paved with Words-as-Classifiers?
by: Kennington, Casey, et al.
Published: (2025)
MDC-R: The Minecraft Dialogue Corpus with Reference
by: Madge, Chris, et al.
Published: (2025)
REInstruct: Building Instruction Data from Unlabeled Corpus
by: Chen, Shu, et al.
Published: (2024)
SEC-QA: A Systematic Evaluation Corpus for Financial QA
by: Lai, Viet Dac, et al.
Published: (2024)
Graph-Guided Passage Retrieval for Author-Centric Structured Feedback
by: Chitale, Maitreya Prafulla, et al.
Published: (2025)
An Ethically Grounded LLM-Based Approach to Insider Threat Synthesis and Detection
by: Gelman, Haywood, et al.
Published: (2025)
The SAMER Arabic Text Simplification Corpus
by: Alhafni, Bashar, et al.
Published: (2024)
AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts
by: Braun, Daniel, et al.
Published: (2024)
AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
by: Oh, Gyutaek, et al.
Published: (2025)
Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties
by: Lopetegui, Javier A., et al.
Published: (2024)
Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning
by: Luo, Qi, et al.
Published: (2025)