Library Catalog

Bibliographic Details
Main Authors:	Mukherjee, Sumit; Shu, Juan; Mazumder, Nairwita; Kernell, Tate; Wheeler, Celena; Hastings, Shannon; Sidey-Gibbons, Chris
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.14616

Similar Items

Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications
by: Kim, Sean, et al.
Published: (2024)

Identifying noise transients in gravitational-wave data arising from nonlinear couplings
by: Hall, Bernard, et al.
Published: (2024)

LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval
by: Ahmadi, Narges Baba, et al.
Published: (2026)

ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering
by: Wu, Ruofan, et al.
Published: (2025)

Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge
by: Bacellar, Andre
Published: (2026)

Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)

CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions
by: Jourdan, Leane, et al.
Published: (2024)

Multilingual TinyStories: A Synthetic Combinatorial Corpus of Indic Children's Stories for Training Small Language Models
by: Halder, Deepon, et al.
Published: (2026)

GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians
by: Chen, Xiuyuan, et al.
Published: (2025)

Automated Adversarial Discovery for Safety Classifiers
by: Lal, Yash Kumar, et al.
Published: (2024)

Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis
by: Klemen, Matej, et al.
Published: (2025)

ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs
by: Wang, Zhipin, et al.
Published: (2026)

Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology
by: Gupta, Shashi Kant, et al.
Published: (2024)

Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
by: Liu, Fenglin, et al.
Published: (2022)

Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
by: Kim, Yunsung, et al.
Published: (2025)

Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
by: Hamna, Hamna, et al.
Published: (2025)

Hierarchical Indexing with Knowledge Enrichment for Multilingual Video Corpus Retrieval
by: Wang, Yu, et al.
Published: (2025)

A Topic-aware Comparable Corpus of Chinese Variations
by: Lian, Da-Chen, et al.
Published: (2024)

MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
by: Yadav, Sumit, et al.
Published: (2025)

From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
by: Lee, Sunkyung, et al.
Published: (2026)

Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval
by: Li, Yongkang, et al.
Published: (2025)

Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
by: Li, Yongkang, et al.
Published: (2025)

C-VARC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models
by: Wu, Ping, et al.
Published: (2025)

Low-resource Information Extraction with the European Clinical Case Corpus
by: Ghosh, Soumitra, et al.
Published: (2025)

Determinants of Training Corpus Size for Clinical Text Classification
by: Chaturvedi, Jaya, et al.
Published: (2026)

Challenges in Explaining Pretrained Clinical Text Classifiers
by: Miok, Kristian, et al.
Published: (2026)

MONOVAB: An Annotated Corpus for Bangla Multi-label Emotion Detection
by: Banshal, Sumit Kumar, et al.
Published: (2023)

Towards Automated Verification of LLM-Synthesized C Programs
by: Mukherjee, Prasita, et al.
Published: (2024)

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
by: Niu, Cheng, et al.
Published: (2023)

Could the Road to Grounded, Neuro-symbolic AI be Paved with Words-as-Classifiers?
by: Kennington, Casey, et al.
Published: (2025)

MDC-R: The Minecraft Dialogue Corpus with Reference
by: Madge, Chris, et al.
Published: (2025)

REInstruct: Building Instruction Data from Unlabeled Corpus
by: Chen, Shu, et al.
Published: (2024)

SEC-QA: A Systematic Evaluation Corpus for Financial QA
by: Lai, Viet Dac, et al.
Published: (2024)

Graph-Guided Passage Retrieval for Author-Centric Structured Feedback
by: Chitale, Maitreya Prafulla, et al.
Published: (2025)

An Ethically Grounded LLM-Based Approach to Insider Threat Synthesis and Detection
by: Gelman, Haywood, et al.
Published: (2025)

The SAMER Arabic Text Simplification Corpus
by: Alhafni, Bashar, et al.
Published: (2024)

AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts
by: Braun, Daniel, et al.
Published: (2024)

AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
by: Oh, Gyutaek, et al.
Published: (2025)

Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties
by: Lopetegui, Javier A., et al.
Published: (2024)

Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning
by: Luo, Qi, et al.
Published: (2025)