| Main Authors: | Walden, William, Ricci, Kathryn, Wanner, Miriam, Jiang, Zhengping, May, Chandler, Zhou, Rongkun, Van Durme, Benjamin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.12637 |
Similar Items
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?
by: Ou, Jiefu, et al.
Published: (2025)
A Closer Look at Claim Decomposition
by: Wanner, Miriam, et al.
Published: (2024)
Weird Generalization is Weirdly Brittle
by: Wanner, Miriam, et al.
Published: (2026)
Reasoning Models Will Sometimes Lie About Their Reasoning
by: Walden, William, et al.
Published: (2026)
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
by: Wanner, Miriam, et al.
Published: (2024)
MegaWika 2: A More Comprehensive Multilingual Collection of Articles and their Sources
by: Barham, Samuel, et al.
Published: (2025)
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
by: Jiang, Zhengping, et al.
Published: (2025)
Core: Robust Factual Precision with Informative Sub-Claim Identification
by: Jiang, Zhengping, et al.
Published: (2024)
All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations
by: Wanner, Miriam, et al.
Published: (2025)
Rank1: Test-Time Compute for Reranking in Information Retrieval
by: Weller, Orion, et al.
Published: (2025)
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
by: Wang, Liaoyaqi, et al.
Published: (2025)
RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation
by: Fleshman, William, et al.
Published: (2024)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)
RORA: Robust Free-Text Rationale Evaluation
by: Jiang, Zhengping, et al.
Published: (2024)
NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning
by: Weir, Nathaniel, et al.
Published: (2022)
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
by: Sanders, Kate, et al.
Published: (2025)
NevIR: Negation in Neural Information Retrieval
by: Weller, Orion, et al.
Published: (2023)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
by: Weller, Orion, et al.
Published: (2025)
SEQR: Secure and Efficient QR-based LoRA Routing
by: Fleshman, William, et al.
Published: (2025)
LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks
by: Fleshman, William, et al.
Published: (2025)
RE-Adapt: Reverse Engineered Adaptation of Large Language Models
by: Fleshman, William, et al.
Published: (2024)
SpectR: Dynamically Composing LM Experts with Spectral Routing
by: Fleshman, William, et al.
Published: (2025)
Rank-K: Test-Time Reasoning for Listwise Reranking
by: Yang, Eugene, et al.
Published: (2025)
Multi-Field Adaptive Retrieval
by: Li, Millicent, et al.
Published: (2024)
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
by: Jurayj, William, et al.
Published: (2025)
Language Models and Logic Programs for Trustworthy Tax Reasoning
by: Jurayj, William, et al.
Published: (2025)
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
by: Cheng, Jeffrey, et al.
Published: (2024)
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
by: Ou, Jiefu, et al.
Published: (2024)
WikiVideo: Article Generation from Multiple Videos
by: Martin, Alexander, et al.
Published: (2025)
Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores
by: Chari, Vivek, et al.
Published: (2025)
LLMs Provide Unstable Answers to Legal Questions
by: Blair-Stanek, Andrew, et al.
Published: (2025)
Investigating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries
by: Liu, Gabrielle Kaili-May, et al.
Published: (2025)
Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations
by: Wanner, Miriam, et al.
Published: (2025)
SocialNLI: A Dialogue-Centric Social Inference Dataset
by: Deo, Akhil, et al.
Published: (2025)
Learning to Retrieve Iteratively for In-Context Learning
by: Chen, Yunmo, et al.
Published: (2024)
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
by: Fleshman, William, et al.
Published: (2024)
LM Agents for Coordinating Multi-User Information Gathering
by: Jhamtani, Harsh, et al.
Published: (2025)
The NLP Task Effectiveness of Long-Range Transformers
by: Qin, Guanghui, et al.
Published: (2022)
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs
by: Chari, Vivek, et al.
Published: (2025)
Zero and Few-shot Semantic Parsing with Ambiguous Inputs
by: Stengel-Eskin, Elias, et al.
Published: (2023)