:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Yukun; Ribeiro, Leonardo F. R.; Hardalov, Momchil; Dhingra, Bhuwan; Dreyer, Markus; Saligrama, Venkatesh
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.05912

Similar Items

Real-time Factuality Assessment from Adversarial Feedback
by: Chen, Sanxing, et al.
Published: (2024)

Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
by: Holsman, Maximilian, et al.
Published: (2025)

To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
by: Huang, Yukun, et al.
Published: (2024)

Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents
by: Chandrahasan, Prahaladh, et al.
Published: (2025)

GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
by: Thirukovalluru, Raghuveer, et al.
Published: (2024)

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)

When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
by: Chen, Sanxing, et al.
Published: (2025)

Calibrating Long-form Generations from Large Language Models
by: Huang, Yukun, et al.
Published: (2024)

Coding Agents are Effective Long-Context Processors
by: Cao, Weili, et al.
Published: (2026)

Adversarial Math Word Problem Generation
by: Xie, Roy, et al.
Published: (2024)

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
by: Ivanov, Petar, et al.
Published: (2023)

SCRAMBLe : Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
by: Mishra, Samarth, et al.
Published: (2025)

Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2026)

Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
by: Zhang, Minxing, et al.
Published: (2025)

Hierarchical Multi-Label Classification of Online Vaccine Concerns
by: Zhu, Chloe Qinyu, et al.
Published: (2024)

How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
by: Cai, Hongyi James, et al.
Published: (2025)

Linear Transformers Implicitly Discover Unified Numerical Algorithms
by: Lutz, Patrick, et al.
Published: (2025)

Staircase Streaming for Low-Latency Multi-Agent Inference
by: Wang, Junlin, et al.
Published: (2025)

Document-as-Image Representations Fall Short for Scientific Retrieval
by: Khalighinejad, Ghazal, et al.
Published: (2026)

VeriTrace: Evolving Mental Models for Deep Research Agents
by: Zhao, Haolang, et al.
Published: (2026)

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
by: Fu, Deqing, et al.
Published: (2024)

Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
by: Wang, Junlin, et al.
Published: (2025)

Atomic Self-Consistency for Better Long Form Generations
by: Thirukovalluru, Raghuveer, et al.
Published: (2024)

Symmetry Reveals Layerwise Dynamics: How Transformers Perform In-Context Classification
by: Lutz, Patrick, et al.
Published: (2026)

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
by: Wu, Tongzhou, et al.
Published: (2026)

DeepShop: A Benchmark for Deep Research Shopping Agents
by: Lyu, Yougang, et al.
Published: (2025)

Deep Companion Learning: Enhancing Generalization Through Historical Consistency
by: Zhu, Ruizhao, et al.
Published: (2024)

Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
by: Mahaut, Matéo, et al.
Published: (2024)

FactAlign: Long-form Factuality Alignment of Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)

Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)
by: Prateek, Saurav
Published: (2026)

SynCDR : Training Cross Domain Retrieval Models with Synthetic Data
by: Mishra, Samarth, et al.
Published: (2023)

InFact: Informativeness Alignment for Improved LLM Factuality
by: Cohen, Roi, et al.
Published: (2025)

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
by: Ning, Yucheng, et al.
Published: (2025)

Over-Searching in Search-Augmented Large Language Models
by: Xie, Roy, et al.
Published: (2026)

EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents
by: Yuan, Zike, et al.
Published: (2026)

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
by: Wan, Yuxuan, et al.
Published: (2026)

Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research
by: Liu, Gang, et al.
Published: (2025)

LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
by: Tran, Hieu, et al.
Published: (2024)