| Main Authors: | Huang, Yukun, Ribeiro, Leonardo F. R., Hardalov, Momchil, Dhingra, Bhuwan, Dreyer, Markus, Saligrama, Venkatesh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.05912 |
Similar Items
Real-time Factuality Assessment from Adversarial Feedback
by: Chen, Sanxing, et al.
Published: (2024)
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
by: Holsman, Maximilian, et al.
Published: (2025)
To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
by: Huang, Yukun, et al.
Published: (2024)
Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents
by: Chandrahasan, Prahaladh, et al.
Published: (2025)
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
by: Thirukovalluru, Raghuveer, et al.
Published: (2024)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
by: Chen, Sanxing, et al.
Published: (2025)
Calibrating Long-form Generations from Large Language Models
by: Huang, Yukun, et al.
Published: (2024)
Coding Agents are Effective Long-Context Processors
by: Cao, Weili, et al.
Published: (2026)
Adversarial Math Word Problem Generation
by: Xie, Roy, et al.
Published: (2024)
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
by: Ivanov, Petar, et al.
Published: (2023)
SCRAMBLe : Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
by: Mishra, Samarth, et al.
Published: (2025)
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2026)
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
by: Zhang, Minxing, et al.
Published: (2025)
Hierarchical Multi-Label Classification of Online Vaccine Concerns
by: Zhu, Chloe Qinyu, et al.
Published: (2024)
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
by: Cai, Hongyi James, et al.
Published: (2025)
Linear Transformers Implicitly Discover Unified Numerical Algorithms
by: Lutz, Patrick, et al.
Published: (2025)
Staircase Streaming for Low-Latency Multi-Agent Inference
by: Wang, Junlin, et al.
Published: (2025)
Document-as-Image Representations Fall Short for Scientific Retrieval
by: Khalighinejad, Ghazal, et al.
Published: (2026)
VeriTrace: Evolving Mental Models for Deep Research Agents
by: Zhao, Haolang, et al.
Published: (2026)
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
by: Fu, Deqing, et al.
Published: (2024)
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
by: Wang, Junlin, et al.
Published: (2025)
Atomic Self-Consistency for Better Long Form Generations
by: Thirukovalluru, Raghuveer, et al.
Published: (2024)
Symmetry Reveals Layerwise Dynamics: How Transformers Perform In-Context Classification
by: Lutz, Patrick, et al.
Published: (2026)
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
by: Wu, Tongzhou, et al.
Published: (2026)
DeepShop: A Benchmark for Deep Research Shopping Agents
by: Lyu, Yougang, et al.
Published: (2025)
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
by: Zhu, Ruizhao, et al.
Published: (2024)
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
by: Mahaut, Matéo, et al.
Published: (2024)
FactAlign: Long-form Factuality Alignment of Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)
by: Prateek, Saurav
Published: (2026)
SynCDR : Training Cross Domain Retrieval Models with Synthetic Data
by: Mishra, Samarth, et al.
Published: (2023)
InFact: Informativeness Alignment for Improved LLM Factuality
by: Cohen, Roi, et al.
Published: (2025)
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
by: Ning, Yucheng, et al.
Published: (2025)
Over-Searching in Search-Augmented Large Language Models
by: Xie, Roy, et al.
Published: (2026)
EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents
by: Yuan, Zike, et al.
Published: (2026)
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
by: Wan, Yuxuan, et al.
Published: (2026)
Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research
by: Liu, Gang, et al.
Published: (2025)
LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
by: Tran, Hieu, et al.
Published: (2024)