:: Library Catalog

Bibliographic Details
Main Authors:	Garg, Ishir; Kolhe, Neel; Zhao, Xuandong; Song, Dawn
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.00575

Similar Items

MemFail: Stress-Testing Failure Modes of LLM Memory Systems
by: Garg, Ishir, et al.
Published: (2026)

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)

Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning
by: Garg, Ishir, et al.
Published: (2026)

Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs
by: Liu, Yepeng, et al.
Published: (2025)

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
by: Xiong, Alexander, et al.
Published: (2025)

Scalable Best-of-N Selection for Large Language Models via Self-Certainty
by: Kang, Zhewei, et al.
Published: (2025)

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
by: Cai, Will, et al.
Published: (2025)

In-Context Watermarks for Large Language Models
by: Liu, Yepeng, et al.
Published: (2025)

Learning to Reason without External Rewards
by: Zhao, Xuandong, et al.
Published: (2025)

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
by: Liu, Yepeng, et al.
Published: (2025)

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)

Improving LLM Safety Alignment with Dual-Objective Optimization
by: Zhao, Xuandong, et al.
Published: (2025)

Multimodal Situational Safety
by: Zhou, Kaiwen, et al.
Published: (2024)

Assessing Judging Bias in Large Reasoning Models: An Empirical Study
by: Wang, Qian, et al.
Published: (2025)

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
by: Zhao, Xuandong, et al.
Published: (2024)

WideSearch: Benchmarking Agentic Broad Info-Seeking
by: Wong, Ryan, et al.
Published: (2025)

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)

InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation
by: Xi, Yunjia, et al.
Published: (2025)

CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
by: Zhang, Xinyu, et al.
Published: (2025)

Reliable Fine-Grained Evaluation of Natural Language Math Proofs
by: Ma, Wenjie, et al.
Published: (2025)

A Practical Examination of AI-Generated Text Detectors for Large Language Models
by: Tufts, Brian, et al.
Published: (2024)

MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
by: Marius, Dumitran Adrian, et al.
Published: (2025)

Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters
by: Potter, Yujin, et al.
Published: (2024)

SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
by: Moskovskiy, Daniil, et al.
Published: (2025)

InfoAgent: Advancing Autonomous Information-Seeking Agents
by: Zhang, Gongrui, et al.
Published: (2025)

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)

InfoTech Assistant: A Multimodal Conversational Agent for InfoTechnology Web Portal Queries
by: Gadiraju, Sai Surya, et al.
Published: (2024)

InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
by: Trienes, Jan, et al.
Published: (2024)

Efficiently Identifying Watermarked Segments in Mixed-Source Texts
by: Zhao, Xuandong, et al.
Published: (2024)

Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper
by: Garg, Krishna, et al.
Published: (2025)

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)

InfoFlood: Jailbreaking Large Language Models with Information Overload
by: Yadav, Advait, et al.
Published: (2025)

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature
by: Zhou, Tong, et al.
Published: (2024)

Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
by: Patel, Laksh, et al.
Published: (2025)

InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
by: Taranukhin, Maksym, et al.
Published: (2026)

PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation
by: Shao, Minghao, et al.
Published: (2026)

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
by: Chughtai, Bilal, et al.
Published: (2024)

DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing
by: Zhang, Hongzhi, et al.
Published: (2026)

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)