| Main Authors: | Garg, Ishir, Kolhe, Neel, Zhao, Xuandong, Song, Dawn |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.00575 |
Similar Items
MemFail: Stress-Testing Failure Modes of LLM Memory Systems
by: Garg, Ishir, et al.
Published: (2026)
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)
Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning
by: Garg, Ishir, et al.
Published: (2026)
Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs
by: Liu, Yepeng, et al.
Published: (2025)
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
by: Xiong, Alexander, et al.
Published: (2025)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
by: Kang, Zhewei, et al.
Published: (2025)
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
by: Cai, Will, et al.
Published: (2025)
In-Context Watermarks for Large Language Models
by: Liu, Yepeng, et al.
Published: (2025)
Learning to Reason without External Rewards
by: Zhao, Xuandong, et al.
Published: (2025)
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
by: Liu, Yepeng, et al.
Published: (2025)
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)
Improving LLM Safety Alignment with Dual-Objective Optimization
by: Zhao, Xuandong, et al.
Published: (2025)
Multimodal Situational Safety
by: Zhou, Kaiwen, et al.
Published: (2024)
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
by: Wang, Qian, et al.
Published: (2025)
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
by: Zhao, Xuandong, et al.
Published: (2024)
WideSearch: Benchmarking Agentic Broad Info-Seeking
by: Wong, Ryan, et al.
Published: (2025)
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation
by: Xi, Yunjia, et al.
Published: (2025)
CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
by: Zhang, Xinyu, et al.
Published: (2025)
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
by: Ma, Wenjie, et al.
Published: (2025)
A Practical Examination of AI-Generated Text Detectors for Large Language Models
by: Tufts, Brian, et al.
Published: (2024)
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
by: Marius, Dumitran Adrian, et al.
Published: (2025)
Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters
by: Potter, Yujin, et al.
Published: (2024)
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
by: Moskovskiy, Daniil, et al.
Published: (2025)
InfoAgent: Advancing Autonomous Information-Seeking Agents
by: Zhang, Gongrui, et al.
Published: (2025)
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)
InfoTech Assistant: A Multimodal Conversational Agent for InfoTechnology Web Portal Queries
by: Gadiraju, Sai Surya, et al.
Published: (2024)
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
by: Trienes, Jan, et al.
Published: (2024)
Efficiently Identifying Watermarked Segments in Mixed-Source Texts
by: Zhao, Xuandong, et al.
Published: (2024)
Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper
by: Garg, Krishna, et al.
Published: (2025)
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)
InfoFlood: Jailbreaking Large Language Models with Information Overload
by: Yadav, Advait, et al.
Published: (2025)
Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature
by: Zhou, Tong, et al.
Published: (2024)
Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
by: Patel, Laksh, et al.
Published: (2025)
InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
by: Taranukhin, Maksym, et al.
Published: (2026)
PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation
by: Shao, Minghao, et al.
Published: (2026)
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
by: Chughtai, Bilal, et al.
Published: (2024)
DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing
by: Zhang, Hongzhi, et al.
Published: (2026)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)