| Main Authors: | Oh, Nathaniel, Attie, Paul |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.26829 |
Similar Items
First Hallucination Tokens Are Different from Conditional Ones
by: Snel, Jakob, et al.
Published: (2025)
Predicting LLM Correctness in Prosthodontics Using Metadata and Hallucination Signals
by: Susanto, Lucky, et al.
Published: (2025)
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
by: Foerster, Hanna, et al.
Published: (2025)
Sea-cret Agents: Maritime Abduction for Region Generation to Expose Dark Vessel Trajectories
by: Bavikadi, Divyagna, et al.
Published: (2025)
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model
by: Bicakci, Kemal
Published: (2026)
One Filter to Deploy Them All: Robust Safety for Quadrupedal Navigation in Unknown Environments
by: Lin, Albert, et al.
Published: (2024)
Fantastic Pretraining Optimizers and Where to Find Them
by: Wen, Kaiyue, et al.
Published: (2025)
Low Rank Gradients and Where to Find Them
by: Sonthalia, Rishi, et al.
Published: (2025)
Computational Safety for Generative AI: A Signal Processing Perspective
by: Chen, Pin-Yu
Published: (2025)
GNN Explanations that do not Explain and How to find Them
by: Azzolin, Steve, et al.
Published: (2026)
What Cohort INRs Encode and Where to Freeze Them
by: Sideri-Lampretsa, Vasiliki, et al.
Published: (2026)
Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal
by: Yuan, Aojie, et al.
Published: (2026)
Explaining Predictive Uncertainty by Exposing Second-Order Effects
by: Bley, Florian, et al.
Published: (2024)
Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)
How to Square Tensor Networks and Circuits Without Squaring Them
by: Loconte, Lorenzo, et al.
Published: (2025)
Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)
Calibrated Language Models and How to Find Them with Label Smoothing
by: Huang, Jerry, et al.
Published: (2025)
REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop
by: Rybak, Patryk, et al.
Published: (2026)
When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models
by: Mustaqim, S. M., et al.
Published: (2025)
Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities
by: Zhang, Stephen, et al.
Published: (2024)
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
by: Huang, Tiansheng, et al.
Published: (2025)
Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation
by: Ghaznavi, Mahdi, et al.
Published: (2024)
Safety Alignment Can Be Not Superficial With Explicit Safety Signals
by: Li, Jianwei, et al.
Published: (2025)
Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning
by: Kim, Dongkwan, et al.
Published: (2022)
When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making
by: Riasat, Nazia
Published: (2026)
One Wave To Explain Them All: A Unifying Perspective On Feature Attribution
by: Kasmi, Gabriel, et al.
Published: (2024)
Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)
by: Prinster, Drew, et al.
Published: (2024)
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
by: Jin, Jiahe, et al.
Published: (2025)
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
by: Huang, Audrey, et al.
Published: (2025)
CHILL at SemEval-2025 Task 2: You Can't Just Throw Entities and Hope -- Make Your LLM to Get Them Right
by: Lee, Jaebok, et al.
Published: (2025)
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
by: Oh, Jio, et al.
Published: (2024)
Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification
by: Faye, Géraud, et al.
Published: (2024)
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
by: Li, Zhuowei, et al.
Published: (2025)
What LLMs Think When You Don't Tell Them What to Think About?
by: Kwon, Yongchan, et al.
Published: (2026)
On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
by: Xie, Zeke, et al.
Published: (2020)
Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning
by: Mathew, Christo, et al.
Published: (2025)
Learning with Hidden Factorial Structure
by: Arnal, Charles, et al.
Published: (2024)
The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation
by: Lan, Yifan, et al.
Published: (2026)
StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer
by: Zheng, Guantian
Published: (2026)
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
by: Zhou, Yefan, et al.
Published: (2026)