| Main Authors: | Sun, Xiangkun, Kong, Lingkai, Zhang, Aoqi, Zeng, Liang, Wang, Tonghan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.09314 |
Similar Items
RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
by: Saji, Alan, et al.
Published: (2025)
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
by: Peters, Sydney, et al.
Published: (2025)
Understanding LLM Evaluator Behavior: A Structured Multi-Evaluator Framework for Merchant Risk Assessment
by: Wang, Liang, et al.
Published: (2026)
Evaluating Relational Reasoning in LLMs with REL
by: Fesser, Lukas, et al.
Published: (2026)
Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
by: Xu, Chenjun, et al.
Published: (2025)
Empowering Tabular Data Preparation with Language Models: Why and How?
by: Chen, Mengshi, et al.
Published: (2025)
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
by: Fang, Bowen, et al.
Published: (2026)
Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay
by: Wang, Xiaohua, et al.
Published: (2026)
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
by: Peng, Songping, et al.
Published: (2026)
LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance
by: Ivanov, Igor
Published: (2025)
Universal Adversarial Attack on Aligned Multimodal LLMs
by: Rahmatullaev, Temurbek, et al.
Published: (2025)
PersistBench: When Should Long-Term Memories Be Forgotten by LLMs?
by: Pulipaka, Sidharth, et al.
Published: (2026)
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
by: Xu, Weijie, et al.
Published: (2025)
Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
by: Simhi, Adi, et al.
Published: (2026)
From Fake Focus to Real Precision: Confusion-Driven Adversarial Attention Learning in Transformers
by: Liu, Yawei
Published: (2025)
Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
by: Chen, Yan, et al.
Published: (2025)
Pareto-Optimized Open-Source LLMs for Healthcare via Context Retrieval
by: Bayarri-Planas, Jordi, et al.
Published: (2024)
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs
by: Rai, Daking, et al.
Published: (2024)
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
by: Davide, Fabrizio, et al.
Published: (2024)
How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment
by: Ansell, Rebecca, et al.
Published: (2026)
Multi-Paradigm Agent Interaction in Practice: A Systematic Analysis of Generator-Evaluator, ReAct Loop, and Adversarial Evaluation in the buddyMe Framework
by: Wang, Xiaohua, et al.
Published: (2026)
A Knowledge Enhanced Learning and Semantic Composition Model for Multi-Claim Fact Checking
by: Wang, Shuai, et al.
Published: (2021)
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
by: Iqbal, Hasan, et al.
Published: (2024)
Beyond Prefixes: Graph-as-Memory Cross-Attention for Knowledge Graph Completion with Large Language Models
by: Liu, Ruitong, et al.
Published: (2025)
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
by: Lai, Junyu, et al.
Published: (2025)
Attention Drift: What Autoregressive Speculative Decoding Models Learn
by: Eldenk, Doğaç, et al.
Published: (2026)
How Does Unfaithful Reasoning Emerge from Autoregressive Training? A Study of Synthetic Experiments
by: Wang, Fuxin, et al.
Published: (2026)
SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions
by: Pan, Huitong, et al.
Published: (2024)
PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data
by: Xiong, Kai, et al.
Published: (2025)
Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models
by: Darm, Paul, et al.
Published: (2025)
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
by: Kautsar, Muhammad Dehan Al, et al.
Published: (2026)
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
by: Fang, Xi, et al.
Published: (2025)
CausalT5K: Diagnosing and Informing Refusal for Trustworthy Causal Reasoning of Skepticism, Sycophancy, Detection-Correction, and Rung Collapse
by: Geng, Longling, et al.
Published: (2026)
On Explaining with Attention Matrices
by: Naim, Omar, et al.
Published: (2024)
Process Supervision-Guided Policy Optimization for Code Generation
by: Dai, Ning, et al.
Published: (2024)
Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)
HIP Network: Historical Information Passing Network for Extrapolation Reasoning on Temporal Knowledge Graph
by: He, Yongquan, et al.
Published: (2024)
Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse
by: Phi, Khiem, et al.
Published: (2024)
Large Language Models as 'Hidden Persuaders': Fake Product Reviews are Indistinguishable to Humans and Machines
by: Meng, Weiyao, et al.
Published: (2025)
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
by: Yuan, Xinzhe, et al.
Published: (2026)