| Main Authors: | Cheng, Ziling, Cao, Meng, Pishdad, Leila, Cao, Yanshuai, Cheung, Jackie Chi Kit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.23701 |
Similar Items
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
by: Cheng, Ziling, et al.
Published: (2025)
PreSumm: Predicting Summarization Performance Without Summarizing
by: Koniaev, Steven, et al.
Published: (2025)
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
by: Yu, Lei, et al.
Published: (2024)
Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought
by: Ramji, Keshav, et al.
Published: (2026)
Solving the Challenge Set without Solving the Task: On Winograd Schemas as a Test of Pronominal Coreference Resolution
by: Porada, Ian, et al.
Published: (2024)
Ensemble Distillation for Unsupervised Constituency Parsing
by: Shayegh, Behzad, et al.
Published: (2023)
Can LLMs Solve longer Math Word Problems Better?
by: Xu, Xin, et al.
Published: (2024)
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
by: Chehbouni, Khaoula, et al.
Published: (2025)
Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning
by: Hersche, Michael, et al.
Published: (2024)
CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts
by: Cha, Junuk, et al.
Published: (2025)
Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs
by: Li, Jianan, et al.
Published: (2026)
A Controlled Reevaluation of Coreference Resolution Models
by: Porada, Ian, et al.
Published: (2024)
Does This Summary Answer My Question? Modeling Query-Focused Summary Readers with Rational Speech Acts
by: Piano, Cesare Spinoso-Di, et al.
Published: (2024)
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
by: Qian, Cheng, et al.
Published: (2023)
The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation
by: Lan, Yifan, et al.
Published: (2026)
Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning
by: Li, Jiachun, et al.
Published: (2024)
LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-step Arithmetics
by: Kudo, Keito, et al.
Published: (2024)
What Makes Math Word Problems Challenging for LLMs?
by: Srivatsa, KV Aditya, et al.
Published: (2024)
Self-consistent Reasoning For Solving Math Word Problems
by: Xiong, Jing, et al.
Published: (2022)
Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics
by: Flores, Lorenzo Jaime Yu, et al.
Published: (2025)
$\texttt{COSMIC}$: Mutual Information for Task-Agnostic Summarization Evaluation
by: Darrin, Maxime, et al.
Published: (2024)
Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study
by: Gao, Jie, et al.
Published: (2026)
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models
by: Zhang, Ruiqi, et al.
Published: (2025)
Structured Reasoning with Tree-of-Thoughts for Bengali Math Word Problems
by: Mahmood, Aurprita, et al.
Published: (2025)
Augmenting Math Word Problems via Iterative Question Composing
by: Liu, Haoxiong, et al.
Published: (2024)
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
by: Li, Li, et al.
Published: (2025)
Linear Half-Space Problems in Kinetic Theory: Abstract Formulation and Regime Transitions
by: Bernhoff, Niclas
Published: (2022)
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
by: Samadarshi, Prisha, et al.
Published: (2024)
A Unified View of Abstract Visual Reasoning Problems
by: Małkiński, Mikołaj, et al.
Published: (2024)
Understanding Formal Reasoning Failures in LLMs as Abstract Interpreters
by: Mitchell, Jacqueline L., et al.
Published: (2025)
Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning
by: Lyu, Tianwen, et al.
Published: (2025)
On the Morse Index with Constraints I: An Abstract Formulation
by: Tran, Hung, et al.
Published: (2020)
Adversarial Math Word Problem Generation
by: Xie, Roy, et al.
Published: (2024)
How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)
Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning
by: Flores, Lorenzo Jaime Yu, et al.
Published: (2026)
Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective
by: Porada, Ian, et al.
Published: (2023)
$(RSA)^2$: A Rhetorical-Strategy-Aware Rational Speech Act Framework for Figurative Language Understanding
by: Piano, Cesare Spinoso-Di, et al.
Published: (2025)
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
by: Nai, Ruiqian, et al.
Published: (2024)
When LLMs Meet API Documentation: Can Retrieval Augmentation Aid Code Generation Just as It Helps Developers?
by: Chen, Jingyi, et al.
Published: (2025)
Solving Math Word Problems via Cooperative Reasoning induced Language Models
by: Zhu, Xinyu, et al.
Published: (2022)