| Main Authors: | Zhang, Yongheng, Chen, Qiguang, Zhou, Jingxuan, Wang, Peng, Si, Jiasheng, Wang, Jin, Lu, Wenpeng, Qin, Libo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.04463 |
Similar Items
CroPrompt: Cross-task Interactive Prompting for Zero-shot Spoken Language Understanding
by: Qin, Libo, et al.
Published: (2024)
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
by: Chen, Qiguang, et al.
Published: (2024)
AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought
by: Zhang, Yongheng, et al.
Published: (2024)
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
by: Zhang, Yongheng, et al.
Published: (2025)
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
by: Chen, Qiguang, et al.
Published: (2025)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
by: Zhang, Yongheng, et al.
Published: (2025)
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
by: Chen, Qiguang, et al.
Published: (2024)
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
by: Yao, Jihan, et al.
Published: (2024)
CHECKWHY: Causal Fact Verification via Argument Structure
by: Si, Jiasheng, et al.
Published: (2024)
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
by: Chen, Qiguang, et al.
Published: (2025)
DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
by: Peng, Dengyun, et al.
Published: (2025)
Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework
by: Zhang, Chenyuan, et al.
Published: (2026)
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
by: Cheng, Zihui, et al.
Published: (2025)
Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models
by: Chen, Qiguang, et al.
Published: (2025)
Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks
by: Guan, Jiannan, et al.
Published: (2025)
Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis
by: Ling, Zipeng, et al.
Published: (2026)
X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
by: Wang, Peng, et al.
Published: (2025)
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
by: Chen, Qiguang, et al.
Published: (2026)
SRLCG: Self-Rectified Large-Scale Code Generation with Multidimensional Chain-of-Thought and Dynamic Backtracking
by: Ma, Hongru, et al.
Published: (2025)
Wrong as Sequence Violation: The Structural Definition of Wrong as Misordering Across Reasoning, Physics, Computation, Cognition, and Ethics
by: Stewart, Arthur
Published: (2026)
What is Wrong with Perplexity for Long-context Language Modeling?
by: Fang, Lizhe, et al.
Published: (2024)
Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes
by: Wang, Weiming, et al.
Published: (2026)
Visual Grounding Methods for VQA are Working for the Wrong Reasons!
by: Shrestha, Robik, et al.
Published: (2020)
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
by: Sun, Linzhuang, et al.
Published: (2025)
Large Language Models Meet NLP: A Survey
by: Qin, Libo, et al.
Published: (2024)
Humans Perceive Wrong Narratives from AI Reasoning Texts
by: Levy, Mosh, et al.
Published: (2025)
Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons
by: Zhou, Shijia, et al.
Published: (2024)
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning
by: Balepur, Nishant, et al.
Published: (2023)
More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning
by: Shafiei, Mohammadamin, et al.
Published: (2025)
Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
by: Chen, Qiguang, et al.
Published: (2025)
Easy Problems That LLMs Get Wrong
by: Williams, Sean, et al.
Published: (2024)
ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
by: Su, Ruiran, et al.
Published: (2025)
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
by: Qin, Libo, et al.
Published: (2024)
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
by: Wen, Xueru, et al.
Published: (2024)
Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
by: Si, Chenglei, et al.
Published: (2023)
The Realignment Problem: When Right becomes Wrong in LLMs
by: Sharma, Aakash Sen, et al.
Published: (2025)
Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation
by: He, Yanjie
Published: (2026)
Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries
by: Hagar, Nick, et al.
Published: (2025)
What's Wrong? Refining Meeting Summaries with LLM Feedback
by: Kirstein, Frederic, et al.
Published: (2024)