| Main Authors: | Gao, Zhitao, Ma, Jie, Li, Xuhong, Li, Pengyu, Qu, Ning, Wu, Yaqiang, Liu, Hui, Liu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03084 |
Similar Items
$\textbf{AGT$^{AO}$}$: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality
by: Li, Pengyu, et al.
Published: (2026)
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
by: Ma, Jie, et al.
Published: (2025)
Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors
by: Li, Tianchi, et al.
Published: (2026)
Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation
by: Zhang, Yong, et al.
Published: (2025)
Scaling Latent Reasoning via Looped Language Models
by: Zhu, Rui-Jie, et al.
Published: (2025)
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
by: Ma, Jie, et al.
Published: (2025)
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
by: Huang, Muye, et al.
Published: (2025)
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
by: Liu, Qiyuan, et al.
Published: (2025)
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)
OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving
by: Zhang, Xinyu, et al.
Published: (2026)
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback
by: Liu, Yanming, et al.
Published: (2024)
Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA
by: Wang, Zihua, et al.
Published: (2026)
From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning
by: Li, Haoyu, et al.
Published: (2025)
Autonomous Algorithm Discovery for Ptychography via Evolutionary LLM Reasoning
by: Yin, Xiangyu, et al.
Published: (2026)
Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models
by: Ma, Jie, et al.
Published: (2024)
Beyond Rejection Sampling: Trajectory Fusion for Scaling Mathematical Reasoning
by: Deng, Jie, et al.
Published: (2026)
In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback
by: Zhu, Mingye, et al.
Published: (2025)
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
by: Ning, Yansong, et al.
Published: (2025)
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
by: Chang, Qikai, et al.
Published: (2025)
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
by: Yu, Yaoning, et al.
Published: (2025)
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning
by: Videau, Mathurin, et al.
Published: (2024)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
by: Liu, Xiaoqian, et al.
Published: (2025)
GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression
by: Liu, Kainan, et al.
Published: (2024)
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
by: Wang, Yiting, et al.
Published: (2025)
Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
by: Zhu, Xunyu, et al.
Published: (2024)
SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More
by: Huang, Muye, et al.
Published: (2026)
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
by: Peng, Qiwei, et al.
Published: (2024)
Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization
by: Lu, Junjie, et al.
Published: (2025)
Preference Optimization for Reasoning with Pseudo Feedback
by: Jiao, Fangkai, et al.
Published: (2024)
Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference
by: Xie, Wenxuan, et al.
Published: (2026)
Generalized Category Discovery with Large Language Models in the Loop
by: An, Wenbin, et al.
Published: (2023)
A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops
by: Yuksel, Kamer Ali, et al.
Published: (2024)
Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
by: Li, Aiden Yiliu, et al.
Published: (2025)
Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization
by: Gui, Runquan, et al.
Published: (2026)
Efficient Reasoning via Chain of Unconscious Thought
by: Gong, Ruihan, et al.
Published: (2025)
Robust Preference Optimization via Dynamic Target Margins
by: Sun, Jie, et al.
Published: (2025)
BPO: Revisiting Preference Modeling in Direct Preference Optimization
by: Sun, Lin, et al.
Published: (2025)
AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning
by: Wei, Yifan, et al.
Published: (2025)
Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent
by: Li, Bingxuan, et al.
Published: (2026)
Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning
by: Zhang, Yong, et al.
Published: (2024)