| Main authors: | Pan, Haihui; Bao, Junwei; Jiang, Hongfei; Song, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.28389 |
Similar documents
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
by: Hong, Yuzhong, et al.
Published: (2024)
BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation
by: Wang, Qiushi, et al.
Published: (2024)
Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
by: Zhao, Jing, et al.
Published: (2026)
Better, Faster: Harnessing Self-Improvement in Large Reasoning Models
by: Zhong, Qihuang, et al.
Published: (2026)
Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models
by: Fan, Yuchen, et al.
Published: (2024)
Multi-Turn Interactions for Text-to-SQL with Large Language Models
by: Xiong, Guanming, et al.
Published: (2024)
VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency
by: Han, Vernon Toh Yan, et al.
Published: (2023)
LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning
by: Chen, Shuguang, et al.
Published: (2024)
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
by: Yan, Yuchen, et al.
Published: (2024)
SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
by: Yoon, Kanghoon, et al.
Published: (2025)
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
by: Jia, Mengzhao, et al.
Published: (2024)
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
by: Jiang, Weisen, et al.
Published: (2023)
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
by: Zhao, Ji, et al.
Published: (2026)
Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM Reasoning
by: Long, Quanyu, et al.
Published: (2026)
Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector
by: Yang, Haihui, et al.
Published: (2024)
Question Translation Training for Better Multilingual Reasoning
by: Zhu, Wenhao, et al.
Published: (2024)
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
by: Qin, Zhen, et al.
Published: (2023)
Mathematical Reasoning Enhanced LLM for Formula Derivation: A Case Study on Fiber NLI Modellin
by: Zhang, Yao, et al.
Published: (2026)
More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning
by: Zhao, Yike, et al.
Published: (2025)
A Survey on LLM Mid-Training
by: Tu, Chengying, et al.
Published: (2025)
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
by: Bansal, Hritik, et al.
Published: (2024)
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
by: Zhang, Zhihan, et al.
Published: (2024)
Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification
by: Barone, Antonio Valerio Miceli, et al.
Published: (2026)
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
by: Gao, Bofei, et al.
Published: (2024)
Learning to Better Search with Language Models via Guided Reinforced Self-Training
by: Moon, Seungyong, et al.
Published: (2024)
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
by: Chen, Xiao, et al.
Published: (2025)
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
by: Lu, Zimu, et al.
Published: (2024)
Self-Trained Verification for Training- and Test-Time Self-Improvement
by: Wu, Chen Henry, et al.
Published: (2026)
Making Bielik LLM Reason (Better): A Field Report
by: Trybus, Adam, et al.
Published: (2026)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
by: Ha, Huy Hoang
Published: (2025)
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
by: Li, Zhen, et al.
Published: (2025)
Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR
by: Xu, Haobo, et al.
Published: (2026)
Faster and Better LLMs via Latency-Aware Test-Time Scaling
by: Wang, Zili, et al.
Published: (2025)
Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
by: Zhu, Tongyao, et al.
Published: (2025)
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
by: Pouransari, Hadi, et al.
Published: (2024)
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
by: Li, Chaoyu, et al.
Published: (2025)
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification
by: Sanyal, Soumya, et al.
Published: (2024)