| Main Authors: | Wang, Zhuohan, Zhu, Ziwei, Li, Ziniu, Chen, Congliang, Han, Yizhou, Lin, Yufeng, Lin, Zhihang, Gu, Angyang, Hu, Xinglin, Sun, Ruoyu, Ding, Tian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.27610 |
Similar Items
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
by: Li, Ziniu, et al.
Published: (2025)
Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving
by: Yang, Tianyun, et al.
Published: (2025)
Why Transformers Need Adam: A Hessian Perspective
by: Zhang, Yushun, et al.
Published: (2024)
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
by: Li, Ziniu, et al.
Published: (2024)
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
by: Li, Ziniu, et al.
Published: (2023)
Adam-mini: Use Fewer Learning Rates To Gain More
by: Zhang, Yushun, et al.
Published: (2024)
Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives
by: Liu, Yajiao, et al.
Published: (2025)
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
by: Chen, Yupeng, et al.
Published: (2024)
Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference
by: Qin, Zongyue, et al.
Published: (2024)
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
by: Tang, Zhengyang, et al.
Published: (2025)
Adam Converges Without Any Modification On Update Rules
by: Zhang, Yushun, et al.
Published: (2026)
Changes of the Primary Cilia in Alzheimer's Disease Pathogenesis
by: Angyang Guo, et al.
Published: (2025)
A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
by: Zhang, Shichang, et al.
Published: (2023)
A Study in Markov Chains, Loop-Erased Random Walk and Loop Soups
by: Gu, Zhuohan
Published: (2024)
Teaching Language Models to Reason with Tools
by: Li, Chengpeng, et al.
Published: (2025)
Self-Evolving Critique Abilities in Large Language Models
by: Tang, Zhengyang, et al.
Published: (2025)
MMInA: Benchmarking Multihop Multimodal Internet Agents
by: Tian, Shulin, et al.
Published: (2024)
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
by: Lin, Zongyu, et al.
Published: (2025)
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
by: Qin, Zongyue, et al.
Published: (2024)
Automated Molecular Concept Generation and Labeling with Large Language Models
by: Zhang, Zimin, et al.
Published: (2024)
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
by: Lin, Zhihang, et al.
Published: (2025)
Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks
by: Subramonian, Arjun, et al.
Published: (2024)
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
by: Wang, Xiaoxuan, et al.
Published: (2023)
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
by: Ding, Zeyang, et al.
Published: (2026)
RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?
by: Dai, Yuyang, et al.
Published: (2026)
CoRT: Code-integrated Reasoning within Thinking
by: Li, Chengpeng, et al.
Published: (2025)
Policy Optimization in RLHF: The Impact of Out-of-preference Data
by: Li, Ziniu, et al.
Published: (2023)
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
by: Gu, Zihan, et al.
Published: (2025)
Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective
by: Wang, Senmiao, et al.
Published: (2025)
Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis
by: Yang, Xinrui, et al.
Published: (2024)
Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models
by: Xu, Zihao, et al.
Published: (2026)
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
by: Lin, Zhihang, et al.
Published: (2024)
Off-Policy Value-Based Reinforcement Learning for Large Language Models
by: Wang, Peng-Yuan, et al.
Published: (2026)
Exact Causal Attention with 10% Fewer Operations
by: Rybin, Dmitry, et al.
Published: (2025)
An Extended Space‐Time Network With Explicit Incompatibility Modelling for High‐Speed Railway Timetabling
by: Angyang Chen, et al.
Published: (2025)
Speculative Decoding Reimagined for Multimodal Large Language Models
by: Lin, Luxi, et al.
Published: (2025)
Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling
by: Huang, Zijie, et al.
Published: (2024)
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)
The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs
by: Chen, Zihan, et al.
Published: (2026)
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
by: Yang, Hongming, et al.
Published: (2025)