| Main Authors: | Liu, Zheng; Liu, Mengjie; Wen, Siwei; Cai, Mengzhang; Cui, Bin; He, Conghui; Zhang, Wentao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.16591 |
Similar Items
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
by: Tang, Zinan, et al.
Published: (2025)
Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets
by: Gao, Xin, et al.
Published: (2025)
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)
DARO: Difficulty-Aware Reweighting Policy Optimization
by: Zhou, Jingyu, et al.
Published: (2025)
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search
by: Sun, Linzhuang, et al.
Published: (2024)
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
by: Li, Chen, et al.
Published: (2025)
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
by: Lin, Honglin, et al.
Published: (2025)
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
by: Liu, Zheng, et al.
Published: (2025)
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
by: Li, Bozhou, et al.
Published: (2024)
Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM
by: Liu, Mengjie, et al.
Published: (2025)
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
by: Lin, Xingyu, et al.
Published: (2025)
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)
IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning
by: He, Yinhan, et al.
Published: (2026)
Soft Adaptive Policy Optimization
by: Gao, Chang, et al.
Published: (2025)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood
by: Lin, Xingyu, et al.
Published: (2026)
Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding
by: Pan, Lehan, et al.
Published: (2026)
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
by: Bai, Tianyi, et al.
Published: (2025)
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
by: Zhao, Zhiyuan, et al.
Published: (2023)
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards
by: Ren, Mengjie, et al.
Published: (2026)
Geometric-Mean Policy Optimization
by: Zhao, Yuzhong, et al.
Published: (2025)
Data Proportion Detection for Optimized Data Management for Large Language Models
by: Liang, Hao, et al.
Published: (2024)
Discriminative Policy Optimization for Token-Level Reward Models
by: Chen, Hongzhan, et al.
Published: (2025)
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
by: Wen, Zichen, et al.
Published: (2025)
AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)
Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
by: Zhang, Yichi, et al.
Published: (2026)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
by: Liu, Xiaoqian, et al.
Published: (2025)
IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining
by: Feng, Dawei, et al.
Published: (2024)
Teaching LLM to be Persuasive: Reward-Enhanced Policy Optimization for Alignment from Heterogeneous Rewards
by: Zeng, Xia, et al.
Published: (2025)
Token-level Direct Preference Optimization
by: Zeng, Yongcheng, et al.
Published: (2024)
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
by: Yuan, Xiaoyang, et al.
Published: (2025)
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
by: Liang, Hao, et al.
Published: (2024)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
by: Lai, Peichao, et al.
Published: (2025)
Authorship Style Transfer with Policy Optimization
by: Liu, Shuai, et al.
Published: (2024)
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
by: Deng, Ken, et al.
Published: (2025)
TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation
by: Fodeh, Samah, et al.
Published: (2026)
Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization
by: Zhu, Yutao, et al.
Published: (2025)
Fibration Policy Optimization
by: Li, Chang, et al.
Published: (2026)
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
by: Liu, Yuhang, et al.
Published: (2025)