| Main Authors: | Li, Han, Peng, Xinyu, Wang, Yaoming, Peng, Zelin, Chen, Xin, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Dai, Wenrui, Xiong, Hongkai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.03498 |
Similar Items
Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers
by: Peng, Xinyu, et al.
Published: (2026)
Noise Conditional Variational Score Distillation
by: Peng, Xinyu, et al.
Published: (2025)
FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training
by: Xu, Liangyu, et al.
Published: (2025)
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
by: Xu, Zhihao, et al.
Published: (2026)
LinkQA: Synthesizing Diverse QA from Multiple Seeds Strongly Linked by Knowledge Points
by: Zhang, Xuemiao, et al.
Published: (2025)
Large-Scale Diverse Synthesis for Mid-Training
by: Zhang, Xuemiao, et al.
Published: (2025)
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
by: Bai, Yang, et al.
Published: (2026)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
by: Zheng, Hongwei, et al.
Published: (2025)
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
by: Shi, Bowen, et al.
Published: (2024)
Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)
FRAME: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy
by: Zhang, Xuemiao, et al.
Published: (2025)
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
by: Zhang, Xuemiao, et al.
Published: (2025)
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
by: Liu, Jiahao, et al.
Published: (2024)
Libra: Assessing and Improving Reward Model by Learning to Think
by: Zhou, Meng, et al.
Published: (2025)
A Survey on LLM Mid-Training
by: Tu, Chengying, et al.
Published: (2025)
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
by: Liu, Yuchen, et al.
Published: (2025)
Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
by: Peng, Xinyu, et al.
Published: (2024)
Making Mathematical Reasoning Adaptive
by: Lai, Zhejian, et al.
Published: (2025)
LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
by: Fan, Yuchun, et al.
Published: (2026)
Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
by: Peng, Xinyu, et al.
Published: (2026)
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
by: Liu, Mingdao, et al.
Published: (2024)
Graph-Structured Speculative Decoding
by: Gong, Zhuocheng, et al.
Published: (2024)
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
by: Peng, Zelin, et al.
Published: (2024)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
by: Wu, Pengfei, et al.
Published: (2024)
Frequency-Aware Transformer for Learned Image Compression
by: Li, Han, et al.
Published: (2023)
MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization
by: Fei, Wen, et al.
Published: (2020)
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
by: Wang, Chao, et al.
Published: (2026)
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
by: Liu, Jinlai, et al.
Published: (2025)
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
by: Wang, Zhijun, et al.
Published: (2025)
Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment
by: Li, Han, et al.
Published: (2025)
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
by: Shao, Ruichen, et al.
Published: (2025)
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
by: Lee, Sanwoo, et al.
Published: (2025)
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
by: Tang, Hongyin, et al.
Published: (2024)
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
by: Tang, Beilong, et al.
Published: (2025)
Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
by: Chen, Yabo, et al.
Published: (2023)
Constrained Auto-Regressive Decoding Constrains Generative Retrieval
by: Wu, Shiguang, et al.
Published: (2025)
Interactive Character Control with Auto-Regressive Motion Diffusion Models
by: Shi, Yi, et al.
Published: (2023)
What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation
by: Gong, Zhuocheng, et al.
Published: (2024)
Turbocharge Speech Understanding with Pilot Inference
by: Wang, Rongxiang, et al.
Published: (2023)