| Main Authors: | Yang, An, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Huang, Haoyan, Jiang, Jiandong, Tu, Jianhong, Zhang, Jianwei, Zhou, Jingren, Lin, Junyang, Dang, Kai, Yang, Kexin, Yu, Le, Li, Mei, Sun, Minmin, Zhu, Qin, Men, Rui, He, Tao, Xu, Weijia, Yin, Wenbiao, Yu, Wenyuan, Qiu, Xiafei, Ren, Xingzhang, Yang, Xinlong, Li, Yong, Xu, Zhiying, Zhang, Zipeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.15383 |
Similar Items
Qwen2.5 Technical Report
by: Qwen, et al.
Published: (2024)
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
by: Yang, An, et al.
Published: (2024)
Qwen2.5-Coder Technical Report
by: Hui, Binyuan, et al.
Published: (2024)
Qwen3 Technical Report
by: Yang, An, et al.
Published: (2025)
Qwen3Guard Technical Report
by: Zhao, Haiquan, et al.
Published: (2025)
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
by: Zhang, Yanzhao, et al.
Published: (2025)
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
by: Li, Mingxin, et al.
Published: (2026)
Qwen2 Technical Report
by: Yang, An, et al.
Published: (2024)
Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)
Qwen3-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)
Language Models can Self-Lengthen to Generate Long Texts
by: Quan, Shanghaoran, et al.
Published: (2024)
Qwen3-ASR Technical Report
by: Shi, Xian, et al.
Published: (2026)
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
by: Deng, Boyi, et al.
Published: (2026)
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
by: Qiu, Zihan, et al.
Published: (2025)
Qwen2-Audio Technical Report
by: Chu, Yunfei, et al.
Published: (2024)
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference
by: Yi, Ke, et al.
Published: (2024)
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
by: Wang, Peng, et al.
Published: (2024)
Qwen-Image Technical Report
by: Wu, Chenfei, et al.
Published: (2025)
WorldPM: Scaling Human Preference Modeling
by: Wang, Binghai, et al.
Published: (2025)
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
by: Shen, Weizhou, et al.
Published: (2025)
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
by: Qiu, Zihan, et al.
Published: (2025)
Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)
An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model
by: Tian, Yuxin, et al.
Published: (2024)
DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
by: Tao, Qian, et al.
Published: (2024)
A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training
by: Qiu, Zihan, et al.
Published: (2026)
Qwen-Image-2.0 Technical Report
by: Zhao, Bing, et al.
Published: (2026)
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
by: Wan, Fanqi, et al.
Published: (2025)
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
by: Yin, Shengming, et al.
Published: (2025)
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
by: Tang, Shengkun, et al.
Published: (2026)
QwenStyle: Content-Preserving Style Transfer with Qwen-Image-Edit
by: Zhang, Shiwen, et al.
Published: (2026)
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
by: Zhou, Yufa, et al.
Published: (2025)
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
by: Shen, Weizhou, et al.
Published: (2025)
ProcessBench: Identifying Process Errors in Mathematical Reasoning
by: Zheng, Chujie, et al.
Published: (2024)
The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)
GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes
by: Li, Xue, et al.
Published: (2023)
Unicron: Economizing Self-Healing LLM Training at Scale
by: He, Tao, et al.
Published: (2023)
Preparation of Polyvinyl Alcohol/Zirconium Hollow Microspheres With Surface Flower‐Like and the Phosphate Adsorption Ability
by: Wenyuan Huang, et al.
Published: (2025)
P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
by: Zhang, Yidan, et al.
Published: (2024)
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
by: Wang, Shaobo, et al.
Published: (2026)