| Main Authors: | Yang, Lei, Pan, Leiyu, Xiong, Bojian, Jin, Renren, Zhang, Shaowei, Chen, Yue, Shi, Ling, Zhou, Jiang, Wu, Junru, Wang, Zhen, Peng, Jianxiang, Xiao, Juesi, Dong, Tianyu, Han, Zhuowen, Chen, Zhuo, Ren, Yuqi, Xiong, Deyi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.09205 |
Similar Items
DEP: A Decentralized Large Language Model Evaluation Protocol
by: Peng, Jianxiang, et al.
Published: (2026)
ProBench: Benchmarking Large Language Models in Competitive Programming
by: Yang, Lei, et al.
Published: (2025)
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)
Do Large Language Models Mirror Cognitive Language Processing?
by: Ren, Yuqi, et al.
Published: (2024)
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
by: Liu, Chuang, et al.
Published: (2024)
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data
by: Sun, Haoran, et al.
Published: (2024)
An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation
by: Supryadi, et al.
Published: (2024)
Why Does Reinforcement Learning Generalize? A Feature-Level Mechanistic Study of Post-Training in Large Language Models
by: Shi, Dan, et al.
Published: (2026)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models
by: Liu, Yan, et al.
Published: (2024)
Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models
by: Wang, Bo, et al.
Published: (2026)
Evaluating Discourse Cohesion in Pre-trained Language Models
by: He, Jie, et al.
Published: (2025)
Multilingual Large Language Models: A Systematic Survey
by: Zhu, Shaolin, et al.
Published: (2024)
LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation
by: Zhu, Shaolin, et al.
Published: (2024)
SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models
by: Yang, Lei, et al.
Published: (2026)
MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting
by: Li, Tianhao, et al.
Published: (2024)
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
by: Chen, Zhuo, et al.
Published: (2026)
CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
by: Shi, Ling, et al.
Published: (2024)
Continual Pre-training of MoEs: How robust is your router?
by: Thérien, Benjamin, et al.
Published: (2025)
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
by: Dong, Weilong, et al.
Published: (2024)
Large Language Model Safety: A Holistic Survey
by: Shi, Dan, et al.
Published: (2024)
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
by: Jin, Renren, et al.
Published: (2024)
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
by: Jin, Can, et al.
Published: (2025)
Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts
by: Pan, Leiyu, et al.
Published: (2025)
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
by: Zhu, Shaolin, et al.
Published: (2025)
PithTrain: A Compact and Agent-Native MoE Training System
by: Lai, Ruihang, et al.
Published: (2026)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
by: Shi, Dan, et al.
Published: (2024)
DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping
by: Zhu, Pengyun, et al.
Published: (2026)
TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation
by: Jin, Renren, et al.
Published: (2025)
MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
by: Zhang, Zheng, et al.
Published: (2025)
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
by: Tang, Shengkun, et al.
Published: (2026)
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
by: Yang, Lei, et al.
Published: (2024)
Automated Progressive Red Teaming
by: Jiang, Bojian, et al.
Published: (2024)
Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts
by: Wang, Qi, et al.
Published: (2025)
MoE-Enhanced Multi-Domain Feature Selection and Fusion for Fast Map-Free Trajectory Prediction
by: Xiong, Wenyi, et al.
Published: (2025)
Sigma-MoE-Tiny Technical Report
by: Hu, Qingguo, et al.
Published: (2025)
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
by: Chen, Junyi, et al.
Published: (2023)
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
by: Kang, Hao, et al.
Published: (2025)
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
by: Wang, Siqi, et al.
Published: (2024)