| Main Authors: | Dou, Shihan, Zhou, Enyu, Liu, Yan, Gao, Songyang, Zhao, Jun, Shen, Wei, Zhou, Yuhao, Xi, Zhiheng, Wang, Xiao, Fan, Xiaoran, Pu, Shiliang, Zhu, Jiang, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.09979 |
Similar Items
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
by: Xi, Zhiheng, et al.
Published: (2023)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)
Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
by: Zhang, Zhihao, et al.
Published: (2025)
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
by: Lin, Jiahang, et al.
Published: (2026)
Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
by: Zheng, Rui, et al.
Published: (2024)
JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees
by: Wang, Yuhui, et al.
Published: (2026)
Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs
by: Zhou, Xin, et al.
Published: (2024)
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)
MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)
Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation
by: Li, Junzhuo, et al.
Published: (2025)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training
by: Jiang, Changhao, et al.
Published: (2025)
FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment for Edge Computing
by: Zhang, Boyang, et al.
Published: (2025)
Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)
ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling
by: Yang, Yuchen, et al.
Published: (2026)
Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing
by: Liu, Wentao, et al.
Published: (2025)
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
by: Ye, Junjie, et al.
Published: (2024)
Noise-Robustness Through Noise: A Framework combining Asymmetric LoRA with Poisoning MoE
by: Wang, Zhaokun, et al.
Published: (2025)
EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
by: Zhang, Yuansen, et al.
Published: (2024)
Secrets of RLHF in Large Language Models Part II: Reward Modeling
by: Wang, Binghai, et al.
Published: (2024)
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
by: Xia, Han, et al.
Published: (2024)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)
MetaRM: Shifted Distributions Alignment via Meta-Learning
by: Dou, Shihan, et al.
Published: (2024)
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
by: Yang, Shu, et al.
Published: (2024)
Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
by: Wang, Junzhe, et al.
Published: (2026)
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)
LoRALib: A Standardized Benchmark for Evaluating LoRA-MoE Methods
by: Wang, Shaoheng, et al.
Published: (2025)
Hierarchical LoRA MoE for Efficient CTR Model Scaling
by: Zeng, Zhichen, et al.
Published: (2025)
ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model
by: Xu, Yuhao, et al.
Published: (2026)
ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing
by: Li, Shuo, et al.
Published: (2026)
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
by: Zhou, Zhuoshan, et al.
Published: (2026)
VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision
by: Zhu, Dingwei, et al.
Published: (2025)
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
by: Lin, Jiahang, et al.
Published: (2026)
Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning
by: Prottasha, Nusrat Jahan, et al.
Published: (2026)
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
by: He, Wei, et al.
Published: (2024)