:: Library Catalog

Bibliographic Details
Main Authors:	Dou, Shihan; Zhou, Enyu; Liu, Yan; Gao, Songyang; Zhao, Jun; Shen, Wei; Zhou, Yuhao; Xi, Zhiheng; Wang, Xiao; Fan, Xiaoran; Pu, Shiliang; Zhu, Jiang; Zheng, Rui; Gui, Tao; Zhang, Qi; Huang, Xuanjing
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2312.09979

Similar Items

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
by: Xi, Zhiheng, et al.
Published: (2023)

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)

Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
by: Zhang, Zhihao, et al.
Published: (2025)

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
by: Lin, Jiahang, et al.
Published: (2026)

Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
by: Zheng, Rui, et al.
Published: (2024)

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees
by: Wang, Yuhui, et al.
Published: (2026)

Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs
by: Zhou, Xin, et al.
Published: (2024)

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)

MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)

Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation
by: Li, Junzhuo, et al.
Published: (2025)

Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training
by: Jiang, Changhao, et al.
Published: (2025)

FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment for Edge Computing
by: Zhang, Boyang, et al.
Published: (2025)

Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)

ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling
by: Yang, Yuchen, et al.
Published: (2026)

Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing
by: Liu, Wentao, et al.
Published: (2025)

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
by: Ye, Junjie, et al.
Published: (2024)

Noise-Robustness Through Noise: A Framework combining Asymmetric LoRA with Poisoning MoE
by: Wang, Zhaokun, et al.
Published: (2025)

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
by: Zhang, Yuansen, et al.
Published: (2024)

Secrets of RLHF in Large Language Models Part II: Reward Modeling
by: Wang, Binghai, et al.
Published: (2024)

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
by: Xia, Han, et al.
Published: (2024)

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)

D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)

MetaRM: Shifted Distributions Alignment via Meta-Learning
by: Dou, Shihan, et al.
Published: (2024)

MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
by: Yang, Shu, et al.
Published: (2024)

Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
by: Wang, Junzhe, et al.
Published: (2026)

LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)

LoRALib: A Standardized Benchmark for Evaluating LoRA-MoE Methods
by: Wang, Shaoheng, et al.
Published: (2025)

Hierarchical LoRA MoE for Efficient CTR Model Scaling
by: Zeng, Zhichen, et al.
Published: (2025)

ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model
by: Xu, Yuhao, et al.
Published: (2026)

ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing
by: Li, Shuo, et al.
Published: (2026)

MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
by: Zhou, Zhuoshan, et al.
Published: (2026)

VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision
by: Zhu, Dingwei, et al.
Published: (2025)

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
by: Lin, Jiahang, et al.
Published: (2026)

Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning
by: Prottasha, Nusrat Jahan, et al.
Published: (2026)

Distill Visual Chart Reasoning Ability from LLMs to MLLMs
by: He, Wei, et al.
Published: (2024)