:: Library Catalog

Bibliographic Details
Main Authors:	Xiao, Chaojun; Cai, Jie; Zhao, Weilin; Zeng, Guoyang; Lin, Biyuan; Zhou, Jie; Zheng, Zhi; Han, Xu; Liu, Zhiyuan; Sun, Maosong
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2412.04315

Similar Items

Exploring the Benefit of Activation Sparsity in Pre-training
by: Zhang, Zhengyan, et al.
Published: (2024)

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
by: Xiao, Chaojun, et al.
Published: (2024)

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)

InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
by: Zhao, Weilin, et al.
Published: (2025)

Data Science and Technology Towards AGI Part I: Tiered Data Management
by: Wang, Yudong, et al.
Published: (2026)

Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)

NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)

APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
by: Huang, Yuxiang, et al.
Published: (2026)

MiniCPM4: Ultra-Efficient LLMs on End Devices
by: MiniCPM Team, et al.
Published: (2025)

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
by: Ding, Ning, et al.
Published: (2024)

Configurable Foundation Models: Building LLMs from a Modular Perspective
by: Xiao, Chaojun, et al.
Published: (2024)

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
by: Huang, Yuxiang, et al.
Published: (2025)

KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning
by: Gao, Cheng, et al.
Published: (2026)

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
by: Song, Chenyang, et al.
Published: (2026)

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
by: Zhao, Weilin, et al.
Published: (2025)

Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
by: Wang, Yudong, et al.
Published: (2025)

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
by: Zhang, Zhengyan, et al.
Published: (2024)

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
by: Song, Chenyang, et al.
Published: (2025)

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)

Stuffed Mamba: Oversized States Lead to the Inability to Forget
by: Chen, Yingfa, et al.
Published: (2024)

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
by: Luo, Kairong, et al.
Published: (2025)

KBAlign: Efficient Self Adaptation on Specific Knowledge Bases
by: Zeng, Zheni, et al.
Published: (2024)

StateX: Enhancing RNN Recall via Post-training State Expansion
by: Shen, Xingyu, et al.
Published: (2025)

Empowering Private Tutoring by Chaining Large Language Models
by: Chen, Yulin, et al.
Published: (2023)

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
by: Chen, Weize, et al.
Published: (2024)

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
by: Guo, Yiju, et al.
Published: (2024)

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
by: Chen, Yingfa, et al.
Published: (2026)

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)

Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
by: Chen, Weize, et al.
Published: (2024)

Personality-affected Emotion Generation in Dialog Systems
by: Wen, Zhiyuan, et al.
Published: (2024)

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
by: Zeng, Liang, et al.
Published: (2025)

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
by: Zhao, Weilin, et al.
Published: (2024)

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
by: Qian, Cheng, et al.
Published: (2024)

PersLLM: A Personified Training Approach for Large Language Models
by: Zeng, Zheni, et al.
Published: (2024)

From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
by: Yuan, Lifan, et al.
Published: (2025)

A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences
by: Shen, Jiaxin, et al.
Published: (2025)

Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)

ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
by: Huang, Pengcheng, et al.
Published: (2025)

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
by: Li, Houyi, et al.
Published: (2025)

Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
by: Qian, Cheng, et al.
Published: (2024)