| Main Authors: | Xiao, Chaojun, Cai, Jie, Zhao, Weilin, Zeng, Guoyang, Lin, Biyuan, Zhou, Jie, Zheng, Zhi, Han, Xu, Liu, Zhiyuan, Sun, Maosong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.04315 |
Similar Items
Exploring the Benefit of Activation Sparsity in Pre-training
by: Zhang, Zhengyan, et al.
Published: (2024)
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
by: Xiao, Chaojun, et al.
Published: (2024)
H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
by: Zhao, Weilin, et al.
Published: (2025)
Data Science and Technology Towards AGI Part I: Tiered Data Management
by: Wang, Yudong, et al.
Published: (2026)
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)
NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)
APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
by: Huang, Yuxiang, et al.
Published: (2026)
MiniCPM4: Ultra-Efficient LLMs on End Devices
by: MiniCPM Team, et al.
Published: (2025)
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
by: Ding, Ning, et al.
Published: (2024)
Configurable Foundation Models: Building LLMs from a Modular Perspective
by: Xiao, Chaojun, et al.
Published: (2024)
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
by: Huang, Yuxiang, et al.
Published: (2025)
KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning
by: Gao, Cheng, et al.
Published: (2026)
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
by: Song, Chenyang, et al.
Published: (2026)
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
by: Zhao, Weilin, et al.
Published: (2025)
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
by: Wang, Yudong, et al.
Published: (2025)
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
by: Zhang, Zhengyan, et al.
Published: (2024)
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
by: Song, Chenyang, et al.
Published: (2025)
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)
Stuffed Mamba: Oversized States Lead to the Inability to Forget
by: Chen, Yingfa, et al.
Published: (2024)
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
by: Luo, Kairong, et al.
Published: (2025)
KBAlign: Efficient Self Adaptation on Specific Knowledge Bases
by: Zeng, Zheni, et al.
Published: (2024)
StateX: Enhancing RNN Recall via Post-training State Expansion
by: Shen, Xingyu, et al.
Published: (2025)
Empowering Private Tutoring by Chaining Large Language Models
by: Chen, Yulin, et al.
Published: (2023)
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
by: Chen, Weize, et al.
Published: (2024)
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
by: Guo, Yiju, et al.
Published: (2024)
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
by: Chen, Yingfa, et al.
Published: (2026)
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
by: Chen, Weize, et al.
Published: (2024)
Personality-affected Emotion Generation in Dialog Systems
by: Wen, Zhiyuan, et al.
Published: (2024)
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
by: Zeng, Liang, et al.
Published: (2025)
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
by: Zhao, Weilin, et al.
Published: (2024)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
by: Qian, Cheng, et al.
Published: (2024)
PersLLM: A Personified Training Approach for Large Language Models
by: Zeng, Zheni, et al.
Published: (2024)
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
by: Yuan, Lifan, et al.
Published: (2025)
A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences
by: Shen, Jiaxin, et al.
Published: (2025)
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
by: Huang, Pengcheng, et al.
Published: (2025)
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
by: Li, Houyi, et al.
Published: (2025)
Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
by: Qian, Cheng, et al.
Published: (2024)