Library Catalog

Bibliographic Details
Main Author:	Liu, Ziyang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.18128

Similar Items

Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
by: Jantsch, Lasse Marten, et al.
Published: (2026)

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
by: Lau, Tim Tsz-Kit, et al.
Published: (2026)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

GLU Attention Improve Transformer
by: Wang, Zehao
Published: (2025)

Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models
by: Guo, Zhiyu, et al.
Published: (2024)

SwiLTra-Bench: The Swiss Legal Translation Benchmark
by: Niklaus, Joel, et al.
Published: (2025)

Reverse-Engineering the Reader
by: Kiegeland, Samuel, et al.
Published: (2024)

Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer
by: Choi, Euntae, et al.
Published: (2025)

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
by: Lin, Yujun, et al.
Published: (2024)

A Decomposition Perspective to Long-context Reasoning for LLMs
by: Xiao, Yanling, et al.
Published: (2026)

Bayesian WeakS-to-Strong from Text Classification to Generation
by: Cui, Ziyun, et al.
Published: (2024)

Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
by: Kapl, Ferdinand, et al.
Published: (2025)

Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
by: Chen, Hung-Hsuan
Published: (2026)

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
by: Kohli, Harsh, et al.
Published: (2026)

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models
by: Sheppert, Alexander
Published: (2026)

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?
by: Lyu, Xingyu, et al.
Published: (2026)

Multi-Token Prediction Needs Registers
by: Gerontopoulos, Anastasios, et al.
Published: (2025)

Unlocking Continual Learning Abilities in Language Models
by: Du, Wenyu, et al.
Published: (2024)

A Sea of Words: An In-Depth Analysis of Anchors for Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)

Distill and Align Decomposition for Enhanced Claim Verification
by: Magomere, Jabez, et al.
Published: (2026)

CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding
by: Liu, Yang, et al.
Published: (2024)

Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
by: Lu, Wenquan, et al.
Published: (2025)

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
by: Gringras, David
Published: (2026)

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training
by: Liu, Mingjie, et al.
Published: (2025)

Mode-Conditioning Unlocks Superior Test-Time Scaling
by: Wu, Chen Henry, et al.
Published: (2025)

Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
by: Yang, Hongming, et al.
Published: (2025)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
by: Wang, Enguang, et al.
Published: (2024)

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
by: Xiaomi, LLM-Core, et al.
Published: (2025)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling
by: Bratulić, Jelena, et al.
Published: (2025)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
by: Li, Chengpeng, et al.
Published: (2024)

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
by: Li, Ziniu, et al.
Published: (2025)

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
by: Gu, Naibin, et al.
Published: (2025)

Sparse Attention Decomposition Applied to Circuit Tracing
by: Franco, Gabriel, et al.
Published: (2024)

Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
by: Chen, Ruishuo, et al.
Published: (2026)