| Main Authors: | Chen, Hanting, Zhu, Chong, Han, Kai, Tian, Yuchuan, Liang, Yuchen, Guo, Tianyu, Chen, Xinghao, Tao, Dacheng, Wang, Yunhe |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.03377 |
Similar Items
From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs
by: Tian, Yuchuan, et al.
Published: (2025)
DiJiang: Efficient Large Language Models through Compact Kernelization
by: Chen, Hanting, et al.
Published: (2024)
Deferred Commitment Decoding for Diffusion Language Models
by: Shu, Yingte, et al.
Published: (2026)
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
by: Guo, Jialong, et al.
Published: (2024)
Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression
by: Wang, Haoyu, et al.
Published: (2025)
Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing
by: Rang, Miao, et al.
Published: (2026)
U-REPA: Aligning Diffusion U-Nets to ViTs
by: Tian, Yuchuan, et al.
Published: (2025)
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
by: Tian, Yuchuan, et al.
Published: (2023)
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
by: Nie, Ying, et al.
Published: (2025)
A Survey on Transformer Compression
by: Tang, Yehui, et al.
Published: (2024)
DLLM Agent: See Farther, Run Faster
by: Zhen, Huiling, et al.
Published: (2026)
U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
by: Tian, Yuchuan, et al.
Published: (2024)
Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants
by: Wang, Yunhe, et al.
Published: (2026)
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
by: Rang, Miao, et al.
Published: (2025)
DiC: Rethinking Conv3x3 Designs in Diffusion Models
by: Tian, Yuchuan, et al.
Published: (2024)
Transferable text data distillation by trajectory matching
by: Yao, Rong, et al.
Published: (2025)
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
by: Chen, Hanting, et al.
Published: (2025)
Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation
by: Tian, Yuchuan, et al.
Published: (2024)
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
by: Tang, Yehui, et al.
Published: (2025)
C-MOP: Integrating Momentum and Boundary-Aware Clustering for Enhanced Prompt Evolution
by: Yan, Binwei, et al.
Published: (2026)
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation
by: Wang, Yunhe, et al.
Published: (2023)
Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing
by: Wu, Yuchen, et al.
Published: (2025)
Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
by: Wu, Yuchen, et al.
Published: (2025)
MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles
by: Han, Jing, et al.
Published: (2025)
Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs
by: Wu, Yuchen, et al.
Published: (2025)
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
by: He, Wei, et al.
Published: (2024)
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
by: Chen, Hanting, et al.
Published: (2025)
Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
by: Tu, Zhijun, et al.
Published: (2025)
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
by: Nie, Ying, et al.
Published: (2024)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)
Learning Quantized Adaptive Conditions for Diffusion Models
by: Liang, Yuchen, et al.
Published: (2024)
PanGu-$π$ Pro: Rethinking Optimization and Architecture for Tiny Language Models
by: Tang, Yehui, et al.
Published: (2024)
IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions
by: Tu, Zhijun, et al.
Published: (2024)
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
by: Chen, Qian, et al.
Published: (2024)
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
by: He, Wei, et al.
Published: (2025)
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation
by: Cai, Shizhan, et al.
Published: (2025)
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
by: Ho, Zheng Yi, et al.
Published: (2024)
Multi-Granularity Semantic Revision for Large Language Model Distillation
by: Liu, Xiaoyu, et al.
Published: (2024)
A Survey on Self-Evolution of Large Language Models
by: Tao, Zhengwei, et al.
Published: (2024)
Prism: Spectral-Aware Block-Sparse Attention
by: Wang, Xinghao, et al.
Published: (2026)