Library Catalog

Bibliographic Details
Main Authors:	Wu, Bohong; Yan, Shen; Zhang, Sijun; Lu, Jianqiao; Zeng, Yutao; Wang, Ya; Zhou, Xun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.14992

Similar Items

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
by: Zhuo, Zhijian, et al.
Published: (2025)

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
by: Wang, Ya, et al.
Published: (2025)

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)

Parallel Loop Transformer for Efficient Test-Time Computation Scaling
by: Wu, Bohong, et al.
Published: (2025)

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
by: Lai, Xunhao, et al.
Published: (2025)

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)

HRM-Text: Efficient Pretraining Beyond Scaling
by: Wang, Guan, et al.
Published: (2026)

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
by: Xiao, Xin, et al.
Published: (2024)

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
by: Ma, Xuezhe, et al.
Published: (2024)

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
by: Zhang, Zhen, et al.
Published: (2026)

Scaling Law for Quantization-Aware Training
by: Chen, Mengzhao, et al.
Published: (2025)

Universal YOCO for Efficient Depth Scaling
by: Sun, Yutao, et al.
Published: (2026)

Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs
by: Zhong, Shan, et al.
Published: (2024)

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining
by: Dong, Haoyu, et al.
Published: (2025)

Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models
by: Wu, Wei, et al.
Published: (2026)

Frac-Connections: Fractional Extension of Hyper-Connections
by: Zhu, Defa, et al.
Published: (2025)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

Hyper-Connections
by: Zhu, Defa, et al.
Published: (2024)

An Integrated Data Processing Framework for Pretraining Foundation Models
by: Sun, Yiding, et al.
Published: (2024)

PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining
by: Liang, Cheng, et al.
Published: (2026)

QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
by: Liu, Fengze, et al.
Published: (2025)

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
by: Maracani, Andrea, et al.
Published: (2025)

Budget-aware Test-time Scaling via Discriminative Verification
by: Montgomery, Kyle, et al.
Published: (2025)

Decoupling Safety into Orthogonal Subspace: Cost-Efficient and Performance-Preserving Alignment for Large Language Models
by: Mou, Yutao, et al.
Published: (2025)

Language Models and Cycle Consistency for Self-Reflective Machine Translation
by: Wangni, Jianqiao
Published: (2024)

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
by: Zhao, Liang, et al.
Published: (2024)

BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence
by: Jin, Jiajie, et al.
Published: (2024)

WRAP++: Web discoveRy Amplified Pretraining
by: Zhou, Jiang, et al.
Published: (2026)

Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
by: Zhang, Linhao, et al.
Published: (2026)

AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
by: Hua, Kai, et al.
Published: (2025)

LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
by: Wu, Xingyu, et al.
Published: (2025)

UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective
by: Xiong, Jing, et al.
Published: (2024)

Reformulation for Pretraining Data Augmentation
by: Hao, Xintong, et al.
Published: (2025)

SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization
by: Sun, Yan, et al.
Published: (2026)

Length Generalization of Causal Transformers without Position Encoding
by: Wang, Jie, et al.
Published: (2024)

Flora: Effortless Context Construction to Arbitrary Length and Scale
by: Chen, Tianxiang, et al.
Published: (2025)

Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
by: Deng, Haoran, et al.
Published: (2025)

WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
by: Liu, Aiwei, et al.
Published: (2025)

Maximum Score Routing For Mixture-of-Experts
by: Dong, Bowen, et al.
Published: (2025)

DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
by: Yang, Lei, et al.
Published: (2024)