| Main Authors: | Wang, Ya, Zhuo, Zhijian, Zeng, Yutao, Zhou, Xun, Yang, Jian, Li, Xiaoqing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.15499 |
Similar Items
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
by: Zhuo, Zhijian, et al.
Published: (2025)
Efficient Pretraining Length Scaling
by: Wu, Bohong, et al.
Published: (2025)
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)
Decoupling Safety into Orthogonal Subspace: Cost-Efficient and Performance-Preserving Alignment for Large Language Models
by: Mou, Yutao, et al.
Published: (2025)
Scaling Law for Quantization-Aware Training
by: Chen, Mengzhao, et al.
Published: (2025)
DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task
by: Liu, Wenhan, et al.
Published: (2024)
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
by: Guo, Zhicheng, et al.
Published: (2024)
Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs
by: Su, Siyue, et al.
Published: (2026)
On the Effectiveness of Incremental Training of Large Language Models
by: Li, Miles Q., et al.
Published: (2024)
TAIA: Large Language Models are Out-of-Distribution Data Learners
by: Jiang, Shuyang, et al.
Published: (2024)
Training-Free Long-Context Scaling of Large Language Models
by: An, Chenxin, et al.
Published: (2024)
Discovering Decoupled Functional Modules in Large Language Models
by: Yu, Yanke, et al.
Published: (2026)
pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training
by: Zhang, Wenzheng, et al.
Published: (2026)
Mid-Training of Large Language Models: A Survey
by: Mo, Kaixiang, et al.
Published: (2025)
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
by: Cheng, Xianfu, et al.
Published: (2025)
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
by: Liang, Xun, et al.
Published: (2025)
On the (In)Effectiveness of Large Language Models for Chinese Text Correction
by: Li, Yinghui, et al.
Published: (2023)
CodeSimpleQA: Scaling Factuality in Code Large Language Models
by: Yang, Jian, et al.
Published: (2025)
UniCoder: Scaling Code Large Language Model via Universal Code
by: Sun, Tao, et al.
Published: (2024)
Hybrid Alignment Training for Large Language Models
by: Wang, Chenglong, et al.
Published: (2024)
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
by: Li, Shengrui, et al.
Published: (2026)
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
by: Chen, Feilong, et al.
Published: (2025)
Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data
by: Li, Rumeng, et al.
Published: (2023)
Learning Dynamics in Continual Pre-Training for Large Language Models
by: Wang, Xingjin, et al.
Published: (2025)
Scaling Laws for Post Training Quantized Large Language Models
by: Xu, Zifei, et al.
Published: (2024)
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models
by: Li, Zongqian, et al.
Published: (2026)
Self-Improvement Programming for Temporal Knowledge Graph Question Answering
by: Chen, Zhuo, et al.
Published: (2024)
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
by: Que, Haoran, et al.
Published: (2024)
What Affects the Effective Depth of Large Language Models?
by: Hu, Yi, et al.
Published: (2025)
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
by: Zhou, Houquan, et al.
Published: (2024)
XPERT: Expert Knowledge Transfer for Effective Training of Language Models
by: Liu, Chang, et al.
Published: (2026)
GeneSUM: Large Language Model-based Gene Summary Extraction
by: Chen, Zhijian, et al.
Published: (2024)
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios
by: Ouyang, Zetian, et al.
Published: (2024)
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
by: Liu, Xiaoze, et al.
Published: (2024)
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
by: Wang, Zhijun, et al.
Published: (2025)
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
by: Liu, Han, et al.
Published: (2026)
Multi-Bit Distortion-Free Watermarking for Large Language Models
by: Boroujeny, Massieh Kordi, et al.
Published: (2024)
Efficient Attention Mechanisms for Large Language Models: A Survey
by: Sun, Yutao, et al.
Published: (2025)