| Main Authors: | Li, Jiacheng, Tan, Jianchao, Yang, Zhidong, Sun, Pingwei, Huo, Feiye, Qin, Jiayu, Zhang, Xiangyu, He, Maoxin, Sun, Yerui, Xie, Yuchen, Tan, Guangming, Jia, Weile, Cai, Xunliang, Zhao, Tong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.16676 |
Similar Items
AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing
by: Li, Jiacheng, et al.
Published: (2025)
SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
by: Xu, Hongtao, et al.
Published: (2026)
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
by: Huo, Feiye, et al.
Published: (2025)
FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control
by: Sun, Pingwei, et al.
Published: (2026)
MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
by: Li, Jiacheng, et al.
Published: (2026)
AsyncTLS: Efficient Generative LLM Inference with Asynchronous Two-level Sparse Attention
by: Hu, Yuxuan, et al.
Published: (2026)
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies
by: Hu, Yuxuan, et al.
Published: (2025)
Accelerate Speculative Decoding with Sparse Computation in Verification
by: Wang, Jikai, et al.
Published: (2025)
Exploring Landscapes for Better Minima along Valleys
by: Zhao, Tong, et al.
Published: (2025)
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
by: Qin, Jiayu, et al.
Published: (2025)
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials
by: Zhou, Yuanchang, et al.
Published: (2026)
Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning
by: Wang, Chen, et al.
Published: (2026)
Fine-tuning vs Prompting, Can Language Models Understand Human Values?
by: Sun, Pingwei
Published: (2024)
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
by: Zhou, Yuanchang, et al.
Published: (2024)
Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale
by: Guo, Zhuoqiang, et al.
Published: (2025)
JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
by: Wang, Hongyu, et al.
Published: (2026)
Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems
by: Liu, Rongrong, et al.
Published: (2025)
Scaling Embeddings Outperforms Scaling Experts in Language Models
by: Liu, Hong, et al.
Published: (2026)
WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets
by: Banegas-Luna, Antonio Jesús, et al.
Published: (2025)
Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs
by: Li, Qingyuan, et al.
Published: (2024)
Optimization of target film materials and protective coatings for sealed neutron generator
by: Cao, Yingying, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day
by: Li, Jianxiong, et al.
Published: (2024)
ALKPU: an active learning method for the DeePMD model with Kalman filter
by: Li, Haibo, et al.
Published: (2024)
Scaling Neural-Network-Based Molecular Dynamics with Long-Range Electrostatic Interactions to 51 Nanoseconds per Day
by: Li, Jianxiong, et al.
Published: (2025)
Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials
by: Zhou, Yuanchang, et al.
Published: (2026)
Letter to the editor regarding quality of life following perioperative optimization with nutritional supplements in patients undergoing gastrointestinal surgery for cancer
by: Mengqi Qu, et al.
Published: (2024)
Improving DAPO from a Mixed-Policy Perspective
by: Tan, Hongze, et al.
Published: (2025)
SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On
by: Wang, Ruida, et al.
Published: (2023)
Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images
by: Tan, Jen Hong
Published: (2024)
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
by: Wang, Ao, et al.
Published: (2024)
Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference
by: Li, Qingyuan, et al.
Published: (2024)
Letter to the Editor: Comment on: “Comparison of perioperative and histopathologic outcomes among neoadjuvant treatment strategies for locoregional gastric cancer”
by: Chunling Li, et al.
Published: (2024)
Multi-Static Target Position Estimation and System Optimization for Cell-Free mMIMO-OTFS ISAC
by: Fan, Yifei, et al.
Published: (2025)
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
by: Hu, Tianhao, et al.
Published: (2026)
OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation
by: Sun, Chen, et al.
Published: (2026)
Over‐Lithiation Regulation of Silicon‐Based Anodes for High‐Energy Lithium‐Ion Batteries
by: Xiaohong Wang, et al.
Published: (2024)
Hidden Quantum Advantage near the Decoding Threshold of Decoded Quantum Interferometry
by: Gao, Maoxin, et al.
Published: (2026)
From Data to Decisions: How Machine Learning and Generative Artificial Intelligence Are Redefining Precision Medicine in Kidney Transplantation
by: Maoxin Liao, et al.
Published: (2026)
Salivary Duct Carcinoma: Report of an Advanced Challenging Case Diagnosed by Ultrasound‐Guided Fine‐Needle Aspiration Biopsy
by: Prerna Khetan, et al.
Published: (2026)