| Main Authors: | Ou, Zhixin; Liang, Peng; Han, Jianchen; Liu, Baihui; Qiao, Linbo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.13198 |
Similar Items
A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science
by: Tian, Kaiyuan, et al.
Published: (2025)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)
Budget-aware Auto Optimizer Configurator
by: Liu, Kang, et al.
Published: (2026)
Exact Dual Geometry of SOC-ICNN Value Functions
by: Liu, Kang, et al.
Published: (2026)
Dy-mer: An Explainable DNA Sequence Representation Scheme using Dictionary Learning
by: Peng, Zhiyuan, et al.
Published: (2024)
EEG-DCNet: A Fast and Accurate MI-EEG Dilated CNN Classification Method
by: Peng, Wei, et al.
Published: (2024)
DyMRL: Dynamic Multispace Representation Learning for Multimodal Event Forecasting in Knowledge Graph
by: Zhao, Feng, et al.
Published: (2026)
DyTTP: Trajectory Prediction with Normalization-Free Transformers
by: Zhu, JianLin, et al.
Published: (2025)
CoDy: Counterfactual Explainers for Dynamic Graphs
by: Qu, Zhan, et al.
Published: (2024)
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
by: Zhang, Zeliang, et al.
Published: (2024)
Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL
by: Mao, Hanyi, et al.
Published: (2025)
Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies
by: He, Zhouyu, et al.
Published: (2025)
Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection
by: Lai, Zhixin, et al.
Published: (2024)
Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents
by: Zhang, Menglong, et al.
Published: (2024)
Scheduling Parallel Optical Circuit Switches for AI Training
by: Liang, Kevin, et al.
Published: (2026)
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
by: Cho, Hanseul, et al.
Published: (2024)
Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction
by: Ding, Fei, et al.
Published: (2026)
Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)
GSPN-2: Efficient Parallel Sequence Modeling
by: Wang, Hongjun, et al.
Published: (2025)
On Vanishing Variance in Transformer Length Generalization
by: Li, Ruining, et al.
Published: (2025)
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
by: Liu, Zhihong, et al.
Published: (2024)
In Search of Lost DNA Sequence Pretraining
by: Tang, Zhijiang, et al.
Published: (2026)
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
by: Tang, Hanlin, et al.
Published: (2024)
Evaluating the Sensitivity of BiLSTM Forecasting Models to Sequence Length and Input Noise
by: Albelali, Salma, et al.
Published: (2025)
AGDC: Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces
by: Shin, Yeonsang, et al.
Published: (2026)
Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)
How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy
by: Liu, Hanwen, et al.
Published: (2025)
Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals
by: Jiang, Shuhao, et al.
Published: (2025)
DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling
by: Wang, Qingyuan, et al.
Published: (2024)
DySCo: Dynamic Semantic Compression for Effective Long-term Time Series Forecasting
by: Ao, Xiang, et al.
Published: (2026)
Provable Length Generalization in Sequence Prediction via Spectral Filtering
by: Marsden, Annie, et al.
Published: (2024)
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
by: Gan, Wangjie, et al.
Published: (2026)
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
by: Zhu, Kaijie, et al.
Published: (2023)
The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)
Tequila: Trapping-free Ternary Quantization for Large Language Models
by: Huang, Hong, et al.
Published: (2025)
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
by: Fang, Jiarui, et al.
Published: (2024)
On the Limitations and Capabilities of Position Embeddings for Length Generalization
by: Chen, Yang, et al.
Published: (2025)
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
by: Yu, Xiaoming, et al.
Published: (2026)
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
by: Liu, Guisong, et al.
Published: (2026)
DyEdgeGAT: Dynamic Edge via Graph Attention for Early Fault Detection in IIoT Systems
by: Zhao, Mengjie, et al.
Published: (2023)