| Main Authors: | Razzaq, Waleed, Zhao, Yun-Bo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.04421 |
Similar Items
Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning
by: Razzaq, Waleed, et al.
Published: (2026)
CARLE: A Hybrid Deep-Shallow Learning Framework for Robust and Explainable RUL Estimation of Rolling Element Bearings
by: Razzaq, Waleed, et al.
Published: (2025)
A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations
by: Razzaq, Waleed, et al.
Published: (2025)
Developing Distance-Aware, and Evident Uncertainty Quantification in Dynamic Physics-Constrained Neural Networks for Robust Bearing Degradation Estimation
by: Razzaq, Waleed, et al.
Published: (2025)
Neuronal Attention Circuit (NAC) for Representation Learning
by: Razzaq, Waleed, et al.
Published: (2025)
Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
by: Chen, Yihong, et al.
Published: (2026)
Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
by: Zhao, Yanjun, et al.
Published: (2024)
FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
by: Zhu, Max, et al.
Published: (2024)
SinkRouter: Sink-Aware Routing for Efficient Long-Context Decoding in Large Language and Multimodal Models
by: Liu, Junnan, et al.
Published: (2026)
Sparse Adapter Fusion for Continual Learning in NLP
by: Zeng, Min, et al.
Published: (2026)
On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)
Attention Sinks and Outliers in Attention Residuals
by: Luo, Haozheng, et al.
Published: (2026)
Evaluating Large Language Models for Security Bug Report Prediction
by: Soltaniani, Farnaz, et al.
Published: (2026)
HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting
by: Wang, Tan, et al.
Published: (2025)
Solving Continual Offline Reinforcement Learning with Decision Transformer
by: Huang, Kaixin, et al.
Published: (2024)
EsaCL: Efficient Continual Learning of Sparse Models
by: Ren, Weijieying, et al.
Published: (2024)
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
by: Liu, Renpu, et al.
Published: (2024)
Group and Exclusive Sparse Regularization-based Continual Learning of CNNs
by: Tousside, Basile, et al.
Published: (2026)
Community-Aware Temporal Walks: Parameter-Free Representation Learning on Continuous-Time Dynamic Graphs
by: Yu, He, et al.
Published: (2025)
ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling
by: Chen, Yuqi, et al.
Published: (2024)
Memorization Sinks: Isolating Memorization during LLM Training
by: Ghosal, Gaurav R., et al.
Published: (2025)
Rough Transformers for Continuous and Efficient Time-Series Modelling
by: Moreno-Pino, Fernando, et al.
Published: (2024)
Source-Free Cross-Domain Continual Learning
by: Furqon, Muhammad Tanzil, et al.
Published: (2025)
GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
by: Xiang, Maoyang, et al.
Published: (2026)
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
by: Súkeník, Peter, et al.
Published: (2026)
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
by: Queipo-de-Llano, Enrique, et al.
Published: (2025)
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
by: Zhang, Yukun, et al.
Published: (2025)
SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning
by: Zhao, Dandan, et al.
Published: (2024)
Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation
by: Zhao, Runze, et al.
Published: (2025)
Considering Nonstationary within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting
by: Wang, Muyao, et al.
Published: (2024)
Is Prompt Selection Necessary for Task-Free Online Continual Learning?
by: Park, Seoyoung, et al.
Published: (2026)
DESIRE: Dynamic Knowledge Consolidation for Rehearsal-Free Continual Learning
by: Guo, Haiyang, et al.
Published: (2024)
Sink-Aware Pruning for Diffusion Language Models
by: Myrzakhan, Aidar, et al.
Published: (2026)
Cloning Ideology and Style using Deep Learning
by: Beg, Omer, et al.
Published: (2022)
Rethinking Time Encoding via Learnable Transformation Functions
by: Chen, Xi, et al.
Published: (2025)
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
by: Chen, Bo, et al.
Published: (2024)
TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations
by: Zhang, Zheng, et al.
Published: (2024)
Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning
by: Zhao, Yuqing, et al.
Published: (2024)
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
by: Li, Siquan, et al.
Published: (2026)
Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need
by: Peng, Sijia, et al.
Published: (2024)