:: Library Catalog

Bibliographic Details
Main Authors:	Razzaq, Waleed; Zhao, Yun-Bo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.04421

Similar Items

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning
by: Razzaq, Waleed, et al.
Published: (2026)

CARLE: A Hybrid Deep-Shallow Learning Framework for Robust and Explainable RUL Estimation of Rolling Element Bearings
by: Razzaq, Waleed, et al.
Published: (2025)

A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations
by: Razzaq, Waleed, et al.
Published: (2025)

Developing Distance-Aware, and Evident Uncertainty Quantification in Dynamic Physics-Constrained Neural Networks for Robust Bearing Degradation Estimation
by: Razzaq, Waleed, et al.
Published: (2025)

Neuronal Attention Circuit (NAC) for Representation Learning
by: Razzaq, Waleed, et al.
Published: (2025)

Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
by: Chen, Yihong, et al.
Published: (2026)

Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
by: Zhao, Yanjun, et al.
Published: (2024)

FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
by: Zhu, Max, et al.
Published: (2024)

SinkRouter: Sink-Aware Routing for Efficient Long-Context Decoding in Large Language and Multimodal Models
by: Liu, Junnan, et al.
Published: (2026)

Sparse Adapter Fusion for Continual Learning in NLP
by: Zeng, Min, et al.
Published: (2026)

On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)

Attention Sinks and Outliers in Attention Residuals
by: Luo, Haozheng, et al.
Published: (2026)

Evaluating Large Language Models for Security Bug Report Prediction
by: Soltaniani, Farnaz, et al.
Published: (2026)

HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting
by: Wang, Tan, et al.
Published: (2025)

Solving Continual Offline Reinforcement Learning with Decision Transformer
by: Huang, Kaixin, et al.
Published: (2024)

EsaCL: Efficient Continual Learning of Sparse Models
by: Ren, Weijieying, et al.
Published: (2024)

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
by: Liu, Renpu, et al.
Published: (2024)

Group and Exclusive Sparse Regularization-based Continual Learning of CNNs
by: Tousside, Basile, et al.
Published: (2026)

Community-Aware Temporal Walks: Parameter-Free Representation Learning on Continuous-Time Dynamic Graphs
by: Yu, He, et al.
Published: (2025)

ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling
by: Chen, Yuqi, et al.
Published: (2024)

Memorization Sinks: Isolating Memorization during LLM Training
by: Ghosal, Gaurav R., et al.
Published: (2025)

Rough Transformers for Continuous and Efficient Time-Series Modelling
by: Moreno-Pino, Fernando, et al.
Published: (2024)

Source-Free Cross-Domain Continual Learning
by: Furqon, Muhammad Tanzil, et al.
Published: (2025)

GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
by: Xiang, Maoyang, et al.
Published: (2026)

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
by: Súkeník, Peter, et al.
Published: (2026)

Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
by: Queipo-de-Llano, Enrique, et al.
Published: (2025)

Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
by: Zhang, Yukun, et al.
Published: (2025)

SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning
by: Zhao, Dandan, et al.
Published: (2024)

Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation
by: Zhao, Runze, et al.
Published: (2025)

Considering Nonstationary within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting
by: Wang, Muyao, et al.
Published: (2024)

Is Prompt Selection Necessary for Task-Free Online Continual Learning?
by: Park, Seoyoung, et al.
Published: (2026)

DESIRE: Dynamic Knowledge Consolidation for Rehearsal-Free Continual Learning
by: Guo, Haiyang, et al.
Published: (2024)

Sink-Aware Pruning for Diffusion Language Models
by: Myrzakhan, Aidar, et al.
Published: (2026)

Cloning Ideology and Style using Deep Learning
by: Beg, Omer, et al.
Published: (2022)

Rethinking Time Encoding via Learnable Transformation Functions
by: Chen, Xi, et al.
Published: (2025)

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
by: Chen, Bo, et al.
Published: (2024)

TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations
by: Zhang, Zheng, et al.
Published: (2024)

Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning
by: Zhao, Yuqing, et al.
Published: (2024)

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
by: Li, Siquan, et al.
Published: (2026)

Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need
by: Peng, Sijia, et al.
Published: (2024)