:: Library Catalog

Bibliographic Details
Main Author:	Hajra, Suvadeep
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.15548

Similar Items

Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling
by: Hajra, Suvadeep, et al.
Published: (2026)

Decomposable Transformer Point Processes
by: Panos, Aristeidis
Published: (2024)

Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability
by: Sharma, Shivam, et al.
Published: (2023)

Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting
by: Kang, Bong Gyun, et al.
Published: (2024)

Decomposing Attention To Find Context-Sensitive Neurons
by: Gibson, Alex
Published: (2025)

Short-Range Oversquashing
by: Mishayev, Yaaqov, et al.
Published: (2025)

Hybrid Focal and Full-Range Attention Based Graph Transformers
by: Zhu, Minhong, et al.
Published: (2023)

Decomposing Global Feature Effects Based on Feature Interactions
by: Herbinger, Julia, et al.
Published: (2023)

The Effect of Attention Head Count on Transformer Approximation
by: Yu, Penghao, et al.
Published: (2025)

AI Generalisation Gap In Comorbid Sleep Disorder Staging
by: Bose, Saswata, et al.
Published: (2026)

Triple Attention Transformer Architecture for Time-Dependent Concrete Creep Prediction
by: Dokduea, Warayut, et al.
Published: (2025)

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
by: Lou, Chao, et al.
Published: (2024)

DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
by: Tamano, Shu
Published: (2025)

Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
by: Zhang, Wenqing, et al.
Published: (2024)

Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
by: Bai, Xiaoyan, et al.
Published: (2025)

DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning
by: Liao, Huanxuan, et al.
Published: (2025)

Extracting Cause-Effect Pairs from a Sentence with a Dependency-Aware Transformer Model
by: Kabir, Md Ahsanul, et al.
Published: (2025)

Decomposable Neuro Symbolic Regression
by: Morales, Giorgio, et al.
Published: (2025)

Learning Long-Range Dependencies with Temporal Predictive Coding
by: Potter, Tom, et al.
Published: (2026)

Decomposing Gaussians with Unknown Covariance
by: Dharamshi, Ameer, et al.
Published: (2024)

Decomposing Prediction Mechanisms for In-Context Recall
by: Daniels, Sultan, et al.
Published: (2025)

Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)

Decomposing the Depth Profile of Fine-Tuning
by: Billa, Jayadev
Published: (2026)

Learning Long Range Dependencies on Graphs via Random Walks
by: Chen, Dexiong, et al.
Published: (2024)

On the Effect of Instability on Learning Continuous-Time Linear Control Systems
by: Hafshejani, Reza Sadeghi, et al.
Published: (2024)

Batch Normalization Decomposed
by: Nachum, Ido, et al.
Published: (2024)

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
by: Wei, Zichao
Published: (2026)

Leveraging Discrete Function Decomposability for Scientific Design
by: Bowden, James C., et al.
Published: (2025)

Decomposing Task Vectors for Refined Model Editing
by: Damirchi, Hamed, et al.
Published: (2025)

DEMAU: Decompose, Explore, Model and Analyse Uncertainties
by: Hoarau, Arthur, et al.
Published: (2024)

ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies
by: Katav, Itay, et al.
Published: (2025)

Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
by: Kim, Bum Jun, et al.
Published: (2026)

Transformer Reconstructed with Dynamic Value Attention
by: Wang, Xiaowei
Published: (2025)

Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)

Transformers with Sparse Attention for Granger Causality
by: Mahesh, Riya, et al.
Published: (2024)

Preconditioned Attention: Enhancing Efficiency in Transformers
by: Saratchandran, Hemanth
Published: (2026)

Graph External Attention Enhanced Transformer
by: Liang, Jianqing, et al.
Published: (2024)

Accelerating the Low-Rank Decomposed Models
by: Hajimolahoseini, Habib, et al.
Published: (2024)

Geometric Learning with Positively Decomposable Kernels
by: Da Costa, Nathael, et al.
Published: (2023)