| Main Author: | Hajra, Suvadeep |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.15548 |
Similar Items
Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling
by: Hajra, Suvadeep, et al.
Published: (2026)
Decomposable Transformer Point Processes
by: Panos, Aristeidis
Published: (2024)
Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability
by: Sharma, Shivam, et al.
Published: (2023)
Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting
by: Kang, Bong Gyun, et al.
Published: (2024)
Decomposing Attention To Find Context-Sensitive Neurons
by: Gibson, Alex
Published: (2025)
Short-Range Oversquashing
by: Mishayev, Yaaqov, et al.
Published: (2025)
Hybrid Focal and Full-Range Attention Based Graph Transformers
by: Zhu, Minhong, et al.
Published: (2023)
Decomposing Global Feature Effects Based on Feature Interactions
by: Herbinger, Julia, et al.
Published: (2023)
The Effect of Attention Head Count on Transformer Approximation
by: Yu, Penghao, et al.
Published: (2025)
AI Generalisation Gap In Comorbid Sleep Disorder Staging
by: Bose, Saswata, et al.
Published: (2026)
Triple Attention Transformer Architecture for Time-Dependent Concrete Creep Prediction
by: Dokduea, Warayut, et al.
Published: (2025)
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
by: Lou, Chao, et al.
Published: (2024)
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
by: Tamano, Shu
Published: (2025)
Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
by: Zhang, Wenqing, et al.
Published: (2024)
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
by: Bai, Xiaoyan, et al.
Published: (2025)
DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning
by: Liao, Huanxuan, et al.
Published: (2025)
Extracting Cause-Effect Pairs from a Sentence with a Dependency-Aware Transformer Model
by: Kabir, Md Ahsanul, et al.
Published: (2025)
Decomposable Neuro Symbolic Regression
by: Morales, Giorgio, et al.
Published: (2025)
Learning Long-Range Dependencies with Temporal Predictive Coding
by: Potter, Tom, et al.
Published: (2026)
Decomposing Gaussians with Unknown Covariance
by: Dharamshi, Ameer, et al.
Published: (2024)
Decomposing Prediction Mechanisms for In-Context Recall
by: Daniels, Sultan, et al.
Published: (2025)
Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)
Decomposing the Depth Profile of Fine-Tuning
by: Billa, Jayadev
Published: (2026)
Learning Long Range Dependencies on Graphs via Random Walks
by: Chen, Dexiong, et al.
Published: (2024)
On the Effect of Instability on Learning Continuous-Time Linear Control Systems
by: Hafshejani, Reza Sadeghi, et al.
Published: (2024)
Batch Normalization Decomposed
by: Nachum, Ido, et al.
Published: (2024)
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
by: Wei, Zichao
Published: (2026)
Leveraging Discrete Function Decomposability for Scientific Design
by: Bowden, James C., et al.
Published: (2025)
Decomposing Task Vectors for Refined Model Editing
by: Damirchi, Hamed, et al.
Published: (2025)
DEMAU: Decompose, Explore, Model and Analyse Uncertainties
by: Hoarau, Arthur, et al.
Published: (2024)
ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies
by: Katav, Itay, et al.
Published: (2025)
Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
by: Kim, Bum Jun, et al.
Published: (2026)
Transformer Reconstructed with Dynamic Value Attention
by: Wang, Xiaowei
Published: (2025)
Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)
Transformers with Sparse Attention for Granger Causality
by: Mahesh, Riya, et al.
Published: (2024)
Preconditioned Attention: Enhancing Efficiency in Transformers
by: Saratchandran, Hemanth
Published: (2026)
Graph External Attention Enhanced Transformer
by: Liang, Jianqing, et al.
Published: (2024)
Accelerating the Low-Rank Decomposed Models
by: Hajimolahoseini, Habib, et al.
Published: (2024)
Geometric Learning with Positively Decomposable Kernels
by: Da Costa, Nathael, et al.
Published: (2023)