| Main Authors: | van Engelenhoven, Adjorn, Strisciuglio, Nicola, Talavera, Estefanía |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.04239 |
Similar Items
Regressing Transformers for Data-efficient Visual Place Recognition
by: Leyva-Vallina, María, et al.
Published: (2024)
CAST: Cluster-Aware Self-Training for Tabular Data via Reliable Confidence
by: Kim, Minwook, et al.
Published: (2023)
Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
by: Vaish, Puru, et al.
Published: (2024)
Indoor scene recognition from images under visual corruptions
by: Costa, Willams de Lima, et al.
Published: (2024)
Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference
by: Ferrari, Alan
Published: (2026)
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)
VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)
CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction
by: Lee, Jaewan, et al.
Published: (2025)
Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
by: Kinfu, Kaleab A., et al.
Published: (2025)
CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions
by: Fu, Zihao, et al.
Published: (2025)
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
by: Pham, Duy-Tung, et al.
Published: (2025)
Multi-level Optimal Control with Neural Surrogate Models
by: Kalise, Dante, et al.
Published: (2024)
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)
STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)
Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
by: Huang, Siyuan, et al.
Published: (2024)
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
by: Wu, Ziyang, et al.
Published: (2024)
Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains
by: Wen, Shizheng, et al.
Published: (2025)
Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)
CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series
by: Lu, Jiecheng, et al.
Published: (2026)
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
by: Martinico, Silvio, et al.
Published: (2026)
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)
Learning to Explain: Supervised Token Attribution from Transformer Attention Patterns
by: Mihaila, George
Published: (2026)
KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
by: Akulov, Dmitry, et al.
Published: (2025)
Token Sample Complexity of Attention
by: Bohbot, Léa, et al.
Published: (2025)
Postcolonial Memory in the Netherlands
by: van Engelenhoven, Gerlov
Published: (2022)
The CAST package for training and assessment of spatial prediction models in R
by: Meyer, Hanna, et al.
Published: (2024)
Understanding Differential Transformer Unchains Pretrained Self-Attentions
by: Kong, Chaerin, et al.
Published: (2025)
Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning
by: Shokry, Ahmed, et al.
Published: (2025)
CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)
Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index
by: Athey, Susan, et al.
Published: (2016)
First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training
by: Kim, Gyudong, et al.
Published: (2025)
Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)
Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)
Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)
Attention-Informed Surrogates for Navigating Power-Performance Trade-offs in HPC
by: Ahmed, Ashna Nawar, et al.
Published: (2026)
CAST: Modeling Semantic-Level Transitions for Complementary-Aware Sequential Recommendation
by: Zhang, Qian, et al.
Published: (2026)
Cascade Token Selection for Transformer Attention Acceleration
by: Thomas, Stephen J.
Published: (2026)
Dynamics of Spontaneous Topic Changes in Next Token Prediction with Self-Attention
by: Jia, Mumin, et al.
Published: (2025)
Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting
by: Nagrath, Saurish, et al.
Published: (2026)
Memory-Efficient Fine-Tuning of Transformers via Token Selection
by: Simoulin, Antoine, et al.
Published: (2025)