:: Library Catalog

Bibliographic Details
Main Authors:	van Engelenhoven, Adjorn; Strisciuglio, Nicola; Talavera, Estefanía
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.04239

Similar Items

Regressing Transformers for Data-efficient Visual Place Recognition
by: Leyva-Vallina, María, et al.
Published: (2024)

CAST: Cluster-Aware Self-Training for Tabular Data via Reliable Confidence
by: Kim, Minwook, et al.
Published: (2023)

Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
by: Vaish, Puru, et al.
Published: (2024)

Indoor scene recognition from images under visual corruptions
by: Costa, Willams de Lima, et al.
Published: (2024)

Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference
by: Ferrari, Alan
Published: (2026)

Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)

VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)

CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction
by: Lee, Jaewan, et al.
Published: (2025)

Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
by: Kinfu, Kaleab A., et al.
Published: (2025)

CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions
by: Fu, Zihao, et al.
Published: (2025)

Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
by: Pham, Duy-Tung, et al.
Published: (2025)

Multi-level Optimal Control with Neural Surrogate Models
by: Kalise, Dante, et al.
Published: (2024)

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)

STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)

Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
by: Huang, Siyuan, et al.
Published: (2024)

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
by: Wu, Ziyang, et al.
Published: (2024)

Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains
by: Wen, Shizheng, et al.
Published: (2025)

Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)

CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series
by: Lu, Jiecheng, et al.
Published: (2026)

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
by: Martinico, Silvio, et al.
Published: (2026)

Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)

Learning to Explain: Supervised Token Attribution from Transformer Attention Patterns
by: Mihaila, George
Published: (2026)

KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
by: Akulov, Dmitry, et al.
Published: (2025)

Token Sample Complexity of Attention
by: Bohbot, Léa, et al.
Published: (2025)

Postcolonial Memory in the Netherlands
by: van Engelenhoven, Gerlov
Published: (2022)

The CAST package for training and assessment of spatial prediction models in R
by: Meyer, Hanna, et al.
Published: (2024)

Understanding Differential Transformer Unchains Pretrained Self-Attentions
by: Kong, Chaerin, et al.
Published: (2025)

Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning
by: Shokry, Ahmed, et al.
Published: (2025)

CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)

Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index
by: Athey, Susan, et al.
Published: (2016)

First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training
by: Kim, Gyudong, et al.
Published: (2025)

Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)

Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)

Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)

Attention-Informed Surrogates for Navigating Power-Performance Trade-offs in HPC
by: Ahmed, Ashna Nawar, et al.
Published: (2026)

CAST: Modeling Semantic-Level Transitions for Complementary-Aware Sequential Recommendation
by: Zhang, Qian, et al.
Published: (2026)

Cascade Token Selection for Transformer Attention Acceleration
by: Thomas, Stephen J.
Published: (2026)

Dynamics of Spontaneous Topic Changes in Next Token Prediction with Self-Attention
by: Jia, Mumin, et al.
Published: (2025)

Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting
by: Nagrath, Saurish, et al.
Published: (2026)

Memory-Efficient Fine-Tuning of Transformers via Token Selection
by: Simoulin, Antoine, et al.
Published: (2025)