:: Library Catalog

Bibliographic Details
Main authors:	Saki, Mahdi; Lipman, Justin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2511.21034

Similar Items

A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques
by: Saki, Mahdi, et al.
Published: (2024)

Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)

A Novel Hybrid Approach Using an Attention-Based Transformer + GRU Model for Predicting Cryptocurrency Prices
by: Mahdi, Esam, et al.
Published: (2025)

Improving Transformers with Dynamically Composable Multi-Head Attention
by: Xiao, Da, et al.
Published: (2024)

Epileptic Seizure Prediction Using Patient-Adaptive Transformer Networks
by: Mahdi, Mohamed, et al.
Published: (2026)

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)

Adaptive Head Budgeting for Efficient Multi-Head Attention
by: Faye, Bilal, et al.
Published: (2026)

Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)

The Effect of Attention Head Count on Transformer Approximation
by: Yu, Penghao, et al.
Published: (2025)

Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)

Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
by: Dev, Arundhathi, et al.
Published: (2026)

Herding LLaMaS: Using LLMs as an OS Module
by: Kamath, Aditya K, et al.
Published: (2024)

Global and Local Topology-Aware Attention with Persistent Homology and Euler Biases for Time-Series Forecasting
by: Faghihi, Usef, et al.
Published: (2026)

HGC-Herd: Efficient Heterogeneous Graph Condensation via Representative Node Herding
by: Ou, Fuyan, et al.
Published: (2025)

Linear Predictability of Attention Heads in Large Language Models
by: Shaikh, Khalid, et al.
Published: (2026)

Attention Head Entropy of LLMs Predicts Answer Correctness
by: Ostmeier, Sophie, et al.
Published: (2026)

MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)

Exploring the Design Space of Transition Matching
by: Singer, Uriel, et al.
Published: (2025)

ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers
by: Zhu, Ding, et al.
Published: (2024)

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
by: Chen, Xingwu, et al.
Published: (2024)

Interleaved Head Attention
by: Duvvuri, Sai Surya, et al.
Published: (2026)

Multi-Channel Swin Transformer Framework for Bearing Remaining Useful Life Prediction
by: Mohajerzarrinkelk, Ali, et al.
Published: (2025)

Predicting BVD Re-emergence in Irish Cattle From Highly Imbalanced Herd-Level Data Using Machine Learning Algorithms
by: Mimnagh, Niamh, et al.
Published: (2025)

Boosting House Price Estimations with Multi-Head Gated Attention
by: Sellam, Zakaria Abdellah, et al.
Published: (2024)

The Anxiety of Influence: Bloom Filters in Transformer Attention Heads
by: Balogh, Peter
Published: (2026)

Flow Matching on General Geometries
by: Chen, Ricky T. Q., et al.
Published: (2023)

Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)

Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)

A Multi-Head Attention Soft Random Forest for Interpretable Patient No-Show Prediction
by: Amalina, Ninda Nurseha, et al.
Published: (2025)

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
by: Csordás, Róbert, et al.
Published: (2023)

STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead
by: Mahdavian, Mohammad, et al.
Published: (2022)

Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention
by: Borde, Haitz Sáez de Ocáriz
Published: (2025)

A Novel Hybrid Approach for Tornado Prediction in the United States: Kalman-Convolutional BiLSTM with Multi-Head Attention
by: Zhou, Jiawei
Published: (2024)

Transformer Model for Alzheimer's Disease Progression Prediction Using Longitudinal Visit Sequences
by: Moghaddami, Mahdi, et al.
Published: (2025)

Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
by: Gabetni, Firas, et al.
Published: (2025)

RecurFormer: Not All Transformer Heads Need Self-Attention
by: Yan, Ruiqing, et al.
Published: (2024)

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
by: Chen, Yilong, et al.
Published: (2024)

A Capacity-Based Rationale for Multi-Head Attention
by: Adler, Micah
Published: (2025)

Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)

Leaf Spectral Reflectance Prediction Using Multi-Head Attention Neural Networks
by: Farajpoor, Parastoo, et al.
Published: (2026)