| Main Authors: | Wang, Yi, Fang, Ruoyi, Xie, Anzhuo, Feng, Hanrui, Lai, Jianlin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.12122 |
Similar Items
Deep Learning Approach for Clinical Risk Identification Using Transformer Modeling of Heterogeneous EHR Data
by: Xie, Anzhuo, et al.
Published: (2025)
Application of Deep Generative Models for Anomaly Detection in Complex Financial Transactions
by: Tang, Tengda, et al.
Published: (2025)
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
by: Wang, Hanrui, et al.
Published: (2020)
A Deep Learning Approach to Anomaly Detection in High-Frequency Trading Data
by: Bao, Qiuliuyang, et al.
Published: (2025)
Enhancing Transformer Training Efficiency with Dynamic Dropout
by: Yan, Hanrui, et al.
Published: (2024)
ATM-GAD: Adaptive Temporal Motif Graph Anomaly Detection for Financial Transaction Networks
by: Zhang, Zeyue, et al.
Published: (2025)
Improving Transformers with Dynamically Composable Multi-Head Attention
by: Xiao, Da, et al.
Published: (2024)
Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
by: Delestre, Cyrile, et al.
Published: (2024)
Adaptive Head Budgeting for Efficient Multi-Head Attention
by: Faye, Bilal, et al.
Published: (2026)
Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)
Stability and Generalization of Hypergraph Collaborative Networks
by: Ng, Michael, et al.
Published: (2023)
Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models
by: Erden, Caner
Published: (2025)
Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior
by: Liu, Yidan, et al.
Published: (2024)
Interpretable Hierarchical Attention Network for Medical Condition Identification
by: Fang, Dongping, et al.
Published: (2024)
Quantum Graph Attention Network: A Novel Quantum Multi-Head Attention Mechanism for Graph Learning
by: Ning, An, et al.
Published: (2025)
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
by: Donhauser, Konstantin, et al.
Published: (2025)
Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)
An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models
by: Luo, Hanrui, et al.
Published: (2026)
MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)
Anomaly Detection in High-Dimensional Bank Account Balances via Robust Methods
by: Maddanu, Federico, et al.
Published: (2025)
Unsupervised Graph Modeling for Anomaly Detection in Accounting Subject Relationships
by: Wang, Yuhan, et al.
Published: (2026)
Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection
by: Zheng, Zhi, et al.
Published: (2025)
Multi-Head Spectral-Adaptive Graph Anomaly Detection
by: Cao, Qingyue, et al.
Published: (2025)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)
In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention
by: He, Jianliang, et al.
Published: (2025)
Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention
by: Pendharkar, Ayan
Published: (2026)
Interleaved Head Attention
by: Duvvuri, Sai Surya, et al.
Published: (2026)
Hybrid GCN-GRU Model for Anomaly Detection in Cryptocurrency Transactions
by: Na, Gyuyeon, et al.
Published: (2025)
Boosting House Price Estimations with Multi-Head Gated Attention
by: Sellam, Zakaria Abdellah, et al.
Published: (2024)
ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)
Molecular Odor Prediction Based on Multi-Feature Graph Attention Networks
by: Xie, HongXin, et al.
Published: (2025)
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
by: Fang, Lanting, et al.
Published: (2024)
Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
by: Chen, Siyu, et al.
Published: (2024)
Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)
Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)
Quantum Mixed-State Self-Attention Network
by: Chen, Fu, et al.
Published: (2024)
DeepSTA: A Spatial-Temporal Attention Network for Logistics Delivery Timely Rate Prediction in Anomaly Conditions
by: Yi, Jinhui, et al.
Published: (2025)
Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention
by: Borde, Haitz Sáez de Ocáriz
Published: (2025)
Multi-Head Self-Attending Neural Tucker Factorization
by: Hou, Yikai, et al.
Published: (2025)