| Main Authors: | Li, Zixuan, Geng, Binzong, Xiong, Jing, He, Yong, Hu, Yuxuan, Chen, Jian, Chen, Dingwei, Chang, Xiyu, Zhang, Liang, Mo, Linjian, Li, Chengming, Yuan, Chuan, Sun, Zhenan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.03668 |
Similar Items
Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors
by: Geng, Binzong, et al.
Published: (2024)
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
by: Tang, Jiakai, et al.
Published: (2026)
Attention Sinks and Outliers in Attention Residuals
by: Luo, Haozheng, et al.
Published: (2026)
Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
by: Chen, Yihong, et al.
Published: (2026)
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
by: Liu, Xu, et al.
Published: (2026)
The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
by: Sun, Shangwen, et al.
Published: (2026)
Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
by: Fu, Zizhuo, et al.
Published: (2026)
Efficient Streaming Language Models with Attention Sinks
by: Xiao, Guangxuan, et al.
Published: (2023)
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
by: Su, Zunhai, et al.
Published: (2026)
DGenCTR: Towards a Universal Generative Paradigm for Click-Through Rate Prediction via Discrete Diffusion
by: Zhang, Moyu, et al.
Published: (2025)
Efficient Vocal Source Separation Through Windowed Sink Attention
by: Benetatos, Christodoulos, et al.
Published: (2025)
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
by: Ye, Bo, et al.
Published: (2026)
ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction
by: Lin, Jianghao, et al.
Published: (2023)
Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning
by: Sok, Jaewon, et al.
Published: (2026)
Attention Sinks in Diffusion Language Models
by: Rulli, Maximo Eduardo, et al.
Published: (2025)
ASAP: Attention Sink Anchored Pruning
by: Lee, Jaehyuk, et al.
Published: (2026)
On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)
When Sinks Help or Hurt: Unified Framework for Attention Sink in Large Vision-Language Models
by: Choi, Jiho, et al.
Published: (2026)
How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective
by: Peng, Runyu, et al.
Published: (2026)
Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink
by: Liu, Guozhi, et al.
Published: (2026)
Sink or Swim
by: Larsen, Karen
Published: (2004)
Spectral Filters, Dark Signals, and Attention Sinks
by: Cancedda, Nicola
Published: (2024)
Global Nitrogen Deposition Promotes Carbon Sink Formation in Terrestrial Ecosystems
by: Lei Li, et al.
Published: (2026)
Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning
by: Shang, Bingqi, et al.
Published: (2025)
Quadratic Interest Network for Multimodal Click-Through Rate Prediction
by: Li, Honghao, et al.
Published: (2025)
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
by: Luo, Jiayun, et al.
Published: (2025)
Attention Sinks in Diffusion Transformers: A Causal Analysis
by: Wu, Fangzheng, et al.
Published: (2026)
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
by: Zuhri, Zayd M. K., et al.
Published: (2025)
SLASH the Sink: Sharpening Structural Attention Inside LLMs
by: Liu, Yiming, et al.
Published: (2026)
Stochastic Parroting in Temporal Attention -- Regulating the Diagonal Sink
by: Hankemeier, Victoria, et al.
Published: (2026)
SinkRouter: Sink-Aware Routing for Efficient Long-Context Decoding in Large Language and Multimodal Models
by: Liu, Junnan, et al.
Published: (2026)
EST: Towards Efficient Scaling Laws in Click-Through Rate Prediction via Unified Modeling
by: Liu, Mingyang, et al.
Published: (2026)
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
by: Li, Siquan, et al.
Published: (2026)
Addressing Exacerbated Attention Sink for Source-Free Cross-Domain Few-Shot Learning
by: Yi, Shuai, et al.
Published: (2026)
Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection
by: Hu, Yuxuan, et al.
Published: (2025)
When Attention Sink Emerges in Language Models: An Empirical View
by: Gu, Xiangming, et al.
Published: (2024)
On the Nature of Attention Sink that Shapes Decoding Strategy in Omni-LLMs
by: Yoo, Suho, et al.
Published: (2026)
Artifacts and Attention Sinks: Structured Approximations for Efficient Vision Transformers
by: Lu, Andrew, et al.
Published: (2025)
Attention Sinks: A 'Catch, Tag, Release' Mechanism for Embeddings
by: Zhang, Stephen, et al.
Published: (2025)
Optimizing Feature Set for Click-Through Rate Prediction
by: Lyu, Fuyuan, et al.
Published: (2023)