:: Library Catalog

Bibliographic Details
Main Author:	Zixian, Wang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.00797

Similar Items

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
by: Zhang, Lin, et al.
Published: (2025)

The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
by: Sigdel, Dibakar
Published: (2026)

ADPO: Anchored Direct Preference Optimization
by: Zixian, Wang
Published: (2025)

APO: Alpha-Divergence Preference Optimization
by: Zixian, Wang
Published: (2025)

Orthogonalized Policy Optimization: Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)

Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)

Un-mixing Test-time Adaptation under Heterogeneous Data Streams
by: Su, Zixian, et al.
Published: (2024)

Gradient Boosting within a Single Attention Layer
by: Sargolzaei, Saleh
Published: (2026)

Causally-Aware Information Bottleneck for Domain Adaptation
by: Javidian, Mohammad Ali
Published: (2026)

Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
by: Chen, Yihong, et al.
Published: (2026)

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)

Revealing Combinatorial Reasoning of GNNs via Graph Concept Bottleneck Layer
by: Niu, Yue, et al.
Published: (2026)

Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)

An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations
by: Park, Seonghwan, et al.
Published: (2025)

Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
by: Gavito, Andrea Treviño, et al.
Published: (2023)

ColA: Collaborative Adaptation with Gradient Learning
by: Diao, Enmao, et al.
Published: (2024)

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
by: Liang, Yingyu, et al.
Published: (2024)

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
by: Gao, Yizhao, et al.
Published: (2025)

Topological Neural Networks: Mitigating the Bottlenecks of Graph Neural Networks via Higher-Order Interactions
by: Giusti, Lorenzo
Published: (2024)

Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis
by: Fartale, Harshwardhan, et al.
Published: (2025)

Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
by: Oomerjee, Adnan, et al.
Published: (2025)

DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation
by: Yan, Yang, et al.
Published: (2026)

Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation
by: Zhang, Xiyang, et al.
Published: (2026)

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
by: Liang, Yingyu, et al.
Published: (2024)

Mixture of Layers with Hybrid Attention
by: Ternovtsii, Ivan, et al.
Published: (2026)

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules
by: Chen, Xiangyu, et al.
Published: (2024)

READ: Recurrent Adaptation of Large Transformers
by: Nguyen, John, et al.
Published: (2023)

Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers
by: Ye, Donald
Published: (2026)

Omniwise: Predicting GPU Kernels Performance with LLMs
by: Wang, Zixian, et al.
Published: (2025)

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks
by: Huang, Haoyu, et al.
Published: (2026)

Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization
by: Yun, Juyoung
Published: (2024)

Incremental Residual Concept Bottleneck Models
by: Shang, Chenming, et al.
Published: (2024)

Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware
by: Tousi, Seyed Mohamad Ali, et al.
Published: (2025)

MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers
by: Zhao, Lili, et al.
Published: (2025)

Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
by: Zhao, Yang, et al.
Published: (2026)

Object Centric Concept Bottlenecks
by: Steinmann, David, et al.
Published: (2025)

Dynamic Graph Information Bottleneck
by: Yuan, Haonan, et al.
Published: (2024)

Counterfactual Concept Bottleneck Models
by: Dominici, Gabriele, et al.
Published: (2024)

Mixture of Concept Bottleneck Experts
by: De Santis, Francesco, et al.
Published: (2026)

Learning to Intervene on Concept Bottlenecks
by: Steinmann, David, et al.
Published: (2023)