| Main Author: | Zixian, Wang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.00797 |
Similar Items
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
by: Zhang, Lin, et al.
Published: (2025)
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
by: Sigdel, Dibakar
Published: (2026)
ADPO: Anchored Direct Preference Optimization
by: Zixian, Wang
Published: (2025)
APO: Alpha-Divergence Preference Optimization
by: Zixian, Wang
Published: (2025)
Orthogonalized Policy Optimization: Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)
Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)
Un-mixing Test-time Adaptation under Heterogeneous Data Streams
by: Su, Zixian, et al.
Published: (2024)
Gradient Boosting within a Single Attention Layer
by: Sargolzaei, Saleh
Published: (2026)
Causally-Aware Information Bottleneck for Domain Adaptation
by: Javidian, Mohammad Ali
Published: (2026)
Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
by: Chen, Yihong, et al.
Published: (2026)
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)
Revealing Combinatorial Reasoning of GNNs via Graph Concept Bottleneck Layer
by: Niu, Yue, et al.
Published: (2026)
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)
An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations
by: Park, Seonghwan, et al.
Published: (2025)
Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
by: Gavito, Andrea Treviño, et al.
Published: (2023)
ColA: Collaborative Adaptation with Gradient Learning
by: Diao, Enmao, et al.
Published: (2024)
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
by: Liang, Yingyu, et al.
Published: (2024)
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
by: Gao, Yizhao, et al.
Published: (2025)
Topological Neural Networks: Mitigating the Bottlenecks of Graph Neural Networks via Higher-Order Interactions
by: Giusti, Lorenzo
Published: (2024)
Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis
by: Fartale, Harshwardhan, et al.
Published: (2025)
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
by: Oomerjee, Adnan, et al.
Published: (2025)
DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation
by: Yan, Yang, et al.
Published: (2026)
Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation
by: Zhang, Xiyang, et al.
Published: (2026)
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
by: Liang, Yingyu, et al.
Published: (2024)
Mixture of Layers with Hybrid Attention
by: Ternovtsii, Ivan, et al.
Published: (2026)
SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules
by: Chen, Xiangyu, et al.
Published: (2024)
READ: Recurrent Adaptation of Large Transformers
by: Nguyen, John, et al.
Published: (2023)
Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers
by: Ye, Donald
Published: (2026)
Omniwise: Predicting GPU Kernels Performance with LLMs
by: Wang, Zixian, et al.
Published: (2025)
SURGE: Surrogate Gradient Adaptation in Binary Neural Networks
by: Huang, Haoyu, et al.
Published: (2026)
Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization
by: Yun, Juyoung
Published: (2024)
Incremental Residual Concept Bottleneck Models
by: Shang, Chenming, et al.
Published: (2024)
Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware
by: Tousi, Seyed Mohamad Ali, et al.
Published: (2025)
MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers
by: Zhao, Lili, et al.
Published: (2025)
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
by: Zhao, Yang, et al.
Published: (2026)
Object Centric Concept Bottlenecks
by: Steinmann, David, et al.
Published: (2025)
Dynamic Graph Information Bottleneck
by: Yuan, Haonan, et al.
Published: (2024)
Counterfactual Concept Bottleneck Models
by: Dominici, Gabriele, et al.
Published: (2024)
Mixture of Concept Bottleneck Experts
by: De Santis, Francesco, et al.
Published: (2026)
Learning to Intervene on Concept Bottlenecks
by: Steinmann, David, et al.
Published: (2023)