| Main Authors: | Chinnakonduru, Sai Sena; Mohapatra, Astarag |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.10855 |
Similar Items
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)
QCQA: Quality and Capacity-aware grouped Query Attention
by: Joshi, Vinay, et al.
Published: (2024)
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
by: Devoto, Alessio, et al.
Published: (2025)
An Investigation on Group Query Hallucination Attacks
by: Miao, Kehao, et al.
Published: (2025)
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
by: Javadi, Farnoosh, et al.
Published: (2023)
GTA: Grouped-head latenT Attention
by: Sun, Luoyang, et al.
Published: (2025)
Transformers for Complex Query Answering over Knowledge Hypergraphs
by: Tsang, Hong Ting, et al.
Published: (2025)
Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
by: Zhussip, Magauiya, et al.
Published: (2025)
HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate
by: Zhu, Shenzhe
Published: (2025)
Memorization in Attention-only Transformers
by: Dana, Léo, et al.
Published: (2024)
Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
by: Bae, Jeongin, et al.
Published: (2026)
Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias
by: Chhabra, Anshuman, et al.
Published: (2024)
More Expressive Attention with Negative Weights
by: Lv, Ang, et al.
Published: (2024)
SQL-Exchange: Transforming SQL Queries Across Domains
by: Daviran, Mohammadreza, et al.
Published: (2025)
PGB: One-Shot Pruning for BERT via Weight Grouping and Permutation
by: Lim, Hyemin, et al.
Published: (2025)
Hierarchical vs. Flat Iteration in Shared-Weight Transformers
by: Han, Sang-Il
Published: (2026)
Enhancing Essay Scoring with Adversarial Weights Perturbation and Metric-specific AttentionPooling
by: Huang, Jiaxin, et al.
Published: (2024)
QueryNER: Segmentation of E-commerce Queries
by: Palen-Michel, Chester, et al.
Published: (2024)
Simulating Weighted Automata over Sequences and Trees with Transformers
by: Rizvi, Michael, et al.
Published: (2024)
The Attentional White Bear Effect in Transformer Language Models
by: Ramnauth, Rebecca, et al.
Published: (2026)
Crisp Attention: Regularizing Transformers via Structured Sparsity
by: Gandhi, Sagar, et al.
Published: (2025)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
by: Miralles-González, Pablo, et al.
Published: (2025)
No Query, No Access
by: Wang, Wenqiang, et al.
Published: (2025)
SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
by: Bianco, Pedro Alejandro Dal, et al.
Published: (2024)
Selective Attention Improves Transformer
by: Leviathan, Yaniv, et al.
Published: (2024)
Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
by: Tian, Yuxing, et al.
Published: (2026)
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
by: Tang, Zecheng, et al.
Published: (2026)
$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
by: Liu, Dong, et al.
Published: (2025)
SSL-SSAW: Self-Supervised Learning with Sigmoid Self-Attention Weighting for Question-Based Sign Language Translation
by: Liu, Zekang, et al.
Published: (2025)
Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets
by: Chali, Yllias, et al.
Published: (2026)
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
by: Tamayo, Daniel, et al.
Published: (2025)
ShishuLM : Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models
by: Kumar, Shivanshu, et al.
Published: (2025)
QueryPlot: Generating Geological Evidence Layers using Natural Language Queries for Mineral Exploration
by: Ye, Meng, et al.
Published: (2026)
Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing
by: Qiang, Zewen, et al.
Published: (2025)
What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)
Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue
by: Mohapatra, Biswesh, et al.
Published: (2026)
GLU Attention Improve Transformer
by: Wang, Zehao
Published: (2025)
Frame of Reference: Addressing the Challenges of Common Ground Representation in Situational Dialogs
by: Mohapatra, Biswesh, et al.
Published: (2026)
Query-Efficient Planning with Language Models
by: Gonzalez-Pumariega, Gonzalo, et al.
Published: (2024)
Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)