:: Library Catalog

Bibliographic Details
Main Authors:	Chinnakonduru, Sai Sena; Mohapatra, Astarag
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.10855

Similar Items

Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)

QCQA: Quality and Capacity-aware grouped Query Attention
by: Joshi, Vinay, et al.
Published: (2024)

Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
by: Devoto, Alessio, et al.
Published: (2025)

An Investigation on Group Query Hallucination Attacks
by: Miao, Kehao, et al.
Published: (2025)

GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
by: Javadi, Farnoosh, et al.
Published: (2023)

GTA: Grouped-head latenT Attention
by: Sun, Luoyang, et al.
Published: (2025)

Transformers for Complex Query Answering over Knowledge Hypergraphs
by: Tsang, Hong Ting, et al.
Published: (2025)

Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
by: Zhussip, Magauiya, et al.
Published: (2025)

HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate
by: Zhu, Shenzhe
Published: (2025)

Memorization in Attention-only Transformers
by: Dana, Léo, et al.
Published: (2024)

Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
by: Bae, Jeongin, et al.
Published: (2026)

Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias
by: Chhabra, Anshuman, et al.
Published: (2024)

More Expressive Attention with Negative Weights
by: Lv, Ang, et al.
Published: (2024)

SQL-Exchange: Transforming SQL Queries Across Domains
by: Daviran, Mohammadreza, et al.
Published: (2025)

PGB: One-Shot Pruning for BERT via Weight Grouping and Permutation
by: Lim, Hyemin, et al.
Published: (2025)

Hierarchical vs. Flat Iteration in Shared-Weight Transformers
by: Han, Sang-Il
Published: (2026)

Enhancing Essay Scoring with Adversarial Weights Perturbation and Metric-specific AttentionPooling
by: Huang, Jiaxin, et al.
Published: (2024)

QueryNER: Segmentation of E-commerce Queries
by: Palen-Michel, Chester, et al.
Published: (2024)

Simulating Weighted Automata over Sequences and Trees with Transformers
by: Rizvi, Michael, et al.
Published: (2024)

The Attentional White Bear Effect in Transformer Language Models
by: Ramnauth, Rebecca, et al.
Published: (2026)

Crisp Attention: Regularizing Transformers via Structured Sparsity
by: Gandhi, Sagar, et al.
Published: (2025)

Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
by: Miralles-González, Pablo, et al.
Published: (2025)

No Query, No Access
by: Wang, Wenqiang, et al.
Published: (2025)

SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
by: Bianco, Pedro Alejandro Dal, et al.
Published: (2024)

Selective Attention Improves Transformer
by: Leviathan, Yaniv, et al.
Published: (2024)

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
by: Tian, Yuxing, et al.
Published: (2026)

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
by: Tang, Zecheng, et al.
Published: (2026)

$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
by: Liu, Dong, et al.
Published: (2025)

SSL-SSAW: Self-Supervised Learning with Sigmoid Self-Attention Weighting for Question-Based Sign Language Translation
by: Liu, Zekang, et al.
Published: (2025)

Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets
by: Chali, Yllias, et al.
Published: (2026)

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
by: Tamayo, Daniel, et al.
Published: (2025)

ShishuLM : Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models
by: Kumar, Shivanshu, et al.
Published: (2025)

QueryPlot: Generating Geological Evidence Layers using Natural Language Queries for Mineral Exploration
by: Ye, Meng, et al.
Published: (2026)

Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing
by: Qiang, Zewen, et al.
Published: (2025)

What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)

Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue
by: Mohapatra, Biswesh, et al.
Published: (2026)

GLU Attention Improve Transformer
by: Wang, Zehao
Published: (2025)

Frame of Reference: Addressing the Challenges of Common Ground Representation in Situational Dialogs
by: Mohapatra, Biswesh, et al.
Published: (2026)

Query-Efficient Planning with Language Models
by: Gonzalez-Pumariega, Gonzalo, et al.
Published: (2024)

Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)