:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Qi; Pickett, Marc; Nain, Aakash Kumar; Jones, Llion
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2407.09298

Similar Items

The Ungrounded Alignment Problem
by: Pickett, Marc, et al.
Published: (2024)

Fast-weight Product Key Memory
by: Zhao, Tianyu, et al.
Published: (2026)

TransEvalnia: Reasoning-based Evaluation and Ranking of Translations
by: Sproat, Richard, et al.
Published: (2025)

Sparser, Faster, Lighter Transformer Language Models
by: Cetin, Edoardo, et al.
Published: (2026)

Building Tailored Speech Recognizers for Japanese Speaking Assessment
by: Kubo, Yotaro, et al.
Published: (2025)

Sudoku-Bench: Evaluating creative reasoning with Sudoku variants
by: Seely, Jeffrey, et al.
Published: (2025)

Better RAG using Relevant Information Gain
by: Pickett, Marc, et al.
Published: (2024)

Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario
by: Mazumder, Debajyoti, et al.
Published: (2024)

Hierarchical temporal receptive windows and zero-shot timescale generalization in biologically constrained scale-invariant deep networks
by: Sarkar, Aakash, et al.
Published: (2026)

Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
by: Mazumder, Debajyoti, et al.
Published: (2024)

A Hybrid Supervised-LLM Pipeline for Actionable Suggestion Mining in Unstructured Customer Reviews
by: Trivedi, Aakash, et al.
Published: (2026)

SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
by: Mahalingam, Aakash, et al.
Published: (2024)

Policy Optimization Prefers The Path of Least Resistance
by: Sanyal, Debdeep, et al.
Published: (2025)

Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews
by: Sorathiya, Aakash, et al.
Published: (2024)

ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)

Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers
by: Alshomary, Milad, et al.
Published: (2025)

Learning to Skip the Middle Layers of Transformers
by: Lawson, Tim, et al.
Published: (2025)

A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
by: Zhang, Yuhao, et al.
Published: (2024)

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
by: Chen, Qian, et al.
Published: (2024)

Suppressing Final Layer Hidden State Jumps in Transformer Pretraining
by: Shibata, Keigo, et al.
Published: (2026)

AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)

Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)

Intra-Layer Recurrence in Transformers for Language Modeling
by: Nguyen, Anthony, et al.
Published: (2025)

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
by: Ding, Ning, et al.
Published: (2024)

An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)

LayerNorm Induces Recency Bias in Transformer Decoders
by: Kim, Junu, et al.
Published: (2025)

Provable Knowledge Acquisition and Extraction in One-Layer Transformers
by: Xu, Ruichen, et al.
Published: (2025)

Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation
by: Chaudhury, Rohan, et al.
Published: (2024)

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
by: Qiu, Zihan, et al.
Published: (2024)

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)

Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
by: Jelenić, Fran, et al.
Published: (2023)

Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models
by: Vo, James
Published: (2024)

The Realignment Problem: When Right becomes Wrong in LLMs
by: Sharma, Aakash Sen, et al.
Published: (2025)

LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation
by: Wei, Jingxuan, et al.
Published: (2025)

Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)

ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media
by: Agarwal, Aakash Kumar, et al.
Published: (2025)

Impact of Layer Norm on Memorization and Generalization in Transformers
by: Singhal, Rishi, et al.
Published: (2025)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)

LLMCache: Layer-Wise Caching Strategies for Accelerated Reuse in Transformer Inference
by: Bansal, Harsh Vardhan
Published: (2025)

Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
by: Brinner, Marc, et al.
Published: (2025)