| Main Authors: | Sun, Qi, Pickett, Marc, Nain, Aakash Kumar, Jones, Llion |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.09298 |
Similar Items
The Ungrounded Alignment Problem
by: Pickett, Marc, et al.
Published: (2024)
Fast-weight Product Key Memory
by: Zhao, Tianyu, et al.
Published: (2026)
TransEvalnia: Reasoning-based Evaluation and Ranking of Translations
by: Sproat, Richard, et al.
Published: (2025)
Sparser, Faster, Lighter Transformer Language Models
by: Cetin, Edoardo, et al.
Published: (2026)
Building Tailored Speech Recognizers for Japanese Speaking Assessment
by: Kubo, Yotaro, et al.
Published: (2025)
Sudoku-Bench: Evaluating creative reasoning with Sudoku variants
by: Seely, Jeffrey, et al.
Published: (2025)
Better RAG using Relevant Information Gain
by: Pickett, Marc, et al.
Published: (2024)
Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario
by: Mazumder, Debajyoti, et al.
Published: (2024)
Hierarchical temporal receptive windows and zero-shot timescale generalization in biologically constrained scale-invariant deep networks
by: Sarkar, Aakash, et al.
Published: (2026)
Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
by: Mazumder, Debajyoti, et al.
Published: (2024)
A Hybrid Supervised-LLM Pipeline for Actionable Suggestion Mining in Unstructured Customer Reviews
by: Trivedi, Aakash, et al.
Published: (2026)
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
by: Mahalingam, Aakash, et al.
Published: (2024)
Policy Optimization Prefers The Path of Least Resistance
by: Sanyal, Debdeep, et al.
Published: (2025)
Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews
by: Sorathiya, Aakash, et al.
Published: (2024)
ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)
Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers
by: Alshomary, Milad, et al.
Published: (2025)
Learning to Skip the Middle Layers of Transformers
by: Lawson, Tim, et al.
Published: (2025)
A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
by: Zhang, Yuhao, et al.
Published: (2024)
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
by: Chen, Qian, et al.
Published: (2024)
Suppressing Final Layer Hidden State Jumps in Transformer Pretraining
by: Shibata, Keigo, et al.
Published: (2026)
AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)
Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)
Intra-Layer Recurrence in Transformers for Language Modeling
by: Nguyen, Anthony, et al.
Published: (2025)
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
by: Ding, Ning, et al.
Published: (2024)
An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)
LayerNorm Induces Recency Bias in Transformer Decoders
by: Kim, Junu, et al.
Published: (2025)
Provable Knowledge Acquisition and Extraction in One-Layer Transformers
by: Xu, Ruichen, et al.
Published: (2025)
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation
by: Chaudhury, Rohan, et al.
Published: (2024)
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
by: Qiu, Zihan, et al.
Published: (2024)
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)
Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
by: Jelenić, Fran, et al.
Published: (2023)
Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models
by: Vo, James
Published: (2024)
The Realignment Problem: When Right becomes Wrong in LLMs
by: Sharma, Aakash Sen, et al.
Published: (2025)
LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation
by: Wei, Jingxuan, et al.
Published: (2025)
Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)
ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media
by: Agarwal, Aakash Kumar, et al.
Published: (2025)
Impact of Layer Norm on Memorization and Generalization in Transformers
by: Singhal, Rishi, et al.
Published: (2025)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)
LLMCache: Layer-Wise Caching Strategies for Accelerated Reuse in Transformer Inference
by: Bansal, Harsh Vardhan
Published: (2025)
Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
by: Brinner, Marc, et al.
Published: (2025)