| Main Authors: | Duvvuri, Sai Surya, Patel, Nirmal, Gupta, Nilesh, Dhillon, Inderjit S. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10410 |
Similar Items
LASER: Attention with Exponential Transformation
by: Duvvuri, Sai Surya, et al.
Published: (2024)
ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization
by: Patel, Nirmal, et al.
Published: (2026)
Interleaved Head Attention
by: Duvvuri, Sai Surya, et al.
Published: (2026)
Towards Quantifying the Preconditioning Effect of Adam
by: Das, Rudrajit, et al.
Published: (2024)
The Art of Scaling Reinforcement Learning Compute for LLMs
by: Khatri, Devvrit, et al.
Published: (2025)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
by: Yen, Jui-Nan, et al.
Published: (2024)
Dual-Encoders for Extreme Multi-Label Classification
by: Gupta, Nilesh, et al.
Published: (2023)
LLM-guided Hierarchical Search for End-to-end Reasoning Intensive Retrieval
by: Gupta, Nilesh, et al.
Published: (2025)
EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval
by: Kumar, Ramnath, et al.
Published: (2023)
Scalable In-context Ranking with Generative Models
by: Gupta, Nilesh, et al.
Published: (2025)
Geometric Median (GM) Matching for Robust Data Pruning
by: Acharya, Anish, et al.
Published: (2024)
Fast and Simplex: 2-Simplicial Attention in Triton
by: Roy, Aurko, et al.
Published: (2025)
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
by: Acharya, Anish, et al.
Published: (2025)
Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data
by: Acharya, Anish, et al.
Published: (2024)
Compressing Many-Shots in In-Context Learning
by: Khatri, Devvrit, et al.
Published: (2025)
LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems
by: Casablanca, Ernesto, et al.
Published: (2025)
Preconditioned Attention: Enhancing Efficiency in Transformers
by: Saratchandran, Hemanth
Published: (2026)
Positive Unlabeled Contrastive Learning
by: Acharya, Anish, et al.
Published: (2022)
Retraining with Predicted Hard Labels Provably Increases Model Accuracy
by: Das, Rudrajit, et al.
Published: (2024)
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
by: Wang, Yihan, et al.
Published: (2022)
Attention Meets UAVs: A Comprehensive Evaluation of DDoS Detection in Low-Cost UAVs
by: Sharma, Ashish, et al.
Published: (2024)
Large Language Models are Interpretable Learners
by: Wang, Ruochen, et al.
Published: (2024)
Multi-Head Attention Is a Multi-Player Game
by: Chakrabarti, Kushal, et al.
Published: (2026)
Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs
by: Bansal, Rachit, et al.
Published: (2025)
Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)
Exploring Design Choices for Building Language-Specific LLMs
by: Tejaswi, Atula, et al.
Published: (2024)
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
by: Zhou, Chenyu, et al.
Published: (2026)
Universal Sequence Preconditioning
by: Marsden, Annie, et al.
Published: (2025)
Matryoshka Model Learning for Improved Elastic Student Models
by: Verma, Chetan, et al.
Published: (2025)
Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference
by: Joshi, Thomas, et al.
Published: (2025)
Multi-Knowledge Fusion Network for Time Series Representation Learning
by: Sakhinana, Sagar Srinivas, et al.
Published: (2024)
Are Anxiety Detection Models Generalizable? A Cross-Activity and Cross-Population Study Using Wearables
by: Sahu, Nilesh Kumar, et al.
Published: (2025)
Open-TQ-Metal: Fused Compressed-Domain Attention for Long-Context LLM Inference on Apple Silicon
by: Vegasena, Sai
Published: (2026)
Multi-Source Knowledge-Based Hybrid Neural Framework for Time Series Representation Learning
by: Sakhinana, Sagar Srinivas, et al.
Published: (2024)
On the Nystrom Approximation for Preconditioning in Kernel Machines
by: Abedsoltan, Amirhesam, et al.
Published: (2023)
A Representation-Consistent Gated Recurrent Framework for Robust Medical Time-Series Classification
by: Sai, Maitri Krishna
Published: (2026)
AnxietyFaceTrack: A Smartphone-Based Non-Intrusive Approach for Detecting Social Anxiety Using Facial Features
by: Sahu, Nilesh Kumar, et al.
Published: (2025)
MatFormer: Nested Transformer for Elastic Inference
by: Devvrit, et al.
Published: (2023)
The Power of Second Order Methods for Sequence Preconditioning
by: Marsden, Annie, et al.
Published: (2026)
Preconditioned Inexact Stochastic ADMM for Deep Model
by: Zhou, Shenglong, et al.
Published: (2025)