| Main Authors: | Dutta, Abhinav, Krishnan, Sanjeev, Kwatra, Nipun, Ramjee, Ramachandran |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.09141 |
Similar Items
TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference
by: Gond, Raja, et al.
Published: (2025)
Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
by: Deshmukh, Dhruv, et al.
Published: (2025)
Niyama : Breaking the Silos of LLM Inference Serving
by: Goel, Kanishk, et al.
Published: (2025)
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)
Vidur: A Large-Scale Simulation Framework For LLM Inference
by: Agrawal, Amey, et al.
Published: (2024)
On Evaluating Performance of LLM Inference Serving Systems
by: Agrawal, Amey, et al.
Published: (2025)
Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems
by: Agrawal, Amey, et al.
Published: (2024)
ASTRA: Accurate and Scalable ANNS-based Training of Extreme Classifiers
by: Mehta, Sonu, et al.
Published: (2024)
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
by: Prabhu, Ramya, et al.
Published: (2024)
Attention is All You Need Until You Need Retention
by: Yaslioglu, M. Murat
Published: (2025)
Context is All You Need
by: Delanois, Jean Erik, et al.
Published: (2026)
LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
by: Gond, Raja, et al.
Published: (2026)
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
by: Ramjee, Sharan
Published: (2026)
Top-$nσ$: Not All Logits Are You Need
by: Tang, Chenxia, et al.
Published: (2024)
Some Attention is All You Need for Retrieval
by: Michalak, Felix, et al.
Published: (2025)
Half Search Space is All You Need
by: Rumiantsev, Pavel, et al.
Published: (2025)
Multistep Inverse Is Not All You Need
by: Levine, Alexander, et al.
Published: (2024)
Exploitation Is All You Need... for Exploration
by: Rentschler, Micah, et al.
Published: (2025)
Cooperation Is All You Need
by: Adeel, Ahsan, et al.
Published: (2023)
Realizable Learning is All You Need
by: Hopkins, Max, et al.
Published: (2021)
Support is All You Need for Certified VAE Training
by: Xu, Changming, et al.
Published: (2025)
MoE Lens -- An Expert Is All You Need
by: Chaudhari, Marmik, et al.
Published: (2026)
Fusion or Confusion? Multimodal Complexity Is Not All You Need
by: Rheude, Tillmann, et al.
Published: (2025)
More Agents Is All You Need
by: Li, Junyou, et al.
Published: (2024)
All You Need Is Synthetic Task Augmentation
by: Godin, Guillaume
Published: (2025)
Element-wise Attention Is All You Need
by: Feng, Guoxin
Published: (2025)
CAMformer: Associative Memory is All You Need
by: Molom-Ochir, Tergel, et al.
Published: (2025)
Alignment with Preference Optimization Is All You Need for LLM Safety
by: Alami, Reda, et al.
Published: (2024)
Uni-LoRA: One Vector is All You Need
by: Li, Kaiyang, et al.
Published: (2025)
Tensor Product Attention Is All You Need
by: Zhang, Yifan, et al.
Published: (2025)
Confidence Is All You Need for MI Attacks
by: Sinha, Abhishek, et al.
Published: (2023)
Attention Smoothing Is All You Need For Unlearning
by: Zade, Saleh Zare, et al.
Published: (2026)
Perturbation is All You Need for Extrapolating Language Models
by: Cen, Zetai, et al.
Published: (2026)
Transduction is All You Need for Structured Data Workflows
by: Gliozzo, Alfio, et al.
Published: (2025)
Is Diversity All You Need for Scalable Robotic Manipulation?
by: Shi, Modi, et al.
Published: (2025)
Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need
by: Sarkar, Dhruv, et al.
Published: (2025)
HDL-GPT: High-Quality HDL is All You Need
by: Kumar, Bhuvnesh, et al.
Published: (2024)
Capabilities Ain't All You Need: Measuring Propensities in AI
by: Romero-Alvarado, Daniel, et al.
Published: (2026)
Is Sequence Information All You Need for Bayesian Optimization of Antibodies?
by: Ober, Sebastian W., et al.
Published: (2025)