| Main Authors: | Choi, DongHyun, Spangher, Lucas, Hidey, Chris, Grabowski, Peter, Eskander, Ramy |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.02877 |
Similar Items
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
by: Choi, Sehyun
Published: (2024)
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
by: Roth, Nicholas, et al.
Published: (2025)
PatentEdits: Framing Patent Novelty as Textual Entailment
by: Lee, Ryan, et al.
Published: (2024)
Residual Stream Duality in Modern Transformer Architectures
by: Zhang, Yifan
Published: (2026)
LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference
by: Yang, Hantao, et al.
Published: (2025)
Understanding LLMs: A Comprehensive Overview from Training to Inference
by: Liu, Yiheng, et al.
Published: (2024)
Peri-LN: Revisiting Normalization Layer in the Transformer Architecture
by: Kim, Jeonghoon, et al.
Published: (2025)
Learning Action Conditions from Instructional Manuals for Instruction Understanding
by: Wu, Te-Lin, et al.
Published: (2022)
Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't
by: Svete, Anej, et al.
Published: (2026)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
DiscoSum: Discourse-aware News Summarization
by: Spangher, Alexander, et al.
Published: (2025)
Improving LLM-as-a-Judge Inference with the Judgment Distribution
by: Wang, Victor, et al.
Published: (2025)
NewsEdits 2.0: Learning the Intentions Behind Updating News
by: Spangher, Alexander, et al.
Published: (2024)
Applications of the Transformer Architecture in AI-Assisted English Reading Comprehension
by: Li, Ping
Published: (2026)
Transforming Slot Schema Induction with Generative Dialogue State Inference
by: Finch, James D., et al.
Published: (2024)
Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures
by: Lucas, Evan, et al.
Published: (2024)
Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction
by: Kim, Jang-Hyun, et al.
Published: (2026)
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
by: Bhendawade, Nikhil, et al.
Published: (2025)
NewsHomepages: Homepage Layouts Capture Information Prioritization Decisions
by: Welsh, Ben, et al.
Published: (2024)
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
by: Antoun, Wissam, et al.
Published: (2025)
A Survey on LLM Inference-Time Self-Improvement
by: Dong, Xiangjue, et al.
Published: (2024)
FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG
by: Zhao, Xinping, et al.
Published: (2024)
Transparent Screening for LLM Inference and Training Impacts
by: Pachot, Arnault, et al.
Published: (2026)
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)
ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs
by: Xu, Yuzhuang, et al.
Published: (2026)
Generalized Probabilistic Attention Mechanism in Transformers
by: Heo, DongNyeong, et al.
Published: (2024)
RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension
by: Dias, Abel Corrêa, et al.
Published: (2024)
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text
by: Arbel, Iftach, et al.
Published: (2024)
Explaining Mixtures of Sources in News Articles
by: Spangher, Alexander, et al.
Published: (2024)
Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference
by: Shyam, Vasu, et al.
Published: (2026)
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
by: Shi, Shuming, et al.
Published: (2024)
Horizon-LM: A RAM-Centric Architecture for LLM Training
by: Yuan, Zhengqing, et al.
Published: (2026)
Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
by: Huang, Zeyi, et al.
Published: (2026)
End-to-End Training for Back-Translation with Categorical Reparameterization Trick
by: Heo, DongNyeong, et al.
Published: (2022)
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
by: Zhong, Tianle, et al.
Published: (2026)
Revisiting Word Embeddings in the LLM Era
by: Mahajan, Yash, et al.
Published: (2025)
Are Large Language Models Capable of Generating Human-Level Narratives?
by: Tian, Yufei, et al.
Published: (2024)
Autoregressive Transformers for Disruption Prediction in Nuclear Fusion Plasmas
by: Spangher, Lucas, et al.
Published: (2023)
Revisiting Hierarchical Text Classification: Inference and Metrics
by: Plaud, Roman, et al.
Published: (2024)