:: Library Catalog

Bibliographic Details
Main Authors:	Choi, DongHyun; Spangher, Lucas; Hidey, Chris; Grabowski, Peter; Eskander, Ramy
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.02877

Similar Items

Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
by: Choi, Sehyun
Published: (2024)

Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
by: Roth, Nicholas, et al.
Published: (2025)

PatentEdits: Framing Patent Novelty as Textual Entailment
by: Lee, Ryan, et al.
Published: (2024)

Residual Stream Duality in Modern Transformer Architectures
by: Zhang, Yifan
Published: (2026)

LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference
by: Yang, Hantao, et al.
Published: (2025)

Understanding LLMs: A Comprehensive Overview from Training to Inference
by: Liu, Yiheng, et al.
Published: (2024)

Peri-LN: Revisiting Normalization Layer in the Transformer Architecture
by: Kim, Jeonghoon, et al.
Published: (2025)

Learning Action Conditions from Instructional Manuals for Instruction Understanding
by: Wu, Te-Lin, et al.
Published: (2022)

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't
by: Svete, Anej, et al.
Published: (2026)

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)

DiscoSum: Discourse-aware News Summarization
by: Spangher, Alexander, et al.
Published: (2025)

Improving LLM-as-a-Judge Inference with the Judgment Distribution
by: Wang, Victor, et al.
Published: (2025)

NewsEdits 2.0: Learning the Intentions Behind Updating News
by: Spangher, Alexander, et al.
Published: (2024)

Applications of the Transformer Architecture in AI-Assisted English Reading Comprehension
by: Li, Ping
Published: (2026)

Transforming Slot Schema Induction with Generative Dialogue State Inference
by: Finch, James D., et al.
Published: (2024)

Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures
by: Lucas, Evan, et al.
Published: (2024)

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction
by: Kim, Jang-Hyun, et al.
Published: (2026)

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
by: Bhendawade, Nikhil, et al.
Published: (2025)

NewsHomepages: Homepage Layouts Capture Information Prioritization Decisions
by: Welsh, Ben, et al.
Published: (2024)

DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
by: Antoun, Wissam, et al.
Published: (2025)

A Survey on LLM Inference-Time Self-Improvement
by: Dong, Xiangjue, et al.
Published: (2024)

FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG
by: Zhao, Xinping, et al.
Published: (2024)

Transparent Screening for LLM Inference and Training Impacts
by: Pachot, Arnault, et al.
Published: (2026)

Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)

ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs
by: Xu, Yuzhuang, et al.
Published: (2026)

Generalized Probabilistic Attention Mechanism in Transformers
by: Heo, DongNyeong, et al.
Published: (2024)

RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension
by: Dias, Abel Corrêa, et al.
Published: (2024)

TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text
by: Arbel, Iftach, et al.
Published: (2024)

Explaining Mixtures of Sources in News Articles
by: Spangher, Alexander, et al.
Published: (2024)

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference
by: Shyam, Vasu, et al.
Published: (2026)

Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
by: Shi, Shuming, et al.
Published: (2024)

Horizon-LM: A RAM-Centric Architecture for LLM Training
by: Yuan, Zhengqing, et al.
Published: (2026)

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
by: Huang, Zeyi, et al.
Published: (2026)

End-to-End Training for Back-Translation with Categorical Reparameterization Trick
by: Heo, DongNyeong, et al.
Published: (2022)

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
by: Zhong, Tianle, et al.
Published: (2026)

Revisiting Word Embeddings in the LLM Era
by: Mahajan, Yash, et al.
Published: (2025)

Are Large Language Models Capable of Generating Human-Level Narratives?
by: Tian, Yufei, et al.
Published: (2024)

Autoregressive Transformers for Disruption Prediction in Nuclear Fusion Plasmas
by: Spangher, Lucas, et al.
Published: (2023)

Revisiting Hierarchical Text Classification: Inference and Metrics
by: Plaud, Roman, et al.
Published: (2024)