| Main Authors: | Kim, Woojeong, Wang, Junxiong, Yan, Jing Nathan, Abdelfattah, Mohamed, Rush, Alexander M. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.08446 |
Similar Items
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)
NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics
by: Zhou, Jiawei, et al.
Published: (2024)
ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
by: Akhauri, Yash, et al.
Published: (2024)
DARE: Diffusion Language Model Activation Reuse for Efficient Inference
by: Frumkin, Natalia, et al.
Published: (2026)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)
SEAL: Suite for Evaluating API-use of LLMs
by: Kim, Woojeong, et al.
Published: (2024)
DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues
by: Jang, Kyochul, et al.
Published: (2025)
Regression Language Models for Code
by: Akhauri, Yash, et al.
Published: (2025)
From Efficient Multimodal Models to World Models: A Survey
by: Mai, Xinji, et al.
Published: (2024)
NITRO: LLM Inference on Intel Laptop NPUs
by: Fei, Anthony, et al.
Published: (2024)
From Chat Logs to Collective Insights: Aggregative Question Answering
by: Zhang, Wentao, et al.
Published: (2025)
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems
by: Yan, Jing Nathan, et al.
Published: (2025)
Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference
by: Chen, Chun-Ting, et al.
Published: (2025)
Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models
by: Li, Miao, et al.
Published: (2026)
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)
Encodings for Prediction-based Neural Architecture Search
by: Akhauri, Yash, et al.
Published: (2024)
Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
by: Chen, Yan, et al.
Published: (2025)
GOPO: Policy Optimization using Ranked Rewards
by: Choi, Kyuseong, et al.
Published: (2026)
Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?
by: Lee, Celine, et al.
Published: (2025)
Approximating Language Model Training Data from Weights
by: Morris, John X., et al.
Published: (2025)
A Two-Stage Proactive Dialogue Generator for Efficient Clinical Information Collection Using Large Language Model
by: Li, Xueshen, et al.
Published: (2024)
Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models
by: Hunter, Rosco, et al.
Published: (2023)
Contextual Document Embeddings
by: Morris, John X., et al.
Published: (2024)
CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference
by: Liu, Dong, et al.
Published: (2025)
Sovereign Context Protocol: An Open Attribution Layer for Human-Generated Content in the Age of Large Language Models
by: Panchigar, Praneel, et al.
Published: (2026)
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
by: Arriola, Marianne, et al.
Published: (2025)
Plato: Plan to Efficiently Decode for Large Language Model Inference
by: Jin, Shuowei, et al.
Published: (2024)
Adaptive Draft-Verification for Efficient Large Language Model Decoding
by: Liu, Xukun, et al.
Published: (2024)
Efficient Decoding Methods for Language Models on Encrypted Data
by: Avitan, Matan, et al.
Published: (2025)
Scaling Data-Constrained Language Models
by: Muennighoff, Niklas, et al.
Published: (2023)
Foundations of Top-$k$ Decoding For Language Models
by: Noarov, Georgy, et al.
Published: (2025)
Analyzing Patient Daily Movement Behavior Dynamics Using Two-Stage Encoding Model
by: Cui, Jin, et al.
Published: (2025)
Simple and Effective Masked Diffusion Language Models
by: Sahoo, Subham Sekhar, et al.
Published: (2024)
Great Memory, Shallow Reasoning: Limits of $k$NN-LMs
by: Geng, Shangyi, et al.
Published: (2024)
Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models
by: Lee, Sanghyun, et al.
Published: (2025)
Compute-Constrained Data Selection
by: Yin, Junjie Oscar, et al.
Published: (2024)
Semantic Compression of 3D Objects for Open and Collaborative Virtual Worlds
by: Dotzel, Jordan, et al.
Published: (2025)
Introspective Diffusion Language Models
by: Yu, Yifan, et al.
Published: (2026)
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
The End of Manual Decoding: Towards Truly End-to-End Language Models
by: Wang, Zhichao, et al.
Published: (2025)