| Main Authors: | Lin, Kevin, Snell, Charlie, Wang, Yu, Packer, Charles, Wooders, Sarah, Stoica, Ion, Gonzalez, Joseph E. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.13171 |
Similar Items
MemGPT: Towards LLMs as Operating Systems
by: Packer, Charles, et al.
Published: (2023)
Post-Training Sparse Attention with Double Sparsity
by: Yang, Shuo, et al.
Published: (2024)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
RAFT: Adapting Language Model to Domain Specific RAG
by: Zhang, Tianjun, et al.
Published: (2024)
SPECS: Faster Test-Time Scaling through Speculative Drafts
by: Cemri, Mert, et al.
Published: (2025)
Speculative Decoding: Performance or Illusion?
by: Liu, Xiaoxuan, et al.
Published: (2025)
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
by: Chen, Lingjiao, et al.
Published: (2024)
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
by: Zheng, Lianmin, et al.
Published: (2023)
GameArena: Evaluating LLM Reasoning through Live Computer Games
by: Hu, Lanxiang, et al.
Published: (2024)
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
by: Li, Tianle, et al.
Published: (2024)
RouteLLM: Learning to Route LLMs with Preference Data
by: Ong, Isaac, et al.
Published: (2024)
Reasoning Models Can Be Effective Without Thinking
by: Ma, Wenjie, et al.
Published: (2025)
OptScale: Probabilistic Optimality for Inference-time Scaling
by: Wang, Youkang, et al.
Published: (2025)
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation
by: Zhu, Alan, et al.
Published: (2025)
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
by: Li, Xirui, et al.
Published: (2026)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
S*: Test Time Scaling for Code Generation
by: Li, Dacheng, et al.
Published: (2025)
HashAttention: Semantic Sparsity for Faster Inference
by: Desai, Aditya, et al.
Published: (2024)
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
by: Kang, Minki, et al.
Published: (2025)
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
by: Cao, Shiyi, et al.
Published: (2026)
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
by: Wang, Fali, et al.
Published: (2025)
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
by: Chiang, Wei-Lin, et al.
Published: (2024)
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
by: Paliotta, Daniele, et al.
Published: (2025)
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
by: Patil, Shishir G., et al.
Published: (2024)
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
by: Xiao, Yang, et al.
Published: (2025)
SGLang: Efficient Execution of Structured Language Model Programs
by: Zheng, Lianmin, et al.
Published: (2023)
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
by: You, Kaichao, et al.
Published: (2024)
Test-time Prompt Intervention
by: Yang, Chenxu, et al.
Published: (2025)
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
by: Wen, Hao, et al.
Published: (2025)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
by: Snell, Charlie, et al.
Published: (2024)
Combee: Scaling Prompt Learning for Self-Improving Language Model Agents
by: Li, Hanchen, et al.
Published: (2026)
Scaling LLM Inference with Optimized Sample Compute Allocation
by: Zhang, Kexun, et al.
Published: (2024)
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis
by: Patel, Liana, et al.
Published: (2025)
UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling
by: Huang, Kaiyu, et al.
Published: (2026)
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
by: Yang, Wenkai, et al.
Published: (2025)
Stylus: Automatic Adapter Selection for Diffusion Models
by: Luo, Michael, et al.
Published: (2024)
Inverse Scaling in Test-Time Compute
by: Gema, Aryo Pradipta, et al.
Published: (2025)
Trust but Verify! A Survey on Verification Design for Test-time Scaling
by: Venktesh, V, et al.
Published: (2025)
Budget-aware Test-time Scaling via Discriminative Verification
by: Montgomery, Kyle, et al.
Published: (2025)
SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
by: Xiao, Yang, et al.
Published: (2025)