| Main Authors: | Nie, Lunyiu, Ding, Zhimin, Hu, Erdong, Jermaine, Christopher, Chaudhuri, Swarat |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.04513 |
Similar Items
Resource-efficient Inference with Foundation Model Programs
by: Nie, Lunyiu, et al.
Published: (2025)
Batched Low-Rank Adaptation of Foundation Models
by: Wen, Yeming, et al.
Published: (2023)
Learning Quantitative Automata Modulo Theories
by: Hsiung, Eric, et al.
Published: (2024)
When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding
by: Yang, Xu, et al.
Published: (2026)
Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation
by: Jain, Abhinav, et al.
Published: (2024)
Efficient Tree-Structured Deep Research with Adaptive Resource Allocation
by: Nie, Lunyiu, et al.
Published: (2025)
An In-Context Learning Agent for Formal Theorem-Proving
by: Thakur, Amitayush, et al.
Published: (2023)
ProofWala: A Framework for Multilingual Proof Data Synthesis and Theorem-Proving
by: Thakur, Amitayush, et al.
Published: (2025)
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
by: Tsoukalas, George, et al.
Published: (2024)
Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality
by: Hu, Zhimin, et al.
Published: (2026)
RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models
by: Jain, Abhinav, et al.
Published: (2024)
Why Code, Why Now: An Information-Theoretic Perspective on the Limits of Machine Learning
by: Zhao, Zhimin
Published: (2026)
Automata Learning from Preference and Equivalence Queries
by: Hsiung, Eric, et al.
Published: (2023)
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
by: Wen, Yeming, et al.
Published: (2024)
Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)
Grounding Data Science Code Generation with Input-Output Specifications
by: Wen, Yeming, et al.
Published: (2024)
ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
by: Liu, Yongkang, et al.
Published: (2026)
Learning-Time Encoding Shapes Unlearning in LLMs
by: Wu, Ruihan, et al.
Published: (2025)
Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
by: Hong, Yinrong, et al.
Published: (2025)
GRASP: A Rehearsal Policy for Efficient Online Continual Learning
by: Harun, Md Yousuf, et al.
Published: (2023)
Cascade Reward Sampling for Efficient Decoding-Time Alignment
by: Li, Bolian, et al.
Published: (2024)
Star Attention: Efficient LLM Inference over Long Sequences
by: Acharya, Shantanu, et al.
Published: (2024)
Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation
by: Rabatin, Rastislav, et al.
Published: (2024)
Symbolic Regression with a Learned Concept Library
by: Grayeli, Arya, et al.
Published: (2024)
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)
A Probabilistic Framework for Modular Continual Learning
by: Valkov, Lazar, et al.
Published: (2023)
CLEVER: A Curated Benchmark for Formally Verified Code Generation
by: Thakur, Amitayush, et al.
Published: (2025)
Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting
by: Hu, Michael Y., et al.
Published: (2025)
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
by: Deng, Hexuan, et al.
Published: (2025)
PHONOS: PHOnetic Neutralization for Online Streaming Applications
by: Quamer, Waris, et al.
Published: (2026)
Efficient Learned Data Compression via Dual-Stream Feature Decoupling
by: Ma, Huidong, et al.
Published: (2026)
Speculative Streaming: Fast LLM Inference without Auxiliary Models
by: Bhendawade, Nikhil, et al.
Published: (2024)
OPTune: Efficient Online Preference Tuning
by: Chen, Lichang, et al.
Published: (2024)
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
by: Wang, Boxin, et al.
Published: (2025)
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
by: Tang, Hongyin, et al.
Published: (2024)
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
by: Yao, Xinyu, et al.
Published: (2025)
zip2zip: Inference-Time Adaptive Tokenization via Online Compression
by: Geng, Saibo, et al.
Published: (2025)
Self-Evolving Visual Concept Library using Vision-Language Critics
by: Sehgal, Atharva, et al.
Published: (2025)
Universal Model Routing for Efficient LLM Inference
by: Jitkrittum, Wittawat, et al.
Published: (2025)
Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch
by: Zhang, Ziyang, et al.
Published: (2025)