:: Library Catalog

Bibliographic Details
Main Authors:	Nie, Lunyiu; Ding, Zhimin; Hu, Erdong; Jermaine, Christopher; Chaudhuri, Swarat
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.04513

Similar Items

Resource-efficient Inference with Foundation Model Programs
by: Nie, Lunyiu, et al.
Published: (2025)

Batched Low-Rank Adaptation of Foundation Models
by: Wen, Yeming, et al.
Published: (2023)

Learning Quantitative Automata Modulo Theories
by: Hsiung, Eric, et al.
Published: (2024)

When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding
by: Yang, Xu, et al.
Published: (2026)

Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation
by: Jain, Abhinav, et al.
Published: (2024)

Efficient Tree-Structured Deep Research with Adaptive Resource Allocation
by: Nie, Lunyiu, et al.
Published: (2025)

An In-Context Learning Agent for Formal Theorem-Proving
by: Thakur, Amitayush, et al.
Published: (2023)

ProofWala: A Framework for Multilingual Proof Data Synthesis and Theorem-Proving
by: Thakur, Amitayush, et al.
Published: (2025)

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
by: Tsoukalas, George, et al.
Published: (2024)

Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality
by: Hu, Zhimin, et al.
Published: (2026)

RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models
by: Jain, Abhinav, et al.
Published: (2024)

Why Code, Why Now: An Information-Theoretic Perspective on the Limits of Machine Learning
by: Zhao, Zhimin
Published: (2026)

Automata Learning from Preference and Equivalence Queries
by: Hsiung, Eric, et al.
Published: (2023)

Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
by: Wen, Yeming, et al.
Published: (2024)

Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)

Grounding Data Science Code Generation with Input-Output Specifications
by: Wen, Yeming, et al.
Published: (2024)

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
by: Liu, Yongkang, et al.
Published: (2026)

Learning-Time Encoding Shapes Unlearning in LLMs
by: Wu, Ruihan, et al.
Published: (2025)

Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
by: Hong, Yinrong, et al.
Published: (2025)

GRASP: A Rehearsal Policy for Efficient Online Continual Learning
by: Harun, Md Yousuf, et al.
Published: (2023)

Cascade Reward Sampling for Efficient Decoding-Time Alignment
by: Li, Bolian, et al.
Published: (2024)

Star Attention: Efficient LLM Inference over Long Sequences
by: Acharya, Shantanu, et al.
Published: (2024)

Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation
by: Rabatin, Rastislav, et al.
Published: (2024)

Symbolic Regression with a Learned Concept Library
by: Grayeli, Arya, et al.
Published: (2024)

Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)

A Probabilistic Framework for Modular Continual Learning
by: Valkov, Lazar, et al.
Published: (2023)

CLEVER: A Curated Benchmark for Formally Verified Code Generation
by: Thakur, Amitayush, et al.
Published: (2025)

Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting
by: Hu, Michael Y., et al.
Published: (2025)

REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
by: Deng, Hexuan, et al.
Published: (2025)

PHONOS: PHOnetic Neutralization for Online Streaming Applications
by: Quamer, Waris, et al.
Published: (2026)

Efficient Learned Data Compression via Dual-Stream Feature Decoupling
by: Ma, Huidong, et al.
Published: (2026)

Speculative Streaming: Fast LLM Inference without Auxiliary Models
by: Bhendawade, Nikhil, et al.
Published: (2024)

OPTune: Efficient Online Preference Tuning
by: Chen, Lichang, et al.
Published: (2024)

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
by: Wang, Boxin, et al.
Published: (2025)

Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
by: Tang, Hongyin, et al.
Published: (2024)

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
by: Yao, Xinyu, et al.
Published: (2025)

zip2zip: Inference-Time Adaptive Tokenization via Online Compression
by: Geng, Saibo, et al.
Published: (2025)

Self-Evolving Visual Concept Library using Vision-Language Critics
by: Sehgal, Atharva, et al.
Published: (2025)

Universal Model Routing for Efficient LLM Inference
by: Jitkrittum, Wittawat, et al.
Published: (2025)

Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch
by: Zhang, Ziyang, et al.
Published: (2025)