Library Catalog

Bibliographic Details
Main Author:	Liu, Yi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2605.03379

Similar Items

Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
by: Feng, Steven, et al.
Published: (2024)

Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
by: Wei, Lei, et al.
Published: (2026)

Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL
by: de Costa, Mishca, et al.
Published: (2025)

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference
by: Qi, Jasmine, et al.
Published: (2026)

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
by: Li, Bolian, et al.
Published: (2026)

Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
by: Husom, Erik Johannes, et al.
Published: (2025)

ToolACE: Winning the Points of LLM Function Calling
by: Liu, Weiwen, et al.
Published: (2024)

Two-stage LLM Fine-tuning with Less Specialization and More Generalization
by: Wang, Yihan, et al.
Published: (2022)

Lexical Hints of Accuracy in LLM Reasoning Chains
by: Vanhoyweghen, Arne, et al.
Published: (2025)

Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text
by: Rao, Zixin, et al.
Published: (2025)

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
by: Chen, Lingjiao, et al.
Published: (2024)

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
by: Deng, Yihe, et al.
Published: (2025)

Large Language Models as Agents in Two-Player Games
by: Liu, Yang, et al.
Published: (2024)

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
by: Zhou, Chenxi, et al.
Published: (2026)

LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
by: Zhang, Kangning, et al.
Published: (2025)

ToVo: Toxicity Taxonomy via Voting
by: Luong, Tinh Son, et al.
Published: (2024)

Incremental Sequence Labeling: A Tale of Two Shifts
by: Qiu, Shengjie, et al.
Published: (2024)

EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A
by: Ma, Shijian, et al.
Published: (2026)

Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
by: Zhang, Jiazheng, et al.
Published: (2025)

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
by: Golowich, Noah, et al.
Published: (2026)

Universal Model Routing for Efficient LLM Inference
by: Jitkrittum, Wittawat, et al.
Published: (2025)

It Takes Two: Your GRPO Is Secretly DPO
by: Wu, Yihong, et al.
Published: (2025)

DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
by: Yang, Ning, et al.
Published: (2025)

Identifying Factual Inconsistencies in Summaries: Grounding LLM Inference via Task Taxonomy
by: Xu, Liyan, et al.
Published: (2024)

TinyAgent: Function Calling at the Edge
by: Erdogan, Lutfi Eren, et al.
Published: (2024)

SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
by: Refael, Yehonathan, et al.
Published: (2025)

Faster LLM Inference via Sequential Monte Carlo
by: Emara, Yahya, et al.
Published: (2026)

CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)

Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)

Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)

PoTPTQ: A Two-step Power-of-Two Post-training for LLMs
by: Wang, Xinyu, et al.
Published: (2025)

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
by: Liu, Di, et al.
Published: (2024)

Non-Linear Inference Time Intervention: Improving LLM Truthfulness
by: Hoscilowicz, Jakub, et al.
Published: (2024)

Faster MoE LLM Inference for Extremely Large Models
by: Yang, Haoqi, et al.
Published: (2025)

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
by: Ma, Xuezhe, et al.
Published: (2024)

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
by: Fu, Yichao, et al.
Published: (2024)

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
by: Rodionov, Gleb, et al.
Published: (2025)

Inference time LLM alignment in single and multidomain preference spectrum
by: Shahriar, Sadat, et al.
Published: (2024)