| Main Author: | Liu, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.03379 |
Similar Items
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
by: Feng, Steven, et al.
Published: (2024)
Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
by: Wei, Lei, et al.
Published: (2026)
Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL
by: de Costa, Mishca, et al.
Published: (2025)
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)
VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference
by: Qi, Jasmine, et al.
Published: (2026)
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
by: Li, Bolian, et al.
Published: (2026)
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
by: Husom, Erik Johannes, et al.
Published: (2025)
ToolACE: Winning the Points of LLM Function Calling
by: Liu, Weiwen, et al.
Published: (2024)
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
by: Wang, Yihan, et al.
Published: (2022)
Lexical Hints of Accuracy in LLM Reasoning Chains
by: Vanhoyweghen, Arne, et al.
Published: (2025)
Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text
by: Rao, Zixin, et al.
Published: (2025)
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
by: Chen, Lingjiao, et al.
Published: (2024)
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
by: Deng, Yihe, et al.
Published: (2025)
Large Language Models as Agents in Two-Player Games
by: Liu, Yang, et al.
Published: (2024)
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
by: Zhou, Chenxi, et al.
Published: (2026)
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
by: Zhang, Kangning, et al.
Published: (2025)
ToVo: Toxicity Taxonomy via Voting
by: Luong, Tinh Son, et al.
Published: (2024)
Incremental Sequence Labeling: A Tale of Two Shifts
by: Qiu, Shengjie, et al.
Published: (2024)
EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A
by: Ma, Shijian, et al.
Published: (2026)
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
by: Zhang, Jiazheng, et al.
Published: (2025)
Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
by: Golowich, Noah, et al.
Published: (2026)
Universal Model Routing for Efficient LLM Inference
by: Jitkrittum, Wittawat, et al.
Published: (2025)
It Takes Two: Your GRPO Is Secretly DPO
by: Wu, Yihong, et al.
Published: (2025)
DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
by: Yang, Ning, et al.
Published: (2025)
Identifying Factual Inconsistencies in Summaries: Grounding LLM Inference via Task Taxonomy
by: Xu, Liyan, et al.
Published: (2024)
TinyAgent: Function Calling at the Edge
by: Erdogan, Lutfi Eren, et al.
Published: (2024)
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
by: Refael, Yehonathan, et al.
Published: (2025)
Faster LLM Inference via Sequential Monte Carlo
by: Emara, Yahya, et al.
Published: (2026)
CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)
Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)
Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs
by: Wang, Xinyu, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
by: Liu, Di, et al.
Published: (2024)
Non-Linear Inference Time Intervention: Improving LLM Truthfulness
by: Hoscilowicz, Jakub, et al.
Published: (2024)
Faster MoE LLM Inference for Extremely Large Models
by: Yang, Haoqi, et al.
Published: (2025)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
by: Ma, Xuezhe, et al.
Published: (2024)
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
by: Fu, Yichao, et al.
Published: (2024)
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
by: Rodionov, Gleb, et al.
Published: (2025)
Inference time LLM alignment in single and multidomain preference spectrum
by: Shahriar, Sadat, et al.
Published: (2024)