:: Library Catalog

Bibliographic Details
Main Authors:	Lei, Jiayin; Ma, Ming; Duan, Yunxi; Li, Chenxi; Yang, Tianming
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.12165

Similar Items

QAQ: Quality Adaptive Quantization for LLM KV Cache
by: Dong, Shichen, et al.
Published: (2024)

SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning
by: Wang, Peidong, et al.
Published: (2026)

Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
by: Chen, Ruibo, et al.
Published: (2024)

CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
by: Li, Yisen, et al.
Published: (2025)

Label Words as Local Task Vectors in In-Context Learning
by: Zheng, Bowen, et al.
Published: (2024)

SimLens for Early Exit in Large Language Models: Eliciting Accurate Latent Predictions with One More Token
by: Ma, Ming, et al.
Published: (2025)

Infinite-Instruct: Synthesizing Scaling Code instruction Data with Bidirectional Synthesis and Static Verification
by: Xing, Wenjing, et al.
Published: (2025)

CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
by: Liu, Yilun, et al.
Published: (2023)

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
by: Ge, Yuan, et al.
Published: (2024)

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
by: Li, Ming, et al.
Published: (2023)

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
by: Chen, Yicheng, et al.
Published: (2025)

Case2Code: Scalable Synthetic Data for Code Generation
by: Shao, Yunfan, et al.
Published: (2024)

LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)

HardTests: Synthesizing High-Quality Test Cases for LLM Coding
by: He, Zhongmou, et al.
Published: (2025)

REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
by: Jiang, Chenxi, et al.
Published: (2025)

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
by: Wang, Yejie, et al.
Published: (2024)

CodeContests+: High-Quality Test Case Generation for Competitive Programming
by: Wang, Zihan, et al.
Published: (2025)

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
by: Gu, Shuhao, et al.
Published: (2024)

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
by: Majumdar, Somshubra, et al.
Published: (2024)

A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation
by: Chen, Jiajing, et al.
Published: (2024)

The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR
by: Hamed, Injy, et al.
Published: (2025)

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
by: Feng, Zhaoxin, et al.
Published: (2025)

Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
by: Lin, Honglin, et al.
Published: (2025)

Multi-Agent Collaboration for Multilingual Code Instruction Tuning
by: Yang, Jian, et al.
Published: (2025)

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale
by: Patel, Ajay, et al.
Published: (2026)

Less is More: High-value Data Selection for Visual Instruction Tuning
by: Liu, Zikang, et al.
Published: (2024)

Generating High Quality Synthetic Data for Dutch Medical Conversations
by: Kuan, Cecilia, et al.
Published: (2026)

Synth-Empathy: Towards High-Quality Synthetic Empathy Data
by: Liang, Hao, et al.
Published: (2024)

FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only
by: Zhu, He, et al.
Published: (2024)

A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
by: Zhou, Shijie, et al.
Published: (2024)

LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning
by: Ye, Yangfan, et al.
Published: (2025)

FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback
by: Qian, Kangan, et al.
Published: (2025)

Instruction Data Selection via Answer Divergence
by: Li, Bo, et al.
Published: (2026)

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
by: Li, Ming, et al.
Published: (2025)

How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison
by: Wang, Jiayin, et al.
Published: (2025)

From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning
by: Zhang, Jiajun, et al.
Published: (2026)

Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
by: Geng, Xiang, et al.
Published: (2025)

Expert-Token Resonance MoE: Bidirectional Routing with Efficiency Affinity-Driven Active Selection
by: Li, Jing, et al.
Published: (2024)