| Main Authors: | Lei, Jiayin, Ma, Ming, Duan, Yunxi, Li, Chenxi, Yang, Tianming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.12165 |
Similar Items
QAQ: Quality Adaptive Quantization for LLM KV Cache
by: Dong, Shichen, et al.
Published: (2024)
SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning
by: Wang, Peidong, et al.
Published: (2026)
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
by: Chen, Ruibo, et al.
Published: (2024)
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
by: Li, Yisen, et al.
Published: (2025)
Label Words as Local Task Vectors in In-Context Learning
by: Zheng, Bowen, et al.
Published: (2024)
SimLens for Early Exit in Large Language Models: Eliciting Accurate Latent Predictions with One More Token
by: Ma, Ming, et al.
Published: (2025)
Infinite-Instruct: Synthesizing Scaling Code instruction Data with Bidirectional Synthesis and Static Verification
by: Xing, Wenjing, et al.
Published: (2025)
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
by: Liu, Yilun, et al.
Published: (2023)
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
by: Ge, Yuan, et al.
Published: (2024)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
by: Li, Ming, et al.
Published: (2023)
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
by: Chen, Yicheng, et al.
Published: (2025)
Case2Code: Scalable Synthetic Data for Code Generation
by: Shao, Yunfan, et al.
Published: (2024)
LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)
HardTests: Synthesizing High-Quality Test Cases for LLM Coding
by: He, Zhongmou, et al.
Published: (2025)
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
by: Jiang, Chenxi, et al.
Published: (2025)
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
by: Wang, Yejie, et al.
Published: (2024)
CodeContests+: High-Quality Test Case Generation for Competitive Programming
by: Wang, Zihan, et al.
Published: (2025)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
by: Gu, Shuhao, et al.
Published: (2024)
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
by: Majumdar, Somshubra, et al.
Published: (2024)
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation
by: Chen, Jiajing, et al.
Published: (2024)
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR
by: Hamed, Injy, et al.
Published: (2025)
Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
by: Feng, Zhaoxin, et al.
Published: (2025)
Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
by: Lin, Honglin, et al.
Published: (2025)
Multi-Agent Collaboration for Multilingual Code Instruction Tuning
by: Yang, Jian, et al.
Published: (2025)
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale
by: Patel, Ajay, et al.
Published: (2026)
Less is More: High-value Data Selection for Visual Instruction Tuning
by: Liu, Zikang, et al.
Published: (2024)
Generating High Quality Synthetic Data for Dutch Medical Conversations
by: Kuan, Cecilia, et al.
Published: (2026)
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
by: Liang, Hao, et al.
Published: (2024)
FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only
by: Zhu, He, et al.
Published: (2024)
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
by: Zhou, Shijie, et al.
Published: (2024)
LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning
by: Ye, Yangfan, et al.
Published: (2025)
FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback
by: Qian, Kangan, et al.
Published: (2025)
Instruction Data Selection via Answer Divergence
by: Li, Bo, et al.
Published: (2026)
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
by: Li, Ming, et al.
Published: (2025)
How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison
by: Wang, Jiayin, et al.
Published: (2025)
From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning
by: Zhang, Jiajun, et al.
Published: (2026)
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
by: Geng, Xiang, et al.
Published: (2025)
Expert-Token Resonance MoE: Bidirectional Routing with Efficiency Affinity-Driven Active Selection
by: Li, Jing, et al.
Published: (2024)