| Main Authors: | Lyu, Weijie; Huang, Sheng-Jun; Xia, Xuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.02378 |
Similar Items
Data-efficient LLM Fine-tuning for Code Generation
by: Lv, Weijie, et al.
Published: (2025)
CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs
by: Lv, Weijie, et al.
Published: (2024)
Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
by: Wan, Guangya, et al.
Published: (2024)
Importance-Aware Data Selection for Efficient LLM Instruction Tuning
by: Jiang, Tingyu, et al.
Published: (2025)
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
by: Liu, Fengze, et al.
Published: (2025)
ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training
by: Liang, Yu, et al.
Published: (2026)
SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
by: Li, Zhuang, et al.
Published: (2024)
Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization
by: Wan, Weijie, et al.
Published: (2026)
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
by: Wang, Yudong, et al.
Published: (2025)
MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
by: Qin, Weicong, et al.
Published: (2025)
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
by: Maurya, Kaushal Kumar, et al.
Published: (2024)
Efficient Training of Language Models with Compact and Consistent Next Token Distributions
by: Sathe, Ashutosh, et al.
Published: (2024)
Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning
by: Kim, Junseok, et al.
Published: (2026)
EvoSelect: Data-Efficient LLM Evolution for Targeted Task Adaptation
by: Li, Ting-Wei, et al.
Published: (2026)
Efficient RLVR Training via Weighted Mutual Information Data Selection
by: Zhou, Xinyu, et al.
Published: (2026)
Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking
by: Li, Zhuohao, et al.
Published: (2025)
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity
by: Tekin, Selim Furkan, et al.
Published: (2024)
Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning
by: Xiong, Juming, et al.
Published: (2026)
An LLM Agent for Automatic Geospatial Data Analysis
by: Chen, Yuxing, et al.
Published: (2024)
CitaLaw: Enhancing LLM with Citations in Legal Domain
by: Zhang, Kepu, et al.
Published: (2024)
Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation
by: Zhang, Lechen, et al.
Published: (2026)
IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection
by: Song, Jielin, et al.
Published: (2024)
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
by: Wang, Aozhe, et al.
Published: (2026)
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
by: Zou, Heming, et al.
Published: (2025)
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
by: Zhu, Xiaoxuan, et al.
Published: (2025)
CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
by: Yang, Shidong, et al.
Published: (2026)
CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
by: Tikhonov, Anton, et al.
Published: (2023)
Scalable Vision Language Model Training via High Quality Data Curation
by: Dong, Hongyuan, et al.
Published: (2025)
Scaling Multi-Hop Training Data via Graph-Constrained Path Selection
by: Chen, Pengyu, et al.
Published: (2026)
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
by: Hu, Zhengpei, et al.
Published: (2026)
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
by: Huang, Kung-Hsiang, et al.
Published: (2025)
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
by: Xia, Chunqiu Steven, et al.
Published: (2024)
Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
by: Wang, Shaobo, et al.
Published: (2025)
InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning
by: Su, Junyou, et al.
Published: (2026)
AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use
by: Lyu, Yuanjie, et al.
Published: (2026)
LLM Circuit Analyses Are Consistent Across Training and Scale
by: Tigges, Curt, et al.
Published: (2024)
Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking
by: Zhang, Xiaokang, et al.
Published: (2024)
Data Compressibility Quantifies LLM Memorization
by: Huang, Yizhan, et al.
Published: (2025)
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
by: Yu, Zichun, et al.
Published: (2024)