| Main Authors: | Zhu, He, Su, Junyou, Lun, Tianle, Tao, Yicheng, Zhang, Wenjia, Fan, Zipei, Chen, Guanhua |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.01323 |
Similar Items
TAG-INSTRUCT: Controlled Instruction Complexity Enhancement through Structure-based Augmentation
by: Zhu, He, et al.
Published: (2025)
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
by: Zhu, He, et al.
Published: (2024)
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
by: Zhu, He, et al.
Published: (2025)
Anchored Supervised Fine-Tuning
by: Zhu, He, et al.
Published: (2025)
InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning
by: Su, Junyou, et al.
Published: (2026)
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
by: Zhu, Yuqi, et al.
Published: (2025)
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
by: Liu, Yilun, et al.
Published: (2023)
Self-adaptive Multimodal Retrieval-Augmented Generation
by: Zhai, Wenjia
Published: (2024)
LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning
by: Wang, Rongsheng, et al.
Published: (2024)
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
by: Wang, Yejie, et al.
Published: (2024)
Span-level Emotion-Cause-Category Triplet Extraction with Instruction Tuning LLMs and Data Augmentation
by: Li, Xiangju, et al.
Published: (2025)
Multi-Layer Ranking with Large Language Models for News Source Recommendation
by: Zhang, Wenjia, et al.
Published: (2024)
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
by: He, Xingwei, et al.
Published: (2025)
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
by: Deng, Wenlong, et al.
Published: (2025)
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
by: Nguyen, Huu, et al.
Published: (2025)
Open (Clinical) LLMs are Sensitive to Instruction Phrasings
by: Arroyo, Alberto Mario Ceballos, et al.
Published: (2024)
PACIT: Unlocking the Power of Examples for Better In-Context Instruction Tuning
by: Xue, Tianci, et al.
Published: (2023)
CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics
by: Zhao, Wanru, et al.
Published: (2025)
Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision
by: Gopal, Shreyas, et al.
Published: (2026)
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
by: Shu, Fan, et al.
Published: (2026)
Retrieval Augmented Instruction Tuning for Open NER with Large Language Models
by: Xie, Tingyu, et al.
Published: (2024)
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
by: Liu, Zhongtao, et al.
Published: (2024)
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
by: Toshniwal, Shubham, et al.
Published: (2024)
GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models
by: Ruan, Zhiwen, et al.
Published: (2026)
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
by: He, Linda, et al.
Published: (2025)
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
by: Li, Tianle, et al.
Published: (2024)
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
by: Chen, Ruibo, et al.
Published: (2024)
Graphical Reasoning: LLM-based Semi-Open Relation Extraction
by: Tao, Yicheng, et al.
Published: (2024)
ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
by: Jin, Zhenchao, et al.
Published: (2024)
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
by: Liu, Wanlong, et al.
Published: (2024)
Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key
by: Chen, Yingda, et al.
Published: (2024)
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
by: Chiang, Wei-Lin, et al.
Published: (2024)
Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief
by: Xiao, Zeguan, et al.
Published: (2025)
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
by: Yu, Jia, et al.
Published: (2025)
Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation
by: Chen, Guanhua, et al.
Published: (2024)
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
by: Li, Chenglin, et al.
Published: (2024)
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
by: Xie, Chengxing, et al.
Published: (2025)
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
by: Chen, Yicheng, et al.
Published: (2025)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
by: Gu, Shuhao, et al.
Published: (2024)
From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs
by: He, Jie, et al.
Published: (2025)