:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, He; Su, Junyou; Lun, Tianle; Tao, Yicheng; Zhang, Wenjia; Fan, Zipei; Chen, Guanhua
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2408.01323

Similar Items

TAG-INSTRUCT: Controlled Instruction Complexity Enhancement through Structure-based Augmentation
by: Zhu, He, et al.
Published: (2025)

PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
by: Zhu, He, et al.
Published: (2024)

PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
by: Zhu, He, et al.
Published: (2025)

Anchored Supervised Fine-Tuning
by: Zhu, He, et al.
Published: (2025)

InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning
by: Su, Junyou, et al.
Published: (2026)

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
by: Zhu, Yuqi, et al.
Published: (2025)

CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
by: Liu, Yilun, et al.
Published: (2023)

Self-adaptive Multimodal Retrieval-Augmented Generation
by: Zhai, Wenjia
Published: (2024)

LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning
by: Wang, Rongsheng, et al.
Published: (2024)

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
by: Wang, Yejie, et al.
Published: (2024)

Span-level Emotion-Cause-Category Triplet Extraction with Instruction Tuning LLMs and Data Augmentation
by: Li, Xiangju, et al.
Published: (2025)

Multi-Layer Ranking with Large Language Models for News Source Recommendation
by: Zhang, Wenjia, et al.
Published: (2024)

ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
by: He, Xingwei, et al.
Published: (2025)

For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
by: Deng, Wenlong, et al.
Published: (2025)

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
by: Nguyen, Huu, et al.
Published: (2025)

Open (Clinical) LLMs are Sensitive to Instruction Phrasings
by: Arroyo, Alberto Mario Ceballos, et al.
Published: (2024)

PACIT: Unlocking the Power of Examples for Better In-Context Instruction Tuning
by: Xue, Tianci, et al.
Published: (2023)

CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics
by: Zhao, Wanru, et al.
Published: (2025)

Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision
by: Gopal, Shreyas, et al.
Published: (2026)

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
by: Shu, Fan, et al.
Published: (2026)

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models
by: Xie, Tingyu, et al.
Published: (2024)

Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
by: Liu, Zhongtao, et al.
Published: (2024)

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
by: Toshniwal, Shubham, et al.
Published: (2024)

GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models
by: Ruan, Zhiwen, et al.
Published: (2026)

Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
by: He, Linda, et al.
Published: (2025)

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
by: Li, Tianle, et al.
Published: (2024)

Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
by: Chen, Ruibo, et al.
Published: (2024)

Graphical Reasoning: LLM-based Semi-Open Relation Extraction
by: Tao, Yicheng, et al.
Published: (2024)

ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
by: Jin, Zhenchao, et al.
Published: (2024)

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
by: Liu, Wanlong, et al.
Published: (2024)

Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key
by: Chen, Yingda, et al.
Published: (2024)

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
by: Chiang, Wei-Lin, et al.
Published: (2024)

Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief
by: Xiao, Zeguan, et al.
Published: (2025)

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
by: Yu, Jia, et al.
Published: (2025)

Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation
by: Chen, Guanhua, et al.
Published: (2024)

Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
by: Li, Chenglin, et al.
Published: (2024)

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
by: Xie, Chengxing, et al.
Published: (2025)

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
by: Chen, Yicheng, et al.
Published: (2025)

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
by: Gu, Shuhao, et al.
Published: (2024)

From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs
by: He, Jie, et al.
Published: (2025)