| Main Authors: | Wu, Jian, Yu, Hang, Liu, Bingchang, Yang, Wenjie, Di, Peng, Li, Jianguo, Zhang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.06524 |
Similar Items
CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models
by: Gong, Zi, et al.
Published: (2024)
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
by: Zhang, Ziyin, et al.
Published: (2023)
Unified Data Selection for LLM Reasoning
by: Li, Xiaoyuan, et al.
Published: (2026)
InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning
by: Su, Junyou, et al.
Published: (2026)
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
by: Zhang, Ziyin, et al.
Published: (2024)
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
by: Liao, Zihan, et al.
Published: (2024)
Pruning as a Domain-specific LLM Extractor
by: Zhang, Nan, et al.
Published: (2024)
LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs
by: Zhang, Ying, et al.
Published: (2026)
F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data
by: Zhang, Ziyin, et al.
Published: (2025)
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
by: Liao, Zihan, et al.
Published: (2024)
Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning
by: Zhong, Qihuang, et al.
Published: (2025)
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
by: Zhang, Ziyin, et al.
Published: (2026)
A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
by: Wang, Jinqiang, et al.
Published: (2025)
ImF: Implicit Fingerprint for Large Language Models
by: Wu, Jiaxuan, et al.
Published: (2025)
DavIR: Data Selection via Implicit Reward for Large Language Models
by: Zhou, Haotian, et al.
Published: (2023)
ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment
by: Wang, Hao, et al.
Published: (2026)
EIoU-EMC: A Novel Loss for Domain-specific Nested Entity Recognition
by: Zhang, Jian, et al.
Published: (2025)
COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
by: Liu, Youwei, et al.
Published: (2026)
Greedy Information Projection for LLM Data Selection
by: Dong, Victor Ye, et al.
Published: (2026)
DALLMi: Domain Adaption for LLM-based Multi-label Classifier
by: Beţianu, Miruna, et al.
Published: (2024)
Incubating Text Classifiers Following User Instruction with Nothing but LLM
by: Peng, Letian, et al.
Published: (2024)
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts
by: Zhang, Yifan, et al.
Published: (2024)
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
by: He, Zhihao, et al.
Published: (2024)
Explicit and Implicit Data Augmentation for Social Event Detection
by: Ma, Congbo, et al.
Published: (2025)
Selection of LLM Fine-Tuning Data based on Orthogonal Rules
by: Li, Xiaomin, et al.
Published: (2024)
AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation
by: Meng, Rui, et al.
Published: (2022)
Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA
by: Chen, Xi, et al.
Published: (2024)
C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling
by: Qin, Jin, et al.
Published: (2025)
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
by: Liu, Fengze, et al.
Published: (2025)
NITP: Next Implicit Token Prediction for LLM Pre-training
by: Zhang, Xiangdong, et al.
Published: (2026)
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
by: Codefuse, et al.
Published: (2025)
LLM with Relation Classifier for Document-Level Relation Extraction
by: Li, Xingzuo, et al.
Published: (2024)
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
by: Pang, Jinlong, et al.
Published: (2025)
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
by: Wu, Yang, et al.
Published: (2024)
DiSRouter: Distributed Self-Routing for LLM Selections
by: Zheng, Hang, et al.
Published: (2025)
Domain-specific Guided Summarization for Mental Health Posts
by: Qian, Lu, et al.
Published: (2024)
SALP-CG: Standard-Aligned LLM Pipeline for Classifying and Grading Large Volumes of Online Conversational Health Data
by: Yan, Yiwei, et al.
Published: (2025)
SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy
by: Zhang, Tingkai, et al.
Published: (2024)
From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue
by: Zhang, Jiarui, et al.
Published: (2026)
Don't Say No: Jailbreaking LLM by Suppressing Refusal
by: Zhou, Yukai, et al.
Published: (2024)