| Main Author: | Sun, Mengyi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.02578 |
Similar Items
Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
by: Chitale, Pranjal A., et al.
Published: (2025)
AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation
by: Meng, Rui, et al.
Published: (2022)
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA
by: Tang, Xing, et al.
Published: (2026)
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
by: Thakur, Nandan, et al.
Published: (2026)
C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits
by: Musa, Maaz Bin, et al.
Published: (2024)
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning
by: Min, Yingqian, et al.
Published: (2024)
Text Data Integration
by: Rahman, Md Ataur, et al.
Published: (2026)
Data Augmentation for Conversational AI
by: Soudani, Heydar, et al.
Published: (2023)
ConvMix: A Mixed-Criteria Data Augmentation Framework for Conversational Dense Retrieval
by: Mo, Fengran, et al.
Published: (2025)
SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents
by: Hu, Wentao, et al.
Published: (2026)
Evolving Text Data Stream Mining
by: Kumar, Jay
Published: (2024)
Research on the Online Update Method for Retrieval-Augmented Generation (RAG) Model with Incremental Learning
by: Fan, Yuxin, et al.
Published: (2025)
Serendipity with Generative AI: Repurposing knowledge components during polycrisis with a Viable Systems Model approach
by: Fletcher, Gordon, et al.
Published: (2025)
Query-oriented Data Augmentation for Session Search
by: Chen, Haonan, et al.
Published: (2024)
Ordered Semantically Diverse Sampling for Textual Data
by: Tiwari, Ashish, et al.
Published: (2025)
FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases
by: Zhang, Xingjian, et al.
Published: (2026)
BioChemInsight: An Online Platform for Automated Extraction of Chemical Structures and Activity Data from Patents
by: Wang, Zhe, et al.
Published: (2025)
Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance
by: Esfandiarpoor, Reza, et al.
Published: (2025)
Large Language Models Require Curated Context for Reliable Political Fact-Checking -- Even with Reasoning and Web Search
by: DeVerna, Matthew R., et al.
Published: (2025)
SRAG: RAG with Structured Data Improves Vector Retrieval
by: Shah, Shalin, et al.
Published: (2026)
ConvSDG: Session Data Generation for Conversational Search
by: Mo, Fengran, et al.
Published: (2024)
On Synthetic Data Strategies for Domain-Specific Generative Retrieval
by: Wen, Haoyang, et al.
Published: (2025)
Self-Compositional Data Augmentation for Scientific Keyphrase Generation
by: Houbre, Mael, et al.
Published: (2024)
LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News
by: Zhang, Yunfan, et al.
Published: (2026)
Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models
by: Sundaram, Sowmya S., et al.
Published: (2024)
Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents
by: Choe, Jaeyoung, et al.
Published: (2025)
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
by: Guttal, Pooja, et al.
Published: (2026)
Knowing When to Ask -- Bridging Large Language Models and Data
by: Radhakrishnan, Prashanth, et al.
Published: (2024)
RAG-based Question Answering over Heterogeneous Data and Text
by: Christmann, Philipp, et al.
Published: (2024)
Improving Conversational Recommendation Systems via Counterfactual Data Simulation
by: Wang, Xiaolei, et al.
Published: (2023)
Data Augmentation Techniques for Process Extraction from Scientific Publications
by: Susanti, Yuni
Published: (2024)
Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines
by: Ge, Yuhang, et al.
Published: (2025)
OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation
by: Fang, Haoyang, et al.
Published: (2026)
Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana
by: Filice, Simone, et al.
Published: (2025)
Scaling Knowledge Graph Construction through Synthetic Data Generation and Distillation
by: Choubey, Prafulla Kumar, et al.
Published: (2024)
Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation
by: Chen, Haonan, et al.
Published: (2024)
An Integrated Data Processing Framework for Pretraining Foundation Models
by: Sun, Yiding, et al.
Published: (2024)
Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft
by: Liu, Peiyang, et al.
Published: (2025)
Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval
by: Sinha, Aarush
Published: (2025)
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
by: Suresh, Tarun, et al.
Published: (2024)