| Main Authors: | Alemohammad, Sina, Chen, Li, Baraniuk, Richard G., Wang, Zhangyang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.31126 |
Similar Items
Neon: Negative Extrapolation From Self-Training Improves Image Generation
by: Alemohammad, Sina, et al.
Published: (2025)
Self-Improving Diffusion Models with Synthetic Data
by: Alemohammad, Sina, et al.
Published: (2024)
Minimizing Collateral Damage in Activation Steering
by: Nguyen, Tam, et al.
Published: (2026)
Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data
by: Rashidi, Sina, et al.
Published: (2025)
Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)
Synthetic Multimodal Question Generation
by: Wu, Ian, et al.
Published: (2024)
Synthetic Context Generation for Question Generation
by: Liu, Naiming, et al.
Published: (2024)
SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking
by: Kulkarni, Atharva, et al.
Published: (2024)
CodecLM: Aligning Language Models with Tailored Synthetic Data
by: Wang, Zifeng, et al.
Published: (2024)
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
by: Havrilla, Alex, et al.
Published: (2024)
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
by: Perin, Gabriel J., et al.
Published: (2025)
Learning from Synthetic Data Improves Multi-hop Reasoning
by: Kabra, Anmol, et al.
Published: (2026)
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
by: Ziegler, Ingo, et al.
Published: (2024)
DualAlign: Generating Clinically Grounded Synthetic Data
by: Li, Rumeng, et al.
Published: (2025)
No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data
by: Karpov, Dmitry
Published: (2026)
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
by: Xiong, Zheyang, et al.
Published: (2024)
Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework
by: Wang, Dong, et al.
Published: (2025)
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
by: Pandit, Shrey, et al.
Published: (2025)
Synthetic Data for any Differentiable Target
by: Thrush, Tristan, et al.
Published: (2026)
DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
by: Zhou, Ying, et al.
Published: (2024)
A Primal-Dual Framework for Transformers and Neural Networks
by: Nguyen, Tan M., et al.
Published: (2024)
ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time
by: Hu, Michael Y., et al.
Published: (2026)
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)
Reasoning-Driven Synthetic Data Generation and Evaluation
by: Davidson, Tim R., et al.
Published: (2026)
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
by: Choe, Sang Keun, et al.
Published: (2024)
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
by: Yu, Yaoning, et al.
Published: (2025)
XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
by: Iyer, Vivek, et al.
Published: (2025)
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
by: Zhang, Zhenyu, et al.
Published: (2025)
Dynamic Context Evolution for Scalable Synthetic Data Generation
by: Lingo, Ryan, et al.
Published: (2026)
CasualSynth: Generating Structurally Sound Synthetic Data
by: Cheng, Zehua, et al.
Published: (2026)
Fill In The Gaps: Model Calibration and Generalization with Synthetic Data
by: Ba, Yang, et al.
Published: (2024)
Out-of-Distribution Detection using Synthetic Data Generation
by: Abbas, Momin, et al.
Published: (2025)
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation
by: Zhu, Alan, et al.
Published: (2025)
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
by: Wang, Zhaoyang, et al.
Published: (2026)
SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
by: Wagner, Stefan Sylvius, et al.
Published: (2024)
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
by: Seddik, Mohamed El Amine, et al.
Published: (2024)
Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training
by: Ilin, Aleksei, et al.
Published: (2025)
Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
by: Aden-Ali, Ishaq, et al.
Published: (2026)