:: Library Catalog

Bibliographic Details
Main Authors:	Alemohammad, Sina; Chen, Li; Baraniuk, Richard G.; Wang, Zhangyang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2605.31126

Similar Items

Neon: Negative Extrapolation From Self-Training Improves Image Generation
by: Alemohammad, Sina, et al.
Published: (2025)

Self-Improving Diffusion Models with Synthetic Data
by: Alemohammad, Sina, et al.
Published: (2024)

Minimizing Collateral Damage in Activation Steering
by: Nguyen, Tam, et al.
Published: (2026)

Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data
by: Rashidi, Sina, et al.
Published: (2025)

Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)

Synthetic Multimodal Question Generation
by: Wu, Ian, et al.
Published: (2024)

Synthetic Context Generation for Question Generation
by: Liu, Naiming, et al.
Published: (2024)

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking
by: Kulkarni, Atharva, et al.
Published: (2024)

CodecLM: Aligning Language Models with Tailored Synthetic Data
by: Wang, Zifeng, et al.
Published: (2024)

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
by: Havrilla, Alex, et al.
Published: (2024)

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
by: Perin, Gabriel J., et al.
Published: (2025)

Learning from Synthetic Data Improves Multi-hop Reasoning
by: Kabra, Anmol, et al.
Published: (2026)

CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
by: Ziegler, Ingo, et al.
Published: (2024)

DualAlign: Generating Clinically Grounded Synthetic Data
by: Li, Rumeng, et al.
Published: (2025)

No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data
by: Karpov, Dmitry
Published: (2026)

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
by: Xiong, Zheyang, et al.
Published: (2024)

Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework
by: Wang, Dong, et al.
Published: (2025)

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
by: Pandit, Shrey, et al.
Published: (2025)

Synthetic Data for any Differentiable Target
by: Thrush, Tristan, et al.
Published: (2026)

DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
by: Zhou, Ying, et al.
Published: (2024)

A Primal-Dual Framework for Transformers and Neural Networks
by: Nguyen, Tan M., et al.
Published: (2024)

ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time
by: Hu, Michael Y., et al.
Published: (2026)

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)

Reasoning-Driven Synthetic Data Generation and Evaluation
by: Davidson, Tim R., et al.
Published: (2026)

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
by: Choe, Sang Keun, et al.
Published: (2024)

SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
by: Yu, Yaoning, et al.
Published: (2025)

XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
by: Iyer, Vivek, et al.
Published: (2025)

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
by: Zhang, Zhenyu, et al.
Published: (2025)

Dynamic Context Evolution for Scalable Synthetic Data Generation
by: Lingo, Ryan, et al.
Published: (2026)

CasualSynth: Generating Structurally Sound Synthetic Data
by: Cheng, Zehua, et al.
Published: (2026)

Fill In The Gaps: Model Calibration and Generalization with Synthetic Data
by: Ba, Yang, et al.
Published: (2024)

Out-of-Distribution Detection using Synthetic Data Generation
by: Abbas, Momin, et al.
Published: (2025)

BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation
by: Zhu, Alan, et al.
Published: (2025)

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
by: Wang, Zhaoyang, et al.
Published: (2026)

SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
by: Wagner, Stefan Sylvius, et al.
Published: (2024)

How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
by: Seddik, Mohamed El Amine, et al.
Published: (2024)

Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training
by: Ilin, Aleksei, et al.
Published: (2025)

Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
by: Aden-Ali, Ishaq, et al.
Published: (2026)