| Main Authors: | Nadas, Mihai, Diosan, Laura, Tomescu, Andreea |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.14023 |
Similar Items
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
by: Nadas, Mihai, et al.
Published: (2025)
TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction
by: Nadas, Mihai Dan, et al.
Published: (2026)
Building Large-Scale English-Romanian Literary Translation Resources with Open Models
by: Nadas, Mihai, et al.
Published: (2025)
Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study
by: Nadas, Mihai, et al.
Published: (2025)
Value-Aware Numerical Representations for Transformer Language Models
by: Dutulescu, Andreea, et al.
Published: (2026)
Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data
by: Cook, John, et al.
Published: (2026)
Synthetic Text Generation for Training Large Language Models via Gradient Matching
by: Nguyen, Dang, et al.
Published: (2025)
German Text Simplification: Finetuning Large Language Models with Semi-Synthetic Data
by: Klöser, Lars, et al.
Published: (2024)
Structsum Generation for Faster Text Comprehension
by: Jain, Parag, et al.
Published: (2024)
DataGen: Unified Synthetic Dataset Generation via Large Language Models
by: Huang, Yue, et al.
Published: (2024)
Case2Code: Scalable Synthetic Data for Code Generation
by: Shao, Yunfan, et al.
Published: (2024)
Data Generation Using Large Language Models for Text Classification: An Empirical Case Study
by: Li, Yinheng, et al.
Published: (2024)
Evaluating Language Models as Synthetic Data Generators
by: Kim, Seungone, et al.
Published: (2024)
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
by: Majumdar, Somshubra, et al.
Published: (2024)
SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
by: Zhong, Ziije, et al.
Published: (2024)
Synthetic Data Generation for Phrase Break Prediction with Large Language Model
by: Lee, Hoyeon, et al.
Published: (2025)
Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models
by: Ghanadian, Hamideh, et al.
Published: (2024)
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
by: Yang, Yue, et al.
Published: (2025)
Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model
by: Lin, Yu-Chen, et al.
Published: (2023)
Persona-Based Synthetic Data Generation Using Multi-Stage Conditioning with Large Language Models for Emotion Recognition
by: Inoshita, Keito, et al.
Published: (2025)
Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
by: Li, Haolong, et al.
Published: (2024)
GCOF: Self-iterative Text Generation for Copywriting Using Large Language Model
by: Zhou, Jianghui, et al.
Published: (2024)
Deep Active Learning for Data Mining from Conflict Text Corpora
by: Croicu, Mihai
Published: (2024)
Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data
by: Li, Haoran, et al.
Published: (2024)
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
by: Katzy, Jonathan, et al.
Published: (2025)
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
by: Xu, Ran, et al.
Published: (2023)
An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation
by: Mahapatra, Joy, et al.
Published: (2024)
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
by: Huang, Yiming, et al.
Published: (2024)
Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?
by: Winata, Genta Indra, et al.
Published: (2026)
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation
by: Rousseau, Cécile, et al.
Published: (2025)
Private Synthetic Text Generation with Diffusion Models
by: Ochs, Sebastian, et al.
Published: (2024)
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
Generative Text Steganography with Large Language Model
by: Wu, Jiaxuan, et al.
Published: (2024)
Advancing Text Classification with Large Language Models and Neural Attention Mechanisms
by: Lyu, Ning, et al.
Published: (2025)
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
by: Ashiga, Mari, et al.
Published: (2025)
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
by: Wu, Xiaojun, et al.
Published: (2024)
Aligning Large Language Models via Fully Self-Synthetic Data
by: Yin, Shangjian, et al.
Published: (2025)
Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data
by: Anwar, Muhammad, et al.
Published: (2025)
On the Diversity of Synthetic Data and its Impact on Training Large Language Models
by: Chen, Hao, et al.
Published: (2024)