:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Libo; Belkadi, Samuel; Han, Lifeng; Del-Pinto, Warren; Nenadic, Goran
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.09501

Similar Items

Exploration of Masked and Causal Language Modelling for Text Generation
by: Micheletti, Nicolo, et al.
Published: (2024)

Generating Synthetic Free-text Medical Records with Low Re-identification Risk using Masked Language Modeling
by: Belkadi, Samuel, et al.
Published: (2024)

Large Language Models for Biomedical Text Simplification: Promising But Not There Yet
by: Li, Zihao, et al.
Published: (2024)

Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts
by: Li, Zihao, et al.
Published: (2023)

A Comparative Study on Automatic Coding of Medical Letters with Explainability
by: Glen, Jamie, et al.
Published: (2024)

CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data
by: Hong, Kung Yin, et al.
Published: (2024)

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
by: Romero, Pablo, et al.
Published: (2024)

Structured Information Matters: Explainable ICD Coding with Patient-Level Knowledge Graphs
by: Li, Mingyang, et al.
Published: (2025)

Neural Machine Translation of Clinical Text: An Empirical Investigation into Multilingual Pre-Trained Language Models and Transfer-Learning
by: Han, Lifeng, et al.
Published: (2023)

CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation
by: Hong, Kung Yin, et al.
Published: (2024)

MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs
by: Gladkoff, Serge, et al.
Published: (2023)

DeIDClinic: A Risk-Aware Pseudonymization Framework for Clinical Text De-identification and Re-identification Risk Assessment
by: Paul, Angel, et al.
Published: (2024)

Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation
by: Han, Lifeng, et al.
Published: (2020)

AutoLLM-CARD: Towards a Description and Landscape of Large Language Models
by: Tian, Shengwei, et al.
Published: (2024)

Generation of Synthetic Clinical Text: A Systematic Review
by: Alshaikhdeeb, Basel, et al.
Published: (2025)

Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
by: Cheng, Kellen Tan, et al.
Published: (2025)

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
by: Kweon, Sunjun, et al.
Published: (2023)

DualAlign: Generating Clinically Grounded Synthetic Data
by: Li, Rumeng, et al.
Published: (2025)

De-identification of clinical free text using natural language processing: A systematic review of current approaches
by: Kovačević, Aleksandar, et al.
Published: (2023)

Retrieval-Augmented Generation Systems for Intellectual Property via Synthetic Multi-Angle Fine-tuning
by: Ren, Runtao, et al.
Published: (2025)

A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts
by: Bedrick, Steven, et al.
Published: (2025)

MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
by: Mandal, Aishik, et al.
Published: (2025)

MaLei at the PLABA Track of TREC 2024: RoBERTa for Term Replacement -- LLaMA3.1 and GPT-4o for Complete Abstract Adaptation
by: Ling, Zhidong, et al.
Published: (2024)

Parameterized Synthetic Text Generation with SimpleStories
by: Finke, Lennart, et al.
Published: (2025)

CircuitSynth: Reliable Synthetic Data Generation
by: Cheng, Zehua, et al.
Published: (2026)

Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM
by: Das, Trisha, et al.
Published: (2024)

Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy
by: Biswas, Anjanava, et al.
Published: (2024)

Synthetic Multimodal Question Generation
by: Wu, Ian, et al.
Published: (2024)

Few-shot LLM Synthetic Data with Distribution Matching
by: Ren, Jiyuan, et al.
Published: (2025)

An Empirical Study of Validating Synthetic Data for Formula Generation
by: Singh, Usneek, et al.
Published: (2024)

Synthetic Dialogue Dataset Generation using LLM Agents
by: Abdullin, Yelaman, et al.
Published: (2024)

Generating Synthetic Datasets for Few-shot Prompt Tuning
by: Guo, Xu, et al.
Published: (2024)

Synthetic bootstrapped pretraining
by: Yang, Zitong, et al.
Published: (2025)

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
by: Lu, Zimu, et al.
Published: (2024)

SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
by: Mishra, Prakamya, et al.
Published: (2024)

Controlled Generation for Private Synthetic Text
by: Zhao, Zihao, et al.
Published: (2025)

Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic
by: Mankarious, Saad, et al.
Published: (2026)

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale
by: Liu, Jinghui, et al.
Published: (2026)

Multi-Document Grounded Multi-Turn Synthetic Dialog Generation
by: Lee, Young-Suk, et al.
Published: (2024)

Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
by: Fan, Zhiting, et al.
Published: (2026)