Library Catalog

Bibliographic Details
Main Authors:	Roque, Matthew Theodore; Velasco, Dan John
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.24356

Similar Items

Scaling, Simplification, and Adaptation: Lessons from Pretraining on Machine-Translated Text
by: Velasco, Dan John, et al.
Published: (2025)

Rethinking the Role of Text Complexity in Language Model Pretraining
by: Velasco, Dan John, et al.
Published: (2025)

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
by: Zhang, Yang, et al.
Published: (2025)

Edit-Constrained Decoding for Sentence Simplification
by: Zetsu, Tatsuya, et al.
Published: (2024)

Large Language Models for Biomedical Text Simplification: Promising But Not There Yet
by: Li, Zihao, et al.
Published: (2024)

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)

Difficulty Estimation and Simplification of French Text Using LLMs
by: Jamet, Henri, et al.
Published: (2024)

MultiLS: A Multi-task Lexical Simplification Framework
by: North, Kai, et al.
Published: (2024)

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)

Text and Audio Simplification: Human vs. ChatGPT
by: Leroy, Gondy, et al.
Published: (2024)

MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator
by: Roscan, Rares-Alexandru, et al.
Published: (2026)

A Knowledge-Injected Curriculum Pretraining Framework for Question Answering
by: Lin, Xin, et al.
Published: (2024)

SimplifyMyText: An LLM-Based System for Inclusive Plain Language Text Simplification
by: Färber, Michael, et al.
Published: (2025)

Automated Feedback Loops to Protect Text Simplification with Generative AI from Information Loss
by: Nandiraju, Abhay Kumara Sri Krishna, et al.
Published: (2025)

Large Language Models as Quasi-crystals: Coherence Without Repetition in Generative Text
by: Guevara-Vela, Jose Manuel
Published: (2025)

Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning
by: Rahman, Md Mushfiqur, et al.
Published: (2024)

Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities
by: Gao, Yingqiang, et al.
Published: (2025)

Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition
by: Tian, Chang, et al.
Published: (2024)

Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)

TACL: Threshold-Adaptive Curriculum Learning Strategy for Enhancing Medical Text Understanding
by: Ren, Mucheng, et al.
Published: (2025)

A Severity-Based Curriculum Learning Strategy for Arabic Medical Text Generation
by: Alansary, Ahmed, et al.
Published: (2026)

Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation
by: Dai, Huangyu, et al.
Published: (2024)

A Neural Model for Word Repetition
by: Dager, Daniel, et al.
Published: (2025)

APIO: Automatic Prompt Induction and Optimization for Grammatical Error Correction and Text Simplification
by: Chernodub, Artem, et al.
Published: (2025)

The Overlooked Repetitive Lengthening Form in Sentiment Analysis
by: Wang, Lei, et al.
Published: (2026)

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
by: Aynetdinov, Ansar, et al.
Published: (2026)

Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
by: Li, Guanlin, et al.
Published: (2025)

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification
by: North, Kai, et al.
Published: (2022)

Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL
by: Wang, Yihan, et al.
Published: (2026)

Curriculum Learning with Quality-Driven Data Selection
by: Wu, Biao, et al.
Published: (2024)

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web
by: Yan, Yibo, et al.
Published: (2023)

Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
by: Gonzalez, Miguel Angel Alvarado, et al.
Published: (2025)

Dynamic Masking Rate Schedules for MLM Pretraining
by: Ankner, Zachary, et al.
Published: (2023)

Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings
by: Gao, Lingyu
Published: (2024)

Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
by: Feng, Steven, et al.
Published: (2024)

In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)

On Linear Representations and Pretraining Data Frequency in Language Models
by: Merullo, Jack, et al.
Published: (2025)

Optimizing Pretraining Data Mixtures with LLM-Estimated Utility
by: Held, William, et al.
Published: (2025)

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction
by: Smirnov, Alexander
Published: (2026)

Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication
by: Ahmad, Arif, et al.
Published: (2024)