| Main Authors: | Roque, Matthew Theodore, Velasco, Dan John |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.24356 |
Similar Items
Scaling, Simplification, and Adaptation: Lessons from Pretraining on Machine-Translated Text
by: Velasco, Dan John, et al.
Published: (2025)
Rethinking the Role of Text Complexity in Language Model Pretraining
by: Velasco, Dan John, et al.
Published: (2025)
Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
by: Zhang, Yang, et al.
Published: (2025)
Edit-Constrained Decoding for Sentence Simplification
by: Zetsu, Tatsuya, et al.
Published: (2024)
Large Language Models for Biomedical Text Simplification: Promising But Not There Yet
by: Li, Zihao, et al.
Published: (2024)
Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)
Difficulty Estimation and Simplification of French Text Using LLMs
by: Jamet, Henri, et al.
Published: (2024)
MultiLS: A Multi-task Lexical Simplification Framework
by: North, Kai, et al.
Published: (2024)
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)
Text and Audio Simplification: Human vs. ChatGPT
by: Leroy, Gondy, et al.
Published: (2024)
MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator
by: Roscan, Rares-Alexandru, et al.
Published: (2026)
A Knowledge-Injected Curriculum Pretraining Framework for Question Answering
by: Lin, Xin, et al.
Published: (2024)
SimplifyMyText: An LLM-Based System for Inclusive Plain Language Text Simplification
by: Färber, Michael, et al.
Published: (2025)
Automated Feedback Loops to Protect Text Simplification with Generative AI from Information Loss
by: Nandiraju, Abhay Kumara Sri Krishna, et al.
Published: (2025)
Large Language Models as Quasi-crystals: Coherence Without Repetition in Generative Text
by: Guevara-Vela, Jose Manuel
Published: (2025)
Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning
by: Rahman, Md Mushfiqur, et al.
Published: (2024)
Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities
by: Gao, Yingqiang, et al.
Published: (2025)
Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition
by: Tian, Chang, et al.
Published: (2024)
Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)
TACL: Threshold-Adaptive Curriculum Learning Strategy for Enhancing Medical Text Understanding
by: Ren, Mucheng, et al.
Published: (2025)
A Severity-Based Curriculum Learning Strategy for Arabic Medical Text Generation
by: Alansary, Ahmed, et al.
Published: (2026)
Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation
by: Dai, Huangyu, et al.
Published: (2024)
A Neural Model for Word Repetition
by: Dager, Daniel, et al.
Published: (2025)
APIO: Automatic Prompt Induction and Optimization for Grammatical Error Correction and Text Simplification
by: Chernodub, Artem, et al.
Published: (2025)
The Overlooked Repetitive Lengthening Form in Sentiment Analysis
by: Wang, Lei, et al.
Published: (2026)
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
by: Aynetdinov, Ansar, et al.
Published: (2026)
Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
by: Li, Guanlin, et al.
Published: (2025)
ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification
by: North, Kai, et al.
Published: (2022)
Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL
by: Wang, Yihan, et al.
Published: (2026)
Curriculum Learning with Quality-Driven Data Selection
by: Wu, Biao, et al.
Published: (2024)
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web
by: Yan, Yibo, et al.
Published: (2023)
Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
by: Gonzalez, Miguel Angel Alvarado, et al.
Published: (2025)
Dynamic Masking Rate Schedules for MLM Pretraining
by: Ankner, Zachary, et al.
Published: (2023)
Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings
by: Gao, Lingyu
Published: (2024)
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
by: Feng, Steven, et al.
Published: (2024)
In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)
On Linear Representations and Pretraining Data Frequency in Language Models
by: Merullo, Jack, et al.
Published: (2025)
Optimizing Pretraining Data Mixtures with LLM-Estimated Utility
by: Held, William, et al.
Published: (2025)
Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction
by: Smirnov, Alexander
Published: (2026)
Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication
by: Ahmad, Arif, et al.
Published: (2024)