Library Catalog

Bibliographic Details
Main Authors:	Jacobs, Niklas; Voelkle, Manuel C.; Kathmann, Norbert; Hilbert, Kevin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.06159

Similar Items

Embedding And Clustering Your Data Can Improve Contrastive Pretraining
by: Merrick, Luke
Published: (2024)

Data-Efficient Sleep Staging with Synthetic Time Series Pretraining
by: Grieger, Niklas, et al.
Published: (2024)

Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?
by: Blum, Avrim, et al.
Published: (2019)

Multitask Learning Can Improve Worst-Group Outcomes
by: Kulkarni, Atharva, et al.
Published: (2023)

Improving Pretraining Data Using Perplexity Correlations
by: Thrush, Tristan, et al.
Published: (2024)

Transformers Can Navigate Mazes With Multi-Step Prediction
by: Nolte, Niklas, et al.
Published: (2024)

Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
by: Ball, Sarah, et al.
Published: (2025)

Improving Cardiac Risk Prediction Using Data Generation Techniques
by: Cabodevila, Alexandre, et al.
Published: (2025)

Identifying and Evaluating Inactive Heads in Pretrained LLMs
by: Sandoval-Segura, Pedro, et al.
Published: (2025)

On the Importance of Pretraining Data Alignment for Atomic Property Prediction
by: Ghunaim, Yasir, et al.
Published: (2025)

Pretraining on Sleep Data Improves non-Sleep Biosignal Tasks
by: Lehn-Schiøler, William, et al.
Published: (2026)

Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation
by: Levy, Jacob, et al.
Published: (2026)

Improving Insurance Catastrophic Data with Resampling and GAN Methods
by: Dzadz, Norbert, et al.
Published: (2024)

Can Muon Fine-tune Adam-Pretrained Models?
by: Qu, Xingyu, et al.
Published: (2026)

DataDecide: How to Predict Best Pretraining Data with Small Experiments
by: Magnusson, Ian, et al.
Published: (2025)

Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction
by: Al-Batah, Mohammad Subhi, et al.
Published: (2025)

Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding
by: Barnatchez, Keith, et al.
Published: (2026)

CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
by: Xie, Shifeng, et al.
Published: (2025)

Language Models Improve When Pretraining Data Matches Target Tasks
by: Mizrahi, David, et al.
Published: (2025)

Data-Centric Lessons To Improve Speech-Language Pretraining
by: Udandarao, Vishaal, et al.
Published: (2025)

Can AI Improve Outcomes in Tele-ICU Settings?
by: Dinc, Rasit
Published: (2019)

Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
by: Hoang, Tai, et al.
Published: (2025)

Can we trust the evaluation on ChatGPT?
by: Aiyappa, Rachith, et al.
Published: (2023)

Is CLIP ideal? No. Can we fix it? Yes!
by: Kang, Raphi, et al.
Published: (2025)

Two-Stage Pretraining for Molecular Property Prediction in the Wild
by: Wijaya, Kevin Tirta, et al.
Published: (2024)

Predicting Cardiopulmonary Exercise Testing Outcomes in Congenital Heart Disease Through Multi-modal Data Integration and Geometric Learning
by: Alkan, Muhammet, et al.
Published: (2025)

Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing
by: Sood, Rohin, et al.
Published: (2024)

Memorizing Long-tail Data Can Help Generalization Through Composition
by: Zhou, Mo, et al.
Published: (2025)

Can we Soft Prompt LLMs for Graph Learning Tasks?
by: Liu, Zheyuan, et al.
Published: (2024)

Latent Diffusion Pretraining for Crystal Property Prediction
by: Mukherjee, Shrimon, et al.
Published: (2026)

Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization
by: Azevedo, Caio, et al.
Published: (2025)

Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
by: Dudley, Carson, et al.
Published: (2025)

Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations
by: Quintero, Manuel, et al.
Published: (2025)

Multi-Objective Alignment of Language Models for Personalized Psychotherapy
by: Beikzadeh, Mehrab, et al.
Published: (2026)

Can we generate portable representations for clinical time series data using LLMs?
by: Ji, Zongliang, et al.
Published: (2026)

Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)

Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions
by: Li, Cen-You, et al.
Published: (2025)

Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
by: Hamidieh, Kimia, et al.
Published: (2024)

Can we hop in general? A discussion of benchmark selection and design using the Hopper environment
by: Voelcker, Claas A, et al.
Published: (2024)

Improving DNS Exfiltration Detection via Transformer Pretraining
by: Tomić, Miloš, et al.
Published: (2026)