| Main Authors: | Jacobs, Niklas, Voelkle, Manuel C., Kathmann, Norbert, Hilbert, Kevin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.06159 |
Similar Items
Embedding And Clustering Your Data Can Improve Contrastive Pretraining
by: Merrick, Luke
Published: (2024)
Data-Efficient Sleep Staging with Synthetic Time Series Pretraining
by: Grieger, Niklas, et al.
Published: (2024)
Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?
by: Blum, Avrim, et al.
Published: (2019)
Multitask Learning Can Improve Worst-Group Outcomes
by: Kulkarni, Atharva, et al.
Published: (2023)
Improving Pretraining Data Using Perplexity Correlations
by: Thrush, Tristan, et al.
Published: (2024)
Transformers Can Navigate Mazes With Multi-Step Prediction
by: Nolte, Niklas, et al.
Published: (2024)
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
by: Ball, Sarah, et al.
Published: (2025)
Improving Cardiac Risk Prediction Using Data Generation Techniques
by: Cabodevila, Alexandre, et al.
Published: (2025)
Identifying and Evaluating Inactive Heads in Pretrained LLMs
by: Sandoval-Segura, Pedro, et al.
Published: (2025)
On the Importance of Pretraining Data Alignment for Atomic Property Prediction
by: Ghunaim, Yasir, et al.
Published: (2025)
Pretraining on Sleep Data Improves non-Sleep Biosignal Tasks
by: Lehn-Schiøler, William, et al.
Published: (2026)
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation
by: Levy, Jacob, et al.
Published: (2026)
Improving Insurance Catastrophic Data with Resampling and GAN Methods
by: Dzadz, Norbert, et al.
Published: (2024)
Can Muon Fine-tune Adam-Pretrained Models?
by: Qu, Xingyu, et al.
Published: (2026)
DataDecide: How to Predict Best Pretraining Data with Small Experiments
by: Magnusson, Ian, et al.
Published: (2025)
Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction
by: Al-Batah, Mohammad Subhi, et al.
Published: (2025)
Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding
by: Barnatchez, Keith, et al.
Published: (2026)
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
by: Xie, Shifeng, et al.
Published: (2025)
Language Models Improve When Pretraining Data Matches Target Tasks
by: Mizrahi, David, et al.
Published: (2025)
Data-Centric Lessons To Improve Speech-Language Pretraining
by: Udandarao, Vishaal, et al.
Published: (2025)
Can AI Improve Outcomes in Tele-ICU Settings?
by: Rasit Dinc
Published: (2019)
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
by: Hoang, Tai, et al.
Published: (2025)
Can we trust the evaluation on ChatGPT?
by: Aiyappa, Rachith, et al.
Published: (2023)
Is CLIP ideal? No. Can we fix it? Yes!
by: Kang, Raphi, et al.
Published: (2025)
Two-Stage Pretraining for Molecular Property Prediction in the Wild
by: Wijaya, Kevin Tirta, et al.
Published: (2024)
Predicting Cardiopulmonary Exercise Testing Outcomes in Congenital Heart Disease Through Multi-modal Data Integration and Geometric Learning
by: Alkan, Muhammet, et al.
Published: (2025)
Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing
by: Sood, Rohin, et al.
Published: (2024)
Memorizing Long-tail Data Can Help Generalization Through Composition
by: Zhou, Mo, et al.
Published: (2025)
Can we Soft Prompt LLMs for Graph Learning Tasks?
by: Liu, Zheyuan, et al.
Published: (2024)
Latent Diffusion Pretraining for Crystal Property Prediction
by: Mukherjee, Shrimon, et al.
Published: (2026)
Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization
by: Azevedo, Caio, et al.
Published: (2025)
Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
by: Dudley, Carson, et al.
Published: (2025)
Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations
by: Quintero, Manuel, et al.
Published: (2025)
Multi-Objective Alignment of Language Models for Personalized Psychotherapy
by: Beikzadeh, Mehrab, et al.
Published: (2026)
Can we generate portable representations for clinical time series data using LLMs?
by: Ji, Zongliang, et al.
Published: (2026)
Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)
Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions
by: Li, Cen-You, et al.
Published: (2025)
Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
by: Hamidieh, Kimia, et al.
Published: (2024)
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment
by: Voelcker, Claas A, et al.
Published: (2024)
Improving DNS Exfiltration Detection via Transformer Pretraining
by: Tomić, Miloš, et al.
Published: (2026)