| Main Authors: | Hsieh, Yu-Guan, Thornton, James, Ndiaye, Eugene, Klein, Michal, Cuturi, Marco, Ablin, Pierre |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.02998 |
Similar Items
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
by: Kirchhof, Michael, et al.
Published: (2024)
Learning Elastic Costs to Shape Monge Displacements
by: Klein, Michal, et al.
Published: (2023)
The Geometries of Truth Are Orthogonal Across Tasks
by: Azizian, Waiss, et al.
Published: (2025)
Nectar: Neural Estimation of Cached-Token Attention via Regression
by: Monteiro, João, et al.
Published: (2026)
Multivariate Conformal Prediction using Optimal Transport
by: Klein, Michal, et al.
Published: (2025)
Simple ReFlow: Improved Techniques for Fast Flow Models
by: Kim, Beomsu, et al.
Published: (2024)
Contrasting Multiple Representations with the Multi-Marginal Matching Gap
by: Piran, Zoe, et al.
Published: (2024)
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
by: Saada, Thiziri Nait, et al.
Published: (2025)
Locking Pretrained Weights via Deep Low-Rank Residual Distillation
by: Sakamoto, Keitaro, et al.
Published: (2026)
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
by: Mlodozeniec, Bruno, et al.
Published: (2025)
Progressive Entropic Optimal Transport Solvers
by: Kassraie, Parnian, et al.
Published: (2024)
DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures
by: Gualdoni, Eleonora, et al.
Published: (2026)
Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
by: Bethune, Louis, et al.
Published: (2025)
Scaling Categorical Flow Maps
by: Davis, Oscar, et al.
Published: (2026)
On Fitting Flow Models with Large Sinkhorn Couplings
by: Zhang, Stephen, et al.
Published: (2025)
Amortizing Maximum Inner Product Search with Learned Support Functions
by: Olausson, Theo X., et al.
Published: (2026)
Flow Matching with Semidiscrete Couplings
by: Mousavi-Hosseini, Alireza, et al.
Published: (2025)
Dynamic Gradient Alignment for Online Data Mixing
by: Fan, Simin, et al.
Published: (2024)
GENOT: Entropic (Gromov) Wasserstein Flow Matching with Applications to Single-Cell Genomics
by: Klein, Dominik, et al.
Published: (2023)
Learning Unmasking Policies for Diffusion Language Models
by: Jazbec, Metod, et al.
Published: (2025)
Beyond Uncertainty Sets: Leveraging Optimal Transport to Extend Conformal Predictive Distribution to Multivariate Settings
by: Ndiaye, Eugene
Published: (2025)
Omega: Optimistic EMA Gradients
by: Ramirez, Juan, et al.
Published: (2023)
On a Neural Implementation of Brenier's Polar Factorization
by: Vesseron, Nina, et al.
Published: (2024)
EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL
by: Zhang, Lunjun, et al.
Published: (2026)
The AdEMAMix Optimizer: Better, Faster, Older
by: Pagliardini, Matteo, et al.
Published: (2024)
How Smooth Is Attention?
by: Castin, Valérie, et al.
Published: (2023)
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
by: Vesseron, Nina, et al.
Published: (2025)
From Conformal Predictions to Confidence Regions
by: Guille-Escuret, Charles, et al.
Published: (2024)
Finite Sample Confidence Regions for Linear Regression Parameters Using Arbitrary Predictors
by: Guille-Escuret, Charles, et al.
Published: (2024)
Exact and Approximate Conformal Inference for Multi-Output Regression
by: Johnstone, Chancellor, et al.
Published: (2022)
Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization
by: Ye, Zhenzhang, et al.
Published: (2024)
MVICAD2: Multi-View Independent Component Analysis with Delays and Dilations
by: Heurtebise, Ambroise, et al.
Published: (2025)
A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport
by: Lin, Tianyi, et al.
Published: (2023)
The Coupling Within: Flow Matching via Distilled Normalizing Flows
by: Berthelot, David, et al.
Published: (2026)
GS-EMA: Integrating Gradient Surgery Exponential Moving Average with Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in Aneurysm Segmentation
by: Lin, Fengming, et al.
Published: (2024)
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
by: Grangier, David, et al.
Published: (2024)
Need a Small Specialized Language Model? Plan Early!
by: Grangier, David, et al.
Published: (2024)
Scaling Laws for Mixture Pretraining Under Data Constraints
by: Sedova, Anastasiia, et al.
Published: (2026)
A framework for bilevel optimization that enables stochastic and global variance reduction algorithms
by: Dagréou, Mathieu, et al.
Published: (2022)
A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization
by: Dagréou, Mathieu, et al.
Published: (2023)