:: Library Catalog

Bibliographic Details
Main authors:	Korchinski, Daniel J., Favero, Alessandro, Wyart, Matthieu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2605.27734

Similar Items

Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models
by: Favero, Alessandro, et al.
Published: (2025)

On the Emergence of Linear Analogies in Word Embeddings
by: Korchinski, Daniel J., et al.
Published: (2025)

A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data
by: Sclocchi, Antonio, et al.
Published: (2024)

Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
by: Cagnetta, Francesco, et al.
Published: (2025)

Symmetry in language statistics shapes the geometry of model representations
by: Karkada, Dhruva, et al.
Published: (2026)

Sampling Data with Chains of Forward-Backward Diffusion Steps
by: Kang, Hyunmo, et al.
Published: (2026)

How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
by: Favero, Alessandro, et al.
Published: (2025)

Probing the Latent Hierarchical Structure of Data via Diffusion Models
by: Sclocchi, Antonio, et al.
Published: (2024)

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
by: Cagnetta, Francesco, et al.
Published: (2023)

Towards a theory of how the structure of language is acquired by deep neural networks
by: Cagnetta, Francesco, et al.
Published: (2024)

Learning curves theory for hierarchically compositional data with power-law distributed features
by: Cagnetta, Francesco, et al.
Published: (2025)

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
by: Nava, Andres, et al.
Published: (2026)

How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
by: Tomasini, Umberto, et al.
Published: (2024)

On the different regimes of Stochastic Gradient Descent
by: Sclocchi, Antonio, et al.
Published: (2023)

Microscopic description of the intermittent dynamics driving logarithmic creep
by: Korchinski, Daniel J., et al.
Published: (2024)

Deep networks learn to parse uniform-depth context-free languages from local statistics
by: Parley, Jack T., et al.
Published: (2026)

Deriving Neural Scaling Laws from the statistics of natural language
by: Cagnetta, Francesco, et al.
Published: (2026)

The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning
by: Favero, Alessandro
Published: (2025)

Diffusion Models Preferentially Memorize Prototypical Examples or: Why Does My Diffusion Model Love Slop?
by: Rodriguez, Marta Aparicio, et al.
Published: (2026)

Unified Latents (UL): How to train your latents
by: Heek, Jonathan, et al.
Published: (2026)

Task Addition and Weight Disentanglement in Closed-Vocabulary Models
by: Hazimeh, Adam, et al.
Published: (2025)

Deep graph matching meets mixed-integer linear programming: Relax at your own risk ?
by: Xu, Zhoubo, et al.
Published: (2021)

Not all tokens are needed(NAT): token efficient reinforcement learning
by: Sang, Hejian, et al.
Published: (2026)

Efficient numeracy in language models through single-token number embeddings
by: Kreitner, Linus, et al.
Published: (2025)

Km-scale dynamical downscaling through conformalized latent diffusion models
by: Brusaferri, Alessandro, et al.
Published: (2025)

Scaling FP8 training to trillion-token LLMs
by: Fishman, Maxim, et al.
Published: (2024)

Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)

Where is the signal in tokenization space?
by: Geh, Renato Lui, et al.
Published: (2024)

Physics in Next-token Prediction
by: An, Hongjun, et al.
Published: (2024)

Hierarchical self-assembly for high-yield addressable complexity at fixed conditions
by: Holmes-Cerfon, Miranda, et al.
Published: (2025)

Unified token representations for sequential decision models
by: Tian, Zhuojing, et al.
Published: (2025)

Backdoor Unlearning by Linear Task Decomposition
by: Abdelraheem, Amel, et al.
Published: (2025)

The pitfalls of next-token prediction
by: Bachmann, Gregor, et al.
Published: (2024)

MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
by: Wang, Ke, et al.
Published: (2025)

Graph2text or Graph2token: A Perspective of Large Language Models for Graph Learning
by: Yu, Shuo, et al.
Published: (2025)

Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models
by: Meeus, Matthieu, et al.
Published: (2023)

Next-token pretraining implies in-context learning
by: Riechers, Paul M., et al.
Published: (2025)

On multi-token prediction for efficient LLM inference
by: Mehra, Somesh, et al.
Published: (2025)

On the Stability of Iterative Retraining of Generative Models on their own Data
by: Bertrand, Quentin, et al.
Published: (2023)

Non-asymptotic Convergence of Training Transformers for Next-token Prediction
by: Huang, Ruiquan, et al.
Published: (2024)