Library Catalog

Bibliographic Details
Main Authors:	Baldassarre, Federico; Szafraniec, Marc; Terver, Basile; Khalidov, Vasil; Massa, Francisco; LeCun, Yann; Labatut, Patrick; Seitzer, Maximilian; Bojanowski, Piotr
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.19468

Similar Items

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
by: Terver, Basile, et al.
Published: (2025)

Learning Latent Action World Models In The Wild
by: Garrido, Quentin, et al.
Published: (2026)

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images
by: Raugel, Joséphine, et al.
Published: (2026)

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
by: Zhou, Gaoyue, et al.
Published: (2024)

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
by: Vo, Huy V., et al.
Published: (2024)

Learning by Reconstruction Produces Uninformative Features For Perception
by: Balestriero, Randall, et al.
Published: (2024)

DINOv3
by: Siméoni, Oriane, et al.
Published: (2025)

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
by: Assran, Mido, et al.
Published: (2025)

Hierarchical Planning with Latent World Models
by: Zhang, Wancong, et al.
Published: (2026)

Disentangling the Factors of Convergence between Brains and Computer Vision Models
by: Raugel, Joséphine, et al.
Published: (2025)

Fast and Exact Enumeration of Deep Networks Partitions Regions
by: Balestriero, Randall, et al.
Published: (2024)

Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence
by: Dawid, Anna, et al.
Published: (2023)

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
by: Balestriero, Randall, et al.
Published: (2025)

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
by: Jose, Cijo, et al.
Published: (2024)

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities
by: Zadaianchuk, Andrii, et al.
Published: (2023)

Video Representation Learning with Joint-Embedding Predictive Architectures
by: Drozdov, Katrina, et al.
Published: (2024)

You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning
by: Moutakanni, Théo, et al.
Published: (2024)

Cluster and Predict Latent Patches for Improved Masked Image Modeling
by: Darcet, Timothée, et al.
Published: (2025)

Efficient Universal Perception Encoder
by: Zhu, Chenchen, et al.
Published: (2026)

A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
by: Terver, Basile, et al.
Published: (2026)

LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
by: Huang, Hai, et al.
Published: (2025)

Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA
by: Huang, Hai, et al.
Published: (2026)

Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science
by: Dupoux, Emmanuel, et al.
Published: (2026)

Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations
by: Mialon, Grégoire, et al.
Published: (2022)

A hierarchical loss and its problems when classifying non-hierarchically
by: Wu, Cinna, et al.
Published: (2017)

DINOv2: Learning Robust Visual Features without Supervision
by: Oquab, Maxime, et al.
Published: (2023)

Navigation World Models
by: Bar, Amir, et al.
Published: (2024)

OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation
by: Goswami, Raktim Gautam, et al.
Published: (2025)

Parallel Stochastic Gradient-Based Planning for World Models
by: Psenka, Michael, et al.
Published: (2026)

Causal-JEPA: Learning World Models through Object-Level Latent Masking
by: Nam, Heejeong, et al.
Published: (2026)

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
by: Maes, Lucas, et al.
Published: (2026)

Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
by: Moutakanni, Théo, et al.
Published: (2024)

Revisiting Feature Prediction for Learning Visual Representations from Video
by: Bardes, Adrien, et al.
Published: (2024)

URLOST: Unsupervised Representation Learning without Stationarity or Topology
by: Yun, Zeyu, et al.
Published: (2023)

Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
by: Balestriero, Randall, et al.
Published: (2025)

The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
by: Sun, Shangwen, et al.
Published: (2026)

Learning and Leveraging World Models in Visual Representation Learning
by: Garrido, Quentin, et al.
Published: (2024)

Hierarchical World Models as Visual Whole-Body Humanoid Controllers
by: Hansen, Nicklas, et al.
Published: (2024)

Whole-Body Conditioned Egocentric Video Prediction
by: Bai, Yutong, et al.
Published: (2025)

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment
by: Arbel, Michael, et al.
Published: (2026)