Library Catalog

Bibliographic Details
Main Authors:	Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.09533

Similar Items

Unified Auto-Encoding with Masked Diffusion
by: Hansen-Estruch, Philippe, et al.
Published: (2024)

Towards Principled Representation Learning from Videos for Reinforcement Learning
by: Misra, Dipendra, et al.
Published: (2024)

Apollo: An Exploration of Video Understanding in Large Multimodal Models
by: Zohar, Orr, et al.
Published: (2024)

Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)

Fast Occupancy Network
by: Lu, Mingjie, et al.
Published: (2024)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)

ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters
by: Hansen-Estruch, Philippe, et al.
Published: (2026)

Phi-4-reasoning-vision-15B Technical Report
by: Aneja, Jyoti, et al.
Published: (2026)

Social LSTM with Dynamic Occupancy Modeling for Realistic Pedestrian Trajectory Prediction
by: Alia, Ahmed, et al.
Published: (2025)

Training Diffusion Models with Reinforcement Learning
by: Black, Kevin, et al.
Published: (2023)

Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots
by: Cui, Wei, et al.
Published: (2025)

CRUNet-MR-Univ: A Foundation Model for Diverse Cardiac MRI Reconstruction
by: Lyu, Donghang, et al.
Published: (2026)

Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
by: Shah, Manan, et al.
Published: (2024)

FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph
by: Liu, Zhangding, et al.
Published: (2025)

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
by: Wang, Lening, et al.
Published: (2024)

Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps
by: Wei, Zihang, et al.
Published: (2023)

OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving
by: Shen, Yedong, et al.
Published: (2025)

ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
by: Li, Ying, et al.
Published: (2025)

Learning Additively Compositional Latent Actions for Embodied AI
by: Wei, Hangxing, et al.
Published: (2026)

OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment
by: Murhij, Youshaa, et al.
Published: (2024)

OccSim: Multi-kilometer Simulation with Long-horizon Occupancy World Models
by: Liu, Tianran, et al.
Published: (2026)

EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
by: Li, Runjia, et al.
Published: (2025)

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
by: Chen, Dubing, et al.
Published: (2025)

KP-INR: A Dual-Branch Implicit Neural Representation Model for Cardiac Cine MRI Reconstruction
by: Lyu, Donghang, et al.
Published: (2025)

Video Motion Transfer with Diffusion Transformers
by: Pondaven, Alexander, et al.
Published: (2024)

Interpreting Physics in Video World Models
by: Joseph, Sonia, et al.
Published: (2026)

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
by: Chen, Chen, et al.
Published: (2025)

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
by: Ye, Zhangchen, et al.
Published: (2024)

Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
by: Yan, Chi, et al.
Published: (2025)

Revisit Human-Scene Interaction via Space Occupancy
by: Liu, Xinpeng, et al.
Published: (2023)

Multi-Label Classification Framework for Hurricane Damage Assessment
by: Liu, Zhangding, et al.
Published: (2025)

MCANet: A Multi-Scale Class-Specific Attention Network for Multi-Label Post-Hurricane Damage Assessment using UAV Imagery
by: Liu, Zhangding, et al.
Published: (2025)

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
by: Dang, Chenxu, et al.
Published: (2025)

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
by: Huang, Yuanhui, et al.
Published: (2024)

GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting
by: Sun, Qianpu, et al.
Published: (2024)

Gamified crowd-sourcing of high-quality data for visual fine-tuning
by: Yadav, Shashank, et al.
Published: (2024)

Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
by: Luz, Maximilian, et al.
Published: (2026)

QYOLO: Lightweight Object Detection via Quantum Inspired Shared Channel Mixing
by: Mittal, Garvit Kumar, et al.
Published: (2026)

SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning
by: Dang, Chenxu, et al.
Published: (2026)

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
by: Oh, Gyeongrok, et al.
Published: (2025)