:: Library Catalog

Bibliographic Details
Main Authors:	Habibian, Amirhossein; Ghodrati, Amir; Fathima, Noor; Sautiere, Guillaume; Garrepalli, Risheek; Porikli, Fatih; Petersen, Jens
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2312.08128

Similar Items

MoViE: Mobile Diffusion for Video Editing
by: Karjauv, Adil, et al.
Published: (2024)

DDIL: Diversity Enhancing Diffusion Distillation With Imitation Learning
by: Garrepalli, Risheek, et al.
Published: (2024)

Multi-Scale Local Speculative Decoding for Image Generation
by: Peruzzo, Elia, et al.
Published: (2026)

MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing
by: Kadambi, Shreya, et al.
Published: (2025)

MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
by: Yasarla, Rajeev, et al.
Published: (2023)

SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations
by: Lin, Jamie Menjay, et al.
Published: (2024)

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
by: Jeong, Jisoo, et al.
Published: (2024)

Object-Centric Diffusion for Efficient Video Editing
by: Kahatapitiya, Kumara, et al.
Published: (2024)

Neodragon: Mobile Video Generation using Diffusion Transformer
by: Karnewar, Animesh, et al.
Published: (2025)

Mobile Video Diffusion
by: Yahia, Haitam Ben, et al.
Published: (2024)

Distilling Multi-modal Large Language Models for Autonomous Driving
by: Hegde, Deepti, et al.
Published: (2025)

MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans
by: Borse, Shubhankar, et al.
Published: (2025)

Controllable 3D Placement of Objects with Scene-Aware Diffusion Models
by: Omran, Mohamed, et al.
Published: (2025)

Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection
by: Petersen, Jens, et al.
Published: (2025)

FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
by: Yasarla, Rajeev, et al.
Published: (2024)

ReHyAt: Recurrent Hybrid Attention for Video Diffusion Transformers
by: Ghafoorian, Mohsen, et al.
Published: (2026)

Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer
by: Ghafoorian, Mohsen, et al.
Published: (2025)

RoCA: Robust Cross-Domain End-to-End Autonomous Driving
by: Yasarla, Rajeev, et al.
Published: (2025)

FouRA: Fourier Low Rank Adaptation
by: Borse, Shubhankar, et al.
Published: (2024)

Hybrid Gaussian Splatting for Novel Urban View Synthesis
by: Omran, Mohamed, et al.
Published: (2025)

Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
by: Chen, Yanlong, et al.
Published: (2026)

Generative Scenario Rollouts for End-to-End Autonomous Driving
by: Yasarla, Rajeev, et al.
Published: (2026)

Enhancing Novel View Synthesis via Geometry Grounded Set Diffusion
by: Zanjani, Farhad G., et al.
Published: (2026)

Gaussian Splatting is an Effective Data Generator for 3D Object Detection
by: Zanjani, Farhad G., et al.
Published: (2025)

Segmentation-Free Guidance for Text-to-Image Diffusion Models
by: Azarian, Kambiz, et al.
Published: (2024)

PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference
by: Korzhenkov, Denis, et al.
Published: (2026)

Low-Latency Neural Stereo Streaming
by: Hou, Qiqi, et al.
Published: (2024)

HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
by: Mercier, Antoine, et al.
Published: (2024)

MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
by: Bhowmik, Aritra, et al.
Published: (2025)

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers
by: Li, Zhuojin, et al.
Published: (2026)

CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
by: Lee, Jungsoo, et al.
Published: (2025)

H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision
by: Shi, Yunxiao, et al.
Published: (2025)

Imagining the Unseen: Generative Location Modeling for Object Placement
by: Yun, Jooyeol, et al.
Published: (2024)

Resolving the Identity Crisis in Text-to-Image Generation
by: Borse, Shubhankar, et al.
Published: (2025)

DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos
by: Yasarla, Rajeev, et al.
Published: (2025)

LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
by: Farhadzadeh, Farzad, et al.
Published: (2025)

EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting
by: Lin, Min-Hui, et al.
Published: (2024)

ReDiF: Reinforced Distillation for Few Step Diffusion
by: Tighkhorshid, Amirhossein, et al.
Published: (2025)

ToSA: Token Selective Attention for Efficient Vision Transformers
by: Singh, Manish Kumar, et al.
Published: (2024)

Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)