:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, William Yicheng; Ye, Keren; Ke, Junjie; Yu, Jiahui; Guibas, Leonidas; Milanfar, Peyman; Yang, Feng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.04102

Similar Items

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
by: Huang, Ian, et al.
Published: (2024)

Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration
by: Delbracio, Mauricio, et al.
Published: (2023)

Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning
by: Milanfar, Peyman, et al.
Published: (2024)

UniRes: Universal Image Restoration for Complex Degradations
by: Zhou, Mo, et al.
Published: (2025)

TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance
by: Ye, Keren, et al.
Published: (2025)

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)

SPIRE: Semantic Prompt-Driven Image Restoration
by: Qi, Chenyang, et al.
Published: (2023)

VLM-PAR: A Vision Language Model for Pedestrian Attribute Recognition
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)

The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
by: Sahraee-Ardakan, Mojtaba, et al.
Published: (2026)

Reference-Guided Identity Preserving Face Restoration
by: Zhou, Mo, et al.
Published: (2025)

MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
by: Lei, Jiahui, et al.
Published: (2025)

MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
by: Lei, Jiahui, et al.
Published: (2024)

SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
by: Maillard, Léopold, et al.
Published: (2026)

On the Relation Between Linear Diffusion and Power Iteration
by: Weitzner, Dana, et al.
Published: (2024)

RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
by: Kuang, Yuxuan, et al.
Published: (2024)

InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
by: Zhang, Yunchao, et al.
Published: (2024)

PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers
by: Li, Songlin, et al.
Published: (2024)

High Perceptual Quality Image Denoising with a Posterior Sampling CGAN
by: Ohayon, Guy, et al.
Published: (2021)

OCH3R: Object-Centric Holistic 3D Reconstruction
by: Du, Yi, et al.
Published: (2026)

PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
by: Li, Haoyang, et al.
Published: (2026)

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
by: Lee, Phillip Y., et al.
Published: (2025)

ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Field
by: Nakayama, Kiyohiro, et al.
Published: (2024)

Support-Set Context Matters for Bongard Problems
by: Raghuraman, Nikhil, et al.
Published: (2023)

Dynamic Reflections: Probing Video Representations with Text Alignment
by: Zhu, Tyler, et al.
Published: (2025)

Zero-Shot Image Feature Consensus with Deep Functional Maps
by: Cheng, Xinle, et al.
Published: (2024)

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
by: Tian, Xiaoyu, et al.
Published: (2024)

Refining Pre-Trained Motion Models
by: Sun, Xinglong, et al.
Published: (2024)

Stochastic Deep Restoration Priors for Imaging Inverse Problems
by: Hu, Yuyang, et al.
Published: (2024)

Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion
by: Lin, Junru, et al.
Published: (2025)

NeRF Revisited: Fixing Quadrature Instability in Volume Rendering
by: Uy, Mikaela Angelina, et al.
Published: (2023)

Attribute-based Visual Reprogramming for Vision-Language Models
by: Cai, Chengyi, et al.
Published: (2025)

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
by: Gu, Yunqi, et al.
Published: (2025)

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
by: You, Yang, et al.
Published: (2024)

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
by: You, Yang, et al.
Published: (2024)

Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation
by: He, Yiguo, et al.
Published: (2025)

Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks
by: Fan, Yucheng, et al.
Published: (2025)

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
by: Cong, Wenyan, et al.
Published: (2025)

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
by: Mei, Kangfu, et al.
Published: (2024)

The Power of Context: How Multimodality Improves Image Super-Resolution
by: Mei, Kangfu, et al.
Published: (2025)

Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration
by: Hu, Yuyang, et al.
Published: (2025)