:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Songyan; Sun, Xinyu; Chen, Hao; Li, Bo; Shen, Chunhua
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2310.11755

Similar Items

POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction
by: Zhang, Songyan, et al.
Published: (2025)

Digging Into Normal Incorporated Stereo Matching
by: Liu, Zihua, et al.
Published: (2024)

RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image
by: Chen, Xiaoxue, et al.
Published: (2024)

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
by: Liu, Yang, et al.
Published: (2023)

WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
by: Zhang, Songyan, et al.
Published: (2024)

SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing
by: Chen, Songyan, et al.
Published: (2024)

StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
by: Liu, Mingyu, et al.
Published: (2025)

Resolve Domain Conflicts for Generalizable Remote Physiological Measurement
by: Sun, Weiyu, et al.
Published: (2024)

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
by: Chu, Xiangxiang, et al.
Published: (2024)

MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching
by: Liu, Yepeng, et al.
Published: (2025)

Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
by: Liu, Mingyu, et al.
Published: (2025)

GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models
by: Ge, Yongtao, et al.
Published: (2024)

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
by: Zhao, Canyu, et al.
Published: (2024)

DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation
by: Liu, Zihua, et al.
Published: (2025)

A CLIP-Powered Framework for Robust and Generalizable Data Selection
by: Yang, Suorong, et al.
Published: (2024)

Learning Efficient and Generalizable Human Representation with Human Gaussian Model
by: Liu, Yifan, et al.
Published: (2025)

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
by: Zhao, Canyu, et al.
Published: (2025)

Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching
by: Sun, Yujing, et al.
Published: (2024)

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
by: Shi, Chen, et al.
Published: (2025)

Explicit Correspondence Matching for Generalizable Neural Radiance Fields
by: Chen, Yuedong, et al.
Published: (2023)

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
by: Jiang, Hanwen, et al.
Published: (2024)

Diffusion Models are Efficient Data Generators for Human Mesh Recovery
by: Ge, Yongtao, et al.
Published: (2024)

Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
by: Zhao, Canyu, et al.
Published: (2025)

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
by: Chu, Xiangxiang, et al.
Published: (2024)

Frequency Prior Guided Matching: A Data Augmentation Approach for Generalizable Semi-Supervised Polyp Segmentation
by: Xi, Haoran, et al.
Published: (2025)

Unlocking the Power of Critical Factors for 3D Visual Geometry Estimation
by: Xu, Guangkai, et al.
Published: (2026)

A Simple Image Segmentation Framework via In-Context Examples
by: Liu, Yang, et al.
Published: (2024)

CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation
by: Liu, Zihua, et al.
Published: (2024)

Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation
by: Rong, Jintao, et al.
Published: (2025)

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
by: Gao, Zihui, et al.
Published: (2025)

Unified Open-World Segmentation with Multi-Modal Prompts
by: Liu, Yang, et al.
Published: (2025)

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
by: Chen, Zhekai, et al.
Published: (2024)

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
by: Li, Liyang, et al.
Published: (2026)

Meta-FC: Meta-Learning with Feature Consistency for Robust and Generalizable Watermarking
by: Li, Yuheng, et al.
Published: (2026)

Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization
by: Li, Hanxi, et al.
Published: (2024)

Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
by: Hu, Mu, et al.
Published: (2024)

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
by: Chen, Cong, et al.
Published: (2025)

On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?
by: Imam, Raza, et al.
Published: (2025)

Object-aware Inversion and Reassembly for Image Editing
by: Yang, Zhen, et al.
Published: (2023)