:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Honghui; Huang, Di; Yin, Wei; Shen, Chunhua; Liu, Haifeng; He, Xiaofei; Lin, Binbin; Ouyang, Wanli; He, Tong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.10815

Similar Items

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
by: Yang, Honghui, et al.
Published: (2023)

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
by: Zhu, Haoyi, et al.
Published: (2023)

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
by: Huang, Chenxi, et al.
Published: (2024)

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
by: Chen, Junyi, et al.
Published: (2024)

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild
by: Ye, Weicai, et al.
Published: (2024)

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
by: Wu, Xiaopei, et al.
Published: (2024)

Geo-Align: Video Generation Alignment via Metric Geometry Reward
by: Li, Zizun, et al.
Published: (2026)

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
by: Wang, Yifan, et al.
Published: (2024)

Semi-supervised 3D Object Detection with PatchTeacher and PillarMix
by: Wu, Xiaopei, et al.
Published: (2024)

Agent3D-Zero: An Agent for Zero-shot 3D Understanding
by: Zhang, Sha, et al.
Published: (2024)

DA$^{2}$: Depth Anything in Any Direction
by: Li, Haodong, et al.
Published: (2025)

GVGEN: Text-to-3D Generation with Volumetric Representation
by: He, Xianglong, et al.
Published: (2024)

Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
by: Yang, Yang, et al.
Published: (2025)

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
by: Wang, Yuan, et al.
Published: (2024)

Transparent Object Depth Completion
by: Zhou, Yifan, et al.
Published: (2024)

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
by: Zhu, Haoyi, et al.
Published: (2024)

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
by: Wang, ZiDong, et al.
Published: (2024)

A CLIP-Powered Framework for Robust and Generalizable Data Selection
by: Yang, Suorong, et al.
Published: (2024)

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
by: Li, Minglei, et al.
Published: (2024)

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
by: Yang, Yanting, et al.
Published: (2024)

EMR-Merging: Tuning-Free High-Performance Model Merging
by: Huang, Chenyu, et al.
Published: (2024)

Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
by: He, Xiankang, et al.
Published: (2025)

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction
by: Chen, Junyi, et al.
Published: (2024)

Depth Anything at Any Condition
by: Sun, Boyuan, et al.
Published: (2025)

Depth Anything with Any Prior
by: Wang, Zehan, et al.
Published: (2025)

MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
by: He, Xianglong, et al.
Published: (2025)

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
by: Lin, Xin, et al.
Published: (2025)

Gaussian Difference: Find Any Change Instance in 3D Scenes
by: Jiang, Binbin, et al.
Published: (2025)

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
by: Huang, Chenxi, et al.
Published: (2022)

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)

Dereflection Any Image with Diffusion Priors and Diversified Data
by: Hu, Jichen, et al.
Published: (2025)

AnyDepth: Depth Estimation Made Easy
by: Ren, Zeyu, et al.
Published: (2026)

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
by: Ye, Weicai, et al.
Published: (2024)

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
by: Guo, Yuliang, et al.
Published: (2025)

BRIDGE -- Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
by: Liu, Dingning, et al.
Published: (2025)

VEnhancer: Generative Space-Time Enhancement for Video Generation
by: He, Jingwen, et al.
Published: (2024)

KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models
by: Huang, Haifeng, et al.
Published: (2026)

Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video
by: Gao, Zihui, et al.
Published: (2026)

Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
by: Wu, Xiaoyang, et al.
Published: (2024)

Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
by: Ji, Lichuan, et al.
Published: (2024)