| Main Authors: | Yang, Honghui, Huang, Di, Yin, Wei, Shen, Chunhua, Liu, Haifeng, He, Xiaofei, Lin, Binbin, Ouyang, Wanli, He, Tong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.10815 |
Similar Items
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
by: Yang, Honghui, et al.
Published: (2023)
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
by: Zhu, Haoyi, et al.
Published: (2023)
NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
by: Huang, Chenxi, et al.
Published: (2024)
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
by: Chen, Junyi, et al.
Published: (2024)
DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild
by: Ye, Weicai, et al.
Published: (2024)
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
by: Wu, Xiaopei, et al.
Published: (2024)
Geo-Align: Video Generation Alignment via Metric Geometry Reward
by: Li, Zizun, et al.
Published: (2026)
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
by: Wang, Yifan, et al.
Published: (2024)
Semi-supervised 3D Object Detection with PatchTeacher and PillarMix
by: Wu, Xiaopei, et al.
Published: (2024)
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
by: Zhang, Sha, et al.
Published: (2024)
DA$^{2}$: Depth Anything in Any Direction
by: Li, Haodong, et al.
Published: (2025)
GVGEN: Text-to-3D Generation with Volumetric Representation
by: He, Xianglong, et al.
Published: (2024)
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
by: Yang, Yang, et al.
Published: (2025)
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
by: Wang, Yuan, et al.
Published: (2024)
Transparent Object Depth Completion
by: Zhou, Yifan, et al.
Published: (2024)
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
by: Zhu, Haoyi, et al.
Published: (2024)
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
by: Wang, ZiDong, et al.
Published: (2024)
A CLIP-Powered Framework for Robust and Generalizable Data Selection
by: Yang, Suorong, et al.
Published: (2024)
Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
by: Li, Minglei, et al.
Published: (2024)
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
by: Yang, Yanting, et al.
Published: (2024)
EMR-Merging: Tuning-Free High-Performance Model Merging
by: Huang, Chenyu, et al.
Published: (2024)
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
by: He, Xiankang, et al.
Published: (2025)
GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction
by: Chen, Junyi, et al.
Published: (2024)
Depth Anything at Any Condition
by: Sun, Boyuan, et al.
Published: (2025)
Depth Anything with Any Prior
by: Wang, Zehan, et al.
Published: (2025)
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
by: He, Xianglong, et al.
Published: (2025)
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
by: Lin, Xin, et al.
Published: (2025)
Gaussian Difference: Find Any Change Instance in 3D Scenes
by: Jiang, Binbin, et al.
Published: (2025)
OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
by: Huang, Chenxi, et al.
Published: (2022)
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)
Dereflection Any Image with Diffusion Priors and Diversified Data
by: Hu, Jichen, et al.
Published: (2025)
AnyDepth: Depth Estimation Made Easy
by: Ren, Zeyu, et al.
Published: (2026)
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
by: Ye, Weicai, et al.
Published: (2024)
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
by: Guo, Yuliang, et al.
Published: (2025)
BRIDGE -- Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
by: Liu, Dingning, et al.
Published: (2025)
VEnhancer: Generative Space-Time Enhancement for Video Generation
by: He, Jingwen, et al.
Published: (2024)
KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models
by: Huang, Haifeng, et al.
Published: (2026)
Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video
by: Gao, Zihui, et al.
Published: (2026)
Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
by: Wu, Xiaoyang, et al.
Published: (2024)
Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
by: Ji, Lichuan, et al.
Published: (2024)