| Main Authors: | Jiang, Hanwen, Jiang, Zhenyu, Grauman, Kristen, Zhu, Yuke |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2212.04492 |
Similar Items
Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video
by: Adebi, Daniel, et al.
Published: (2025)
Learning Object State Changes in Videos: An Open-World Perspective
by: Xue, Zihui, et al.
Published: (2023)
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
by: An, Joungbin, et al.
Published: (2025)
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
by: Xue, Zihui, et al.
Published: (2024)
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
by: Majumder, Sagnik, et al.
Published: (2024)
Generic Objects as Pose Probes for Few-shot View Synthesis
by: Gao, Zhirui, et al.
Published: (2024)
Seeing without Pixels: Perception from Camera Trajectories
by: Xue, Zihui, et al.
Published: (2025)
ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
by: Somayazulu, Arjun, et al.
Published: (2026)
Learning Skill-Attributes for Transferable Assessment in Video
by: Ashutosh, Kumar, et al.
Published: (2025)
ViewBridge: Curriculum Knowledge Distillation for Activity View-Invariance Under Extreme Viewpoint Changes
by: Somayazulu, Arjun, et al.
Published: (2025)
UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding
by: An, Joungbin, et al.
Published: (2026)
MZEN: Multi-Zoom Enhanced NeRF for 3-D Reconstruction with Unknown Camera Poses
by: Park, Jong-Ik, et al.
Published: (2025)
SPOC: Spatially-Progressing Object State Change Segmentation in Video
by: Mandikal, Priyanka, et al.
Published: (2025)
Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
by: Zhang, Chuanrui, et al.
Published: (2024)
FIction: 4D Future Interaction Prediction from Video
by: Ashutosh, Kumar, et al.
Published: (2024)
Seeing the Arrow of Time in Large Multimodal Models
by: Xue, Zihui, et al.
Published: (2025)
Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models
by: Baid, Ami, et al.
Published: (2026)
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
by: Majumder, Sagnik, et al.
Published: (2024)
Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos
by: Sun, Shuo, et al.
Published: (2026)
A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
by: Jiang, Kaiwen, et al.
Published: (2024)
Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View
by: Zhang, Jinyu, et al.
Published: (2025)
SkillSight: Efficient First-Person Skill Assessment with Gaze
by: Wu, Chi Hsuan, et al.
Published: (2025)
EgoExo-WM: Unlocking Exo Video for Ego World Models
by: Tran, Danny, et al.
Published: (2026)
Progress-Aware Video Frame Captioning
by: Xue, Zihui, et al.
Published: (2024)
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
by: Wu, Chi Hsuan, et al.
Published: (2025)
SportSkills: Physical Skill Learning from Sports Instructional Videos
by: Ashutosh, Kumar, et al.
Published: (2026)
CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation
by: Lin, Xiao, et al.
Published: (2025)
Learning a Category-level Object Pose Estimator without Pose Annotations
by: Tian, Fengrui, et al.
Published: (2024)
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
by: Li, Weihang, et al.
Published: (2025)
Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-Shot Open-Set Recognition
by: Zhang, Zhenyu, et al.
Published: (2024)
Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
by: Shi, Haixin, et al.
Published: (2024)
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
by: Luo, Mi, et al.
Published: (2024)
Detours for Navigating Instructional Videos
by: Ashutosh, Kumar, et al.
Published: (2024)
Indoor 3D Reconstruction with an Unknown Camera-Projector Pair
by: Qi, Zhaoshuai, et al.
Published: (2024)
Real3D: Scaling Up Large Reconstruction Models with Real-World Images
by: Jiang, Hanwen, et al.
Published: (2024)
Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
by: Zhu, Shengjie, et al.
Published: (2026)
CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
by: Lin, Xiao, et al.
Published: (2024)
Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
by: Mandikal, Priyanka, et al.
Published: (2025)
TSM-Pose: Topology-Aware Learning with Semantic Mamba for Category-Level Object Pose Estimation
by: Liu, Jinshuo, et al.
Published: (2026)
Exploring Category-level Articulated Object Pose Tracking on SE(3) Manifolds
by: Meng, Xianhui, et al.
Published: (2025)