| Main Authors: | Miao, Xingyu; Dong, Junting; Zhao, Qin; Yang, Yuhang; Chen, Junhao; Long, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01661 |
Similar Items
Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
by: Li, Zhiyuan, et al.
Published: (2026)
TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
by: Miao, Xingyu, et al.
Published: (2026)
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
by: Yang, Yuhang, et al.
Published: (2025)
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
by: Chen, Junhao, et al.
Published: (2025)
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
by: Xue, Wangyu, et al.
Published: (2024)
From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection
by: Rashvand, Narges, et al.
Published: (2026)
STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
by: Chen, Junyang, et al.
Published: (2025)
Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization
by: Zhao, Rongzhen, et al.
Published: (2026)
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
by: Li, Ling, et al.
Published: (2026)
Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation
by: Kizil, Muhammed Burak, et al.
Published: (2026)
Rethinking Score Distilling Sampling for 3D Editing and Generation
by: Miao, Xingyu, et al.
Published: (2025)
Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction
by: Mo, Clinton, et al.
Published: (2024)
Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation
by: Fang, Hongwei, et al.
Published: (2026)
Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction
by: Zhao, Sijie, et al.
Published: (2025)
V-CORE: Temporally Consistent Video Understanding for Video-LLM
by: Kang, Zhengjian, et al.
Published: (2026)
BiDense: Binarization for Dense Prediction
by: Yin, Rui, et al.
Published: (2024)
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
by: Ma, Yuhang, et al.
Published: (2024)
TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency
by: Shao, Minye, et al.
Published: (2025)
Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
by: Sun, Kaili, et al.
Published: (2025)
Cycle Consistency in Video Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2026)
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
by: Yang, Yuhang, et al.
Published: (2024)
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)
MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding
by: Ran, Ran, et al.
Published: (2026)
DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
by: Lin, Ente, et al.
Published: (2024)
Beyond Boundary Frames: Context-Centric Video Interpolation with Audio-Visual Semantics
by: Deng, Yuchen, et al.
Published: (2025)
Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation
by: Adiya, Tserendorj, et al.
Published: (2023)
Towards Customized Knowledge Distillation for Chip-Level Dense Image Predictions
by: Zhang, Dong, et al.
Published: (2024)
Unified Dense Prediction of Video Diffusion
by: Yang, Lehan, et al.
Published: (2025)
IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction
by: Zhu, Jiangtong, et al.
Published: (2025)
Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
by: Qin, Feiwei, et al.
Published: (2025)
From Spots to Pixels: Dense Spatial Gene Expression Prediction from Histology Images
by: Zhang, Ruikun, et al.
Published: (2025)
Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
by: Yang, Shuai, et al.
Published: (2025)
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
by: Zhao, Weiguang, et al.
Published: (2026)
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance
by: Yang, Haijie, et al.
Published: (2024)
VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding
by: Wang, Ruoyu, et al.
Published: (2026)
Deep Learning in Concealed Dense Prediction
by: Zhao, Pancheng, et al.
Published: (2025)
Vision Transformers: From Semantic Segmentation to Dense Prediction
by: Zhang, Li, et al.
Published: (2022)
Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions
by: Shen, Yijun, et al.
Published: (2025)
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases
by: Meo, Cristian, et al.
Published: (2024)
Tele-Catch: Adaptive Teleoperation for Dexterous Dynamic 3D Object Catching
by: Zhao, Weiguang, et al.
Published: (2026)