:: Library Catalog

Bibliographic Details
Main Authors:	Miao, Xingyu; Dong, Junting; Zhao, Qin; Yang, Yuhang; Chen, Junhao; Long, Yang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01661

Similar Items

Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
by: Li, Zhiyuan, et al.
Published: (2026)

TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
by: Miao, Xingyu, et al.
Published: (2026)

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
by: Yang, Yuhang, et al.
Published: (2025)

DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
by: Chen, Junhao, et al.
Published: (2025)

ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
by: Xue, Wangyu, et al.
Published: (2024)

From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection
by: Rashvand, Narges, et al.
Published: (2026)

STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
by: Chen, Junyang, et al.
Published: (2025)

Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization
by: Zhao, Rongzhen, et al.
Published: (2026)

From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
by: Li, Ling, et al.
Published: (2026)

Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation
by: Kizil, Muhammed Burak, et al.
Published: (2026)

Rethinking Score Distilling Sampling for 3D Editing and Generation
by: Miao, Xingyu, et al.
Published: (2025)

Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction
by: Mo, Clinton, et al.
Published: (2024)

Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation
by: Fang, Hongwei, et al.
Published: (2026)

Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction
by: Zhao, Sijie, et al.
Published: (2025)

V-CORE: Temporally Consistent Video Understanding for Video-LLM
by: Kang, Zhengjian, et al.
Published: (2026)

BiDense: Binarization for Dense Prediction
by: Yin, Rui, et al.
Published: (2024)

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
by: Ma, Yuhang, et al.
Published: (2024)

TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency
by: Shao, Minye, et al.
Published: (2025)

Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
by: Sun, Kaili, et al.
Published: (2025)

Cycle Consistency in Video Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2026)

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
by: Yang, Yuhang, et al.
Published: (2024)

HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)

MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding
by: Ran, Ran, et al.
Published: (2026)

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
by: Lin, Ente, et al.
Published: (2024)

Beyond Boundary Frames: Context-Centric Video Interpolation with Audio-Visual Semantics
by: Deng, Yuchen, et al.
Published: (2025)

Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation
by: Adiya, Tserendorj, et al.
Published: (2023)

Towards Customized Knowledge Distillation for Chip-Level Dense Image Predictions
by: Zhang, Dong, et al.
Published: (2024)

Unified Dense Prediction of Video Diffusion
by: Yang, Lehan, et al.
Published: (2025)

IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction
by: Zhu, Jiangtong, et al.
Published: (2025)

Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
by: Qin, Feiwei, et al.
Published: (2025)

From Spots to Pixels: Dense Spatial Gene Expression Prediction from Histology Images
by: Zhang, Ruikun, et al.
Published: (2025)

Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
by: Yang, Shuai, et al.
Published: (2025)

SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
by: Zhao, Weiguang, et al.
Published: (2026)

ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance
by: Yang, Haijie, et al.
Published: (2024)

VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding
by: Wang, Ruoyu, et al.
Published: (2026)

Deep Learning in Concealed Dense Prediction
by: Zhao, Pancheng, et al.
Published: (2025)

Vision Transformers: From Semantic Segmentation to Dense Prediction
by: Zhang, Li, et al.
Published: (2022)

Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions
by: Shen, Yijun, et al.
Published: (2025)

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases
by: Meo, Cristian, et al.
Published: (2024)

Tele-Catch: Adaptive Teleoperation for Dexterous Dynamic 3D Object Catching
by: Zhao, Weiguang, et al.
Published: (2026)