:: Library Catalog

Bibliographic Details
Main authors:	Lin, Shubo; Zhang, Xuanyang; Cheng, Wei; Hu, Weiming; Yu, Gang; Gao, Jin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2604.02817

Similar Items

From Generated Human Videos to Physically Plausible Robot Trajectories
by: Ni, James, et al.
Published: (2025)

Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility
by: Hao, Yutong, et al.
Published: (2025)

Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
by: Wang, Zixuan, et al.
Published: (2026)

OrthoPhys: Physically Plausible Video Generation with Orthogonal-View Geometry Guidance
by: Wang, Cong, et al.
Published: (2026)

Tempered Self-Similarity Alignment for Physically Plausible Video Generation
by: Kim, Manjin, et al.
Published: (2026)

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
by: Jiang, Yanqin, et al.
Published: (2024)

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
by: Chen, Harold Haodong, et al.
Published: (2025)

SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking
by: Lin, Shubo, et al.
Published: (2024)

An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
by: Gao, Jin, et al.
Published: (2024)

Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics
by: Shen, Ying, et al.
Published: (2026)

Proprio: Latent Self-Scoring and Inference-Time Refinement for Physically Plausible Video Generation
by: Hassan, Mariam, et al.
Published: (2026)

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
by: Wang, Yi, et al.
Published: (2024)

SPR-128K: A New Benchmark for Spatial Plausibility Reasoning with Multimodal Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2025)

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
by: Yang, Xindi, et al.
Published: (2025)

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
by: Cheng, Junhao, et al.
Published: (2025)

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
by: Wang, Junke, et al.
Published: (2024)

Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
by: Tu, Shuyuan, et al.
Published: (2026)

MI-DETR: A Strong Baseline for Moving Infrared Small Target Detection with Bio-Inspired Motion Integration
by: Liu, Nian, et al.
Published: (2026)

Generative Neural Video Compression via Video Diffusion Prior
by: Mao, Qi, et al.
Published: (2025)

Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
by: Dong, Xiaoxiang, et al.
Published: (2025)

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
by: Liu, Ropeway, et al.
Published: (2025)

An Experimental Study on Generating Plausible Textual Explanations for Video Summarization
by: Eleftheriadis, Thomas, et al.
Published: (2025)

PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
by: Wu, Shang, et al.
Published: (2026)

EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion
by: Wei, Jiangchuan, et al.
Published: (2025)

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
by: Hu, Teng, et al.
Published: (2025)

JoVA: Unified Multimodal Learning for Joint Video-Audio Generation
by: Huang, Xiaohu, et al.
Published: (2025)

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
by: Zhang, Xiangdong, et al.
Published: (2025)

Portrait Video Editing Empowered by Multimodal Generative Priors
by: Gao, Xuan, et al.
Published: (2024)

ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model
by: Han, Gaoge, et al.
Published: (2024)

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
by: Zhang, Boqiang, et al.
Published: (2025)

Scaling Zero-Shot Reference-to-Video Generation
by: Zhou, Zijian, et al.
Published: (2025)

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
by: Lin, Wang, et al.
Published: (2025)

GEditBench v2: A Human-Aligned Benchmark for General Image Editing
by: Jiang, Zhangqi, et al.
Published: (2026)

LongCaptioning: Unlocking the Power of Long Video Caption Generation in Large Multimodal Models
by: Wei, Hongchen, et al.
Published: (2025)

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)

Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos
by: Zhao, Yubo, et al.
Published: (2026)

Video Self-Distillation for Single-Image Encoders: A Step Toward Physically Plausible Perception
by: Simon, Marcel, et al.
Published: (2025)

AI-Generated Video Detection via Spatio-Temporal Anomaly Learning
by: Bai, Jianfa, et al.
Published: (2024)

StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
by: Zhuang, Cailin, et al.
Published: (2025)

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
by: Chefer, Hila, et al.
Published: (2025)