:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Zhipei; Zhang, Xuanyu; Huang, Qing; Zhou, Xing; Zhang, Jian
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.15173

Similar Items

UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization
by: Huang, Qing, et al.
Published: (2025)

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
by: Xu, Zhipei, et al.
Published: (2024)

GenShield: Unified Detection and Artifact Correction for AI-Generated Images
by: Xu, Zhipei, et al.
Published: (2026)

ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation
by: Huang, Qing, et al.
Published: (2026)

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
by: Fei, Yuanchen, et al.
Published: (2026)

HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)

GASP: Gaussian Avatars with Synthetic Priors
by: Saunders, Jack, et al.
Published: (2024)

Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute
by: Gong, Chaoqun, et al.
Published: (2024)

Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars
by: Zhang, Youliang, et al.
Published: (2026)

StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars
by: Sun, Zhiyao, et al.
Published: (2025)

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks
by: Zhou, Ting, et al.
Published: (2024)

PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos
by: Zhou, Zhiyu, et al.
Published: (2026)

EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)

Human-Centric Video Anomaly Detection Through Spatio-Temporal Pose Tokenization and Transformer
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)

Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)

Protect-Your-IP: Scalable Source-Tracing and Attribution against Personalized Generation
by: Li, Runyi, et al.
Published: (2024)

An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)

MVR: Multi-view Video Reward Shaping for Reinforcement Learning
by: Luo, Lirui, et al.
Published: (2026)

V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection
by: Zhang, Xuanyu, et al.
Published: (2024)

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
by: Zhu, Xuanyu, et al.
Published: (2026)

Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)

Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
by: Du, Dazhao, et al.
Published: (2026)

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
by: Le, Huy, et al.
Published: (2025)

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
by: Wang, Yuchi, et al.
Published: (2024)

VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis
by: Symeonidis-Herzig, Alexandre, et al.
Published: (2025)

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
by: Huang, Weikai, et al.
Published: (2025)

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)

OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)

Synthetic Human Action Video Data Generation with Pose Transfer
by: Knapp, Vaclav, et al.
Published: (2025)

GS-Hider: Hiding Messages into 3D Gaussian Splatting
by: Zhang, Xuanyu, et al.
Published: (2024)

VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
by: Ding, Yang, et al.
Published: (2025)

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
by: Song, Yeon-Ji, et al.
Published: (2024)

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)

Poivre: Self-Refining Visual Pointing with Reinforcement Learning
by: Yang, Wenjie, et al.
Published: (2025)

Reinforcing Structured Chain-of-Thought for Video Understanding
by: Wang, Peiyao, et al.
Published: (2026)

Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
by: Jia, Yuyu, et al.
Published: (2024)

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning
by: Yuan, Zhecheng, et al.
Published: (2024)

Hulk: A Universal Knowledge Translator for Human-Centric Tasks
by: Wang, Yizhou, et al.
Published: (2023)

PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation
by: Xu, Ming, et al.
Published: (2025)