| Main Authors: | Xu, Zhipei, Zhang, Xuanyu, Huang, Qing, Zhou, Xing, Zhang, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.15173 |
Similar Items
UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization
by: Huang, Qing, et al.
Published: (2025)
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
by: Xu, Zhipei, et al.
Published: (2024)
GenShield: Unified Detection and Artifact Correction for AI-Generated Images
by: Xu, Zhipei, et al.
Published: (2026)
ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation
by: Huang, Qing, et al.
Published: (2026)
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
by: Fei, Yuanchen, et al.
Published: (2026)
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)
GASP: Gaussian Avatars with Synthetic Priors
by: Saunders, Jack, et al.
Published: (2024)
Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute
by: Gong, Chaoqun, et al.
Published: (2024)
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars
by: Zhang, Youliang, et al.
Published: (2026)
StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars
by: Sun, Zhiyao, et al.
Published: (2025)
HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks
by: Zhou, Ting, et al.
Published: (2024)
PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos
by: Zhou, Zhiyu, et al.
Published: (2026)
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)
Human-Centric Video Anomaly Detection Through Spatio-Temporal Pose Tokenization and Transformer
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)
Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)
Protect-Your-IP: Scalable Source-Tracing and Attribution against Personalized Generation
by: Li, Runyi, et al.
Published: (2024)
An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
by: Luo, Lirui, et al.
Published: (2026)
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection
by: Zhang, Xuanyu, et al.
Published: (2024)
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
by: Zhu, Xuanyu, et al.
Published: (2026)
Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)
Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
by: Du, Dazhao, et al.
Published: (2026)
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
by: Le, Huy, et al.
Published: (2025)
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
by: Wang, Yuchi, et al.
Published: (2024)
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis
by: Symeonidis-Herzig, Alexandre, et al.
Published: (2025)
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
by: Huang, Weikai, et al.
Published: (2025)
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)
Synthetic Human Action Video Data Generation with Pose Transfer
by: Knapp, Vaclav, et al.
Published: (2025)
GS-Hider: Hiding Messages into 3D Gaussian Splatting
by: Zhang, Xuanyu, et al.
Published: (2024)
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
by: Ding, Yang, et al.
Published: (2025)
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
by: Song, Yeon-Ji, et al.
Published: (2024)
Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)
Poivre: Self-Refining Visual Pointing with Reinforcement Learning
by: Yang, Wenjie, et al.
Published: (2025)
Reinforcing Structured Chain-of-Thought for Video Understanding
by: Wang, Peiyao, et al.
Published: (2026)
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
by: Jia, Yuyu, et al.
Published: (2024)
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning
by: Yuan, Zhecheng, et al.
Published: (2024)
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
by: Wang, Yizhou, et al.
Published: (2023)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation
by: Xu, Ming, et al.
Published: (2025)