Bibliographic Details
Main Authors:	Cheng, Yi; Xu, Ziwei; Lin, Dongyun; Cheng, Harry; Wong, Yongkang; Sun, Ying; Lim, Joo Hwee; Kankanhalli, Mohan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.12538

Similar Items

STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
by: Chai, Zenghao, et al.
Published: (2024)

Learning to Predict Gradients for Semi-Supervised Continual Learning
by: Luo, Yan, et al.
Published: (2022)

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
by: Li, Wei, et al.
Published: (2024)

ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
by: Guo, Yangyang, et al.
Published: (2023)

Object-Centric Framework for Video Moment Retrieval
by: Li, Zongyao, et al.
Published: (2025)

MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer
by: Chai, Zenghao, et al.
Published: (2025)

FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
by: Wang, Tianyi, et al.
Published: (2025)

Word-Anchored Temporal Forgery Localization
by: Wang, Tianyi, et al.
Published: (2026)

Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods
by: Ishant, et al.
Published: (2025)

Unveiling the Tapestry: the Interplay of Generalization and Forgetting in Continual Learning
by: Shi, Zenglin, et al.
Published: (2022)

Towards Generalizable Deepfake Detection via Real Distribution Bias Correction
by: Liu, Ming-Hui, et al.
Published: (2026)

Diffusion Facial Forgery Detection
by: Cheng, Harry, et al.
Published: (2024)

Finetuning Text-to-Image Diffusion Models for Fairness
by: Shen, Xudong, et al.
Published: (2023)

Fair Deepfake Detectors Can Generalize
by: Cheng, Harry, et al.
Published: (2025)

Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM
by: Guo, Yangyang, et al.
Published: (2024)

MCM: Multi-condition Motion Synthesis Framework
by: Ling, Zeyu, et al.
Published: (2024)

Diffusion Time-step Curriculum for One Image to 3D Generation
by: Yi, Xuanyu, et al.
Published: (2024)

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
by: Guo, Yangyang, et al.
Published: (2024)

Detecting Deepfakes via Hamiltonian Dynamics
by: Cheng, Harry, et al.
Published: (2026)

PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
by: Lin, Dongyun, et al.
Published: (2024)

Aggregating Diverse Cue Experts for AI-Generated Image Detection
by: Tan, Lei, et al.
Published: (2026)

Joint Vision-Language Social Bias Removal for CLIP
by: Zhang, Haoyu, et al.
Published: (2024)

Identifying Hard Noise in Long-Tailed Sample Distribution
by: Yi, Xuanyu, et al.
Published: (2022)

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)

Visual Prompting for One-shot Controllable Video Editing without Inversion
by: Zhang, Zhengbo, et al.
Published: (2025)

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
by: Yi, Xuanyu, et al.
Published: (2024)

HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures
by: Cheng, Yongkang, et al.
Published: (2025)

Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
by: Li, Zhenyang, et al.
Published: (2024)

Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
by: Fan, Hehe, et al.
Published: (2025)

Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models
by: Zhan, Yu-Wei, et al.
Published: (2023)

IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning
by: Qiu, Tianheng, et al.
Published: (2025)

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment
by: Hou, Yongkang, et al.
Published: (2025)

Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated
by: Yang, Muli, et al.
Published: (2026)

The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
by: Guo, Yangyang, et al.
Published: (2024)

MCM: Multi-condition Motion Synthesis Framework for Multi-scenario
by: Ling, Zeyu, et al.
Published: (2023)

Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
by: Wang, Jiaxi, et al.
Published: (2023)

FreeInit: Bridging Initialization Gap in Video Diffusion Models
by: Wu, Tianxing, et al.
Published: (2023)

UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
by: Guo, Yangyang, et al.
Published: (2023)

Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling
by: Zhang, Min, et al.
Published: (2024)

MITA: Bridging the Gap between Model and Data for Test-time Adaptation
by: Yuan, Yige, et al.
Published: (2024)