| Main Authors: | Cheng, Yi, Xu, Ziwei, Lin, Dongyun, Cheng, Harry, Wong, Yongkang, Sun, Ying, Lim, Joo Hwee, Kankanhalli, Mohan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.12538 |
Similar Items
STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
by: Chai, Zenghao, et al.
Published: (2024)
Learning to Predict Gradients for Semi-Supervised Continual Learning
by: Luo, Yan, et al.
Published: (2022)
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
by: Li, Wei, et al.
Published: (2024)
ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
by: Guo, Yangyang, et al.
Published: (2023)
Object-Centric Framework for Video Moment Retrieval
by: Li, Zongyao, et al.
Published: (2025)
MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer
by: Chai, Zenghao, et al.
Published: (2025)
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
by: Wang, Tianyi, et al.
Published: (2025)
Word-Anchored Temporal Forgery Localization
by: Wang, Tianyi, et al.
Published: (2026)
Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods
by: Ishant, et al.
Published: (2025)
Unveiling the Tapestry: the Interplay of Generalization and Forgetting in Continual Learning
by: Shi, Zenglin, et al.
Published: (2022)
Towards Generalizable Deepfake Detection via Real Distribution Bias Correction
by: Liu, Ming-Hui, et al.
Published: (2026)
Diffusion Facial Forgery Detection
by: Cheng, Harry, et al.
Published: (2024)
Finetuning Text-to-Image Diffusion Models for Fairness
by: Shen, Xudong, et al.
Published: (2023)
Fair Deepfake Detectors Can Generalize
by: Cheng, Harry, et al.
Published: (2025)
Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM
by: Guo, Yangyang, et al.
Published: (2024)
MCM: Multi-condition Motion Synthesis Framework
by: Ling, Zeyu, et al.
Published: (2024)
Diffusion Time-step Curriculum for One Image to 3D Generation
by: Yi, Xuanyu, et al.
Published: (2024)
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
by: Guo, Yangyang, et al.
Published: (2024)
Detecting Deepfakes via Hamiltonian Dynamics
by: Cheng, Harry, et al.
Published: (2026)
PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
by: Lin, Dongyun, et al.
Published: (2024)
Aggregating Diverse Cue Experts for AI-Generated Image Detection
by: Tan, Lei, et al.
Published: (2026)
Joint Vision-Language Social Bias Removal for CLIP
by: Zhang, Haoyu, et al.
Published: (2024)
Identifying Hard Noise in Long-Tailed Sample Distribution
by: Yi, Xuanyu, et al.
Published: (2022)
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)
Visual Prompting for One-shot Controllable Video Editing without Inversion
by: Zhang, Zhengbo, et al.
Published: (2025)
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
by: Yi, Xuanyu, et al.
Published: (2024)
HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures
by: Cheng, Yongkang, et al.
Published: (2025)
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
by: Li, Zhenyang, et al.
Published: (2024)
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
by: Fan, Hehe, et al.
Published: (2025)
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models
by: Zhan, Yu-Wei, et al.
Published: (2023)
IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning
by: Qiu, Tianheng, et al.
Published: (2025)
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment
by: Hou, Yongkang, et al.
Published: (2025)
Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated
by: Yang, Muli, et al.
Published: (2026)
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
by: Guo, Yangyang, et al.
Published: (2024)
MCM: Multi-condition Motion Synthesis Framework for Multi-scenario
by: Ling, Zeyu, et al.
Published: (2023)
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
by: Wang, Jiaxi, et al.
Published: (2023)
FreeInit: Bridging Initialization Gap in Video Diffusion Models
by: Wu, Tianxing, et al.
Published: (2023)
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
by: Guo, Yangyang, et al.
Published: (2023)
Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling
by: Zhang, Min, et al.
Published: (2024)
MITA: Bridging the Gap between Model and Data for Test-time Adaptation
by: Yuan, Yige, et al.
Published: (2024)