:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Xinquan; Lu, Wei; Luo, Xiangyang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.02479

Similar Items

CIEC: Coupling Implicit and Explicit Cues for Multimodal Weakly Supervised Manipulation Localization
by: Yu, Xinquan, et al.
Published: (2026)

RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection
by: Yu, Xinquan, et al.
Published: (2024)

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
by: He, Junwen, et al.
Published: (2024)

MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
by: Zhang, Yaning, et al.
Published: (2024)

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)

Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection
by: Wang, Liqin, et al.
Published: (2025)

Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
by: Yan, Zhongnuo, et al.
Published: (2024)

MMHead: Towards Fine-grained Multi-modal 3D Facial Animation
by: Wu, Sijing, et al.
Published: (2024)

MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models
by: Xu, Wenbo, et al.
Published: (2026)

Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating
by: Liu, Sheng-Lan, et al.
Published: (2023)

Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network
by: Yang, Xinquan, et al.
Published: (2024)

Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding
by: Wang, Jiazhen, et al.
Published: (2023)

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification
by: Qu, Zuomin, et al.
Published: (2024)

Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector
by: Shu, Qingling, et al.
Published: (2025)

Weakly Supervised Multimodal Temporal Forgery Localization via Multitask Learning
by: Xu, Wenbo, et al.
Published: (2025)

Semi-distributed Cross-modal Air-Ground Relative Localization
by: Lu, Weining, et al.
Published: (2025)

Multi-view Crowd Tracking Transformer with View-Ground Interactions Under Large Real-World Scenes
by: Zhang, Qi, et al.
Published: (2026)

Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)

Fine-grained Spatiotemporal Grounding on Egocentric Videos
by: Liang, Shuo, et al.
Published: (2025)

FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs
by: Chen, Haodong, et al.
Published: (2024)

BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation
by: Lan, Zihan, et al.
Published: (2025)

Fine-grained Dynamic Network for Generic Event Boundary Detection
by: Zheng, Ziwei, et al.
Published: (2024)

Language-driven Fine-grained Retrieval
by: Wang, Shijie, et al.
Published: (2025)

Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation
by: Guo, Midou, et al.
Published: (2025)

Multi-modality Anomaly Segmentation on the Road
by: Gao, Heng, et al.
Published: (2025)

FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
by: Zhang, Lu, et al.
Published: (2025)

Q-Ground: Image Quality Grounding with Large Multi-modality Models
by: Chen, Chaofeng, et al.
Published: (2024)

GroundingGPT: Language Enhanced Multi-modal Grounding Model
by: Li, Zhaowei, et al.
Published: (2024)

Safeguarding Facial Identity against Diffusion-based Face Swapping via Cascading Pathway Disruption
by: Wang, Liqin, et al.
Published: (2026)

GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
by: Chen, Xiaocan, et al.
Published: (2024)

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
by: Jin, Sheng, et al.
Published: (2024)

Cross-modal Full-mode Fine-grained Alignment for Text-to-Image Person Retrieval
by: Yin, Hao, et al.
Published: (2025)

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2024)

Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
by: Miao, Peihan, et al.
Published: (2022)

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
by: Li, Yiheng, et al.
Published: (2025)

Fast-then-Fine: A Two-Stage Framework with Multi-Granular Representation for Cross-Modal Retrieval in Remote Sensing
by: Chen, Xi, et al.
Published: (2026)

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
by: Li, Hao, et al.
Published: (2025)

Visual Grounding with Multi-modal Conditional Adaptation
by: Yao, Ruilin, et al.
Published: (2024)

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)

UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images
by: He, Zhiyi, et al.
Published: (2024)