:: Library Catalog

Bibliographic Details
Main Authors:	Xiong, Jiayu; Wang, Jing; Zhang, Qi; Wang, Wanlong; Xue, Jun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.23113

Similar Items

Multimodal Fusion via Self-Consistent Task-Gradient Fields
by: Xiong, Jiayu, et al.
Published: (2024)

Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies
by: Tian, Mulin, et al.
Published: (2023)

Geometry-based Schrödinger Bridges for Trustworthy Multimodal Fusion
by: Xiong, Jiayu, et al.
Published: (2026)

Exposing Lip-syncing Deepfakes from Mouth Inconsistencies
by: Datta, Soumyya Kanti, et al.
Published: (2024)

Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection
by: Xu, Yuting, et al.
Published: (2024)

Audio-Visual Deepfake Detection With Local Temporal Inconsistencies
by: Astrid, Marcella, et al.
Published: (2025)

Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
by: Qiu, Xingyu, et al.
Published: (2025)

Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection
by: Cobo, Alejandro, et al.
Published: (2025)

Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies
by: Datta, Soumyya Kanti, et al.
Published: (2025)

CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement
by: Dong, Xuanzhao, et al.
Published: (2024)

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning
by: Fang, Xiang, et al.
Published: (2026)

RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning
by: Wang, Jiacheng, et al.
Published: (2024)

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)

Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization
by: Anshul, Ashutosh, et al.
Published: (2025)

SLAP: The Semantic Least Action Principle for Variational Video-Language Modeling
by: Fang, Xiang, et al.
Published: (2026)

Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation
by: Fang, Xiang, et al.
Published: (2026)

Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining
by: Wang, Jingyu, et al.
Published: (2024)

Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
by: Astrid, Marcella, et al.
Published: (2024)

Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
by: Liang, Jiafeng, et al.
Published: (2025)

GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
by: Song, Zixuan, et al.
Published: (2025)

AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting
by: Han, Zihao, et al.
Published: (2024)

Referee: Reference-aware Audiovisual Deepfake Detection
by: Boo, Hyemin, et al.
Published: (2025)

Diffusion Deepfake
by: Bhattacharyya, Chaitali, et al.
Published: (2024)

DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models
by: Wang, Jiarui, et al.
Published: (2025)

Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection
by: Joshi, Unisha
Published: (2025)

FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
by: Wang, Tianyi, et al.
Published: (2025)

Can We Leave Deepfake Data Behind in Training Deepfake Detector?
by: Cheng, Jikang, et al.
Published: (2024)

SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation
by: Zhou, Zijie, et al.
Published: (2025)

DGM4+: Dataset Extension for Global Scene Inconsistency
by: Singh, Gagandeep, et al.
Published: (2025)

Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness
by: Fang, Xiang, et al.
Published: (2026)

A Timely Survey on Vision Transformer for Deepfake Detection
by: Wang, Zhikan, et al.
Published: (2024)

M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
by: Wang, Hongyu, et al.
Published: (2024)

Training-Free Multimodal Deepfake Detection via Graph Reasoning
by: Liu, Yuxin, et al.
Published: (2025)

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
by: Shen, Dazhong, et al.
Published: (2024)

DREAM: A Benchmark Study for Deepfake photoREalism AssessMent
by: Peng, Bo, et al.
Published: (2025)

When Schrödinger Bridge Meets Real-World Image Dehazing with Unpaired Training
by: Lan, Yunwei, et al.
Published: (2025)

PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
by: Selch, Lukas, et al.
Published: (2025)

Rethinking Video-Language Model from the Language Input Perspective
by: Fang, Xiang, et al.
Published: (2026)

MSCT: Differential Cross-Modal Attention for Deepfake Detection
by: Wei, Fangda, et al.
Published: (2026)

SchröMind: Mitigating Hallucinations in Multimodal Large Language Models via Solving the Schrödinger Bridge Problem
by: Shi, Ziqiang, et al.
Published: (2026)