:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yue; Colman, Ben; Guo, Xiao; Shahriyari, Ali; Bharaj, Gaurav
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2402.00126

Similar Items

Common-Sense Bias Modeling for Classification Tasks
by: Zhang, Miao, et al.
Published: (2024)

AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
by: Oorloff, Trevine, et al.
Published: (2024)

X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models
by: Bazyleva, Valentina, et al.
Published: (2025)

FaceLift: Semi-supervised 3D Facial Landmark Localization
by: Ferman, David, et al.
Published: (2024)

Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
by: Rykov, Elisei, et al.
Published: (2025)

End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering
by: Goetting, Dylan, et al.
Published: (2024)

Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection
by: Yang, Ruichao, et al.
Published: (2026)

AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
by: Chaturvedi, Saket S., et al.
Published: (2025)

Towards Attention-based Contrastive Learning for Audio Spoof Detection
by: Goel, Chirag, et al.
Published: (2024)

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)

Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images
by: Zhang, Yichi, et al.
Published: (2025)

Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
by: Xu, Xiaohao, et al.
Published: (2024)

PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection
by: Baser, Oguzhan, et al.
Published: (2025)

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
by: Guo, Jarvis, et al.
Published: (2024)

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)

Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs
by: Dey, Abhishek, et al.
Published: (2025)

3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models
by: Zhan, Shaoxiong, et al.
Published: (2026)

GRAM: Global Reasoning for Multi-Page VQA
by: Blau, Tsachi, et al.
Published: (2024)

Table Detection with Active Learning
by: Gautam, Somraj, et al.
Published: (2025)

MMGR: Multi-Modal Generative Reasoning
by: Cai, Zefan, et al.
Published: (2025)

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
by: Liu, Chengzhi, et al.
Published: (2025)

Play to Generalize: Learning to Reason Through Game Play
by: Xie, Yunfei, et al.
Published: (2025)

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
by: Xu, Jiacong, et al.
Published: (2025)

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)

Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)

A Unified Hallucination Mitigation Framework for Large Vision-Language Models
by: Chang, Yue, et al.
Published: (2024)

MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)

DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)

Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
by: Qian, Kai, et al.
Published: (2026)

A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
by: Foteinopoulou, Niki Maria, et al.
Published: (2024)

ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
by: Su, Ruiran, et al.
Published: (2025)

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)

MiRAGeNews: Multimodal Realistic AI-Generated News Detection
by: Huang, Runsheng, et al.
Published: (2024)

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
by: Li, Yunxin, et al.
Published: (2025)

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)

Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
by: Zheng, Yue, et al.
Published: (2025)

Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection
by: Masri, Sari, et al.
Published: (2025)

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding
by: Wang, Ziyang, et al.
Published: (2026)

RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
by: Chen, Zhenyuan, et al.
Published: (2025)