| Main Authors: | Zhang, Yue; Colman, Ben; Guo, Xiao; Shahriyari, Ali; Bharaj, Gaurav |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.00126 |
Similar Items
Common-Sense Bias Modeling for Classification Tasks
by: Zhang, Miao, et al.
Published: (2024)
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
by: Oorloff, Trevine, et al.
Published: (2024)
X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models
by: Bazyleva, Valentina, et al.
Published: (2025)
FaceLift: Semi-supervised 3D Facial Landmark Localization
by: Ferman, David, et al.
Published: (2024)
Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
by: Rykov, Elisei, et al.
Published: (2025)
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering
by: Goetting, Dylan, et al.
Published: (2024)
Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection
by: Yang, Ruichao, et al.
Published: (2026)
AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
by: Chaturvedi, Saket S., et al.
Published: (2025)
Towards Attention-based Contrastive Learning for Audio Spoof Detection
by: Goel, Chirag, et al.
Published: (2024)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)
Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images
by: Zhang, Yichi, et al.
Published: (2025)
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
by: Xu, Xiaohao, et al.
Published: (2024)
PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection
by: Baser, Oguzhan, et al.
Published: (2025)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
by: Guo, Jarvis, et al.
Published: (2024)
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)
Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs
by: Dey, Abhishek, et al.
Published: (2025)
3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models
by: Zhan, Shaoxiong, et al.
Published: (2026)
GRAM: Global Reasoning for Multi-Page VQA
by: Blau, Tsachi, et al.
Published: (2024)
Table Detection with Active Learning
by: Gautam, Somraj, et al.
Published: (2025)
MMGR: Multi-Modal Generative Reasoning
by: Cai, Zefan, et al.
Published: (2025)
Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
by: Liu, Chengzhi, et al.
Published: (2025)
Play to Generalize: Learning to Reason Through Game Play
by: Xie, Yunfei, et al.
Published: (2025)
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
by: Xu, Jiacong, et al.
Published: (2025)
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)
Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)
A Unified Hallucination Mitigation Framework for Large Vision-Language Models
by: Chang, Yue, et al.
Published: (2024)
MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
by: Qian, Kai, et al.
Published: (2026)
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
by: Foteinopoulou, Niki Maria, et al.
Published: (2024)
ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
by: Su, Ruiran, et al.
Published: (2025)
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)
MiRAGeNews: Multimodal Realistic AI-Generated News Detection
by: Huang, Runsheng, et al.
Published: (2024)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
by: Li, Yunxin, et al.
Published: (2025)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
by: Zheng, Yue, et al.
Published: (2025)
Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection
by: Masri, Sari, et al.
Published: (2025)
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding
by: Wang, Ziyang, et al.
Published: (2026)
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
by: Chen, Zhenyuan, et al.
Published: (2025)