| Main Authors: | Zhao, Haochen; Kong, Yuyao; Xu, Yongxiu; Gou, Gaopeng; Xu, Hongbo; Wang, Yubin; Zhang, Haoliang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.23299 |
Similar Items
REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts
by: Lin, Xinkui, et al.
Published: (2025)
URMF: Uncertainty-aware Robust Multimodal Fusion for Multimodal Sarcasm Detection
by: Wang, Zhenyu, et al.
Published: (2026)
MMSD-Net: Towards Multi-modal Stuttering Detection
by: Nie, Liangyu, et al.
Published: (2024)
T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval
by: Li, Yili, et al.
Published: (2025)
M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
by: Kong, Chenqi, et al.
Published: (2023)
PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks
by: Xu, Jingning, et al.
Published: (2026)
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
by: Xu, Zitong, et al.
Published: (2025)
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
by: Zhang, Pingping, et al.
Published: (2024)
Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models
by: Peng, Zixiang, et al.
Published: (2026)
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
by: Qiao, Xiangshuo, et al.
Published: (2024)
RealX3D: A Physically-Degraded 3D Benchmark for Multi-view Visual Restoration and Reconstruction
by: Liu, Shuhong, et al.
Published: (2025)
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
by: Xu, Jiaqi, et al.
Published: (2024)
Robust Modality-incomplete Anomaly Detection: A Modality-instructive Framework with Benchmark
by: Miao, Bingchen, et al.
Published: (2024)
MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion
by: Liu, Yu, et al.
Published: (2024)
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
by: Chen, Shuang, et al.
Published: (2026)
Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding
by: Li, Zhu, et al.
Published: (2025)
DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection
by: Zhao, Kangran, et al.
Published: (2025)
Interpretable Multimodal Misinformation Detection with Logic Reasoning
by: Liu, Hui, et al.
Published: (2023)
LungCURE: Benchmarking Multimodal Real-World Clinical Reasoning for Precision Lung Cancer Diagnosis and Treatment
by: Hao, Fangyu, et al.
Published: (2026)
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)
MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation
by: Xu, Shuolin, et al.
Published: (2025)
Kandinsky 3.0 Technical Report
by: Arkhipkin, Vladimir, et al.
Published: (2023)
Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection
by: Sun, Shengyang, et al.
Published: (2024)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment
by: Xu, Zitong, et al.
Published: (2025)
Multi-source Multimodal Progressive Domain Adaption for Audio-Visual Deception Detection
by: Lin, Ronghao, et al.
Published: (2025)
MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition
by: Zhang, Haoyang, et al.
Published: (2025)
"Humor, Art, or Misinformation?": A Multimodal Dataset for Intent-Aware Synthetic Image Detection
by: Skoularikis, Anastasios, et al.
Published: (2025)
FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models
by: Li, Yixuan, et al.
Published: (2024)
FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications
by: Watanabe, Mitsuki, et al.
Published: (2025)
MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video
by: Xing, Qunliang, et al.
Published: (2019)
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
by: Wang, Bing, et al.
Published: (2024)
PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning
by: Zhang, Dongxu, et al.
Published: (2026)
Detached and Interactive Multimodal Learning
by: Fan, Yunfeng, et al.
Published: (2024)
FCMBench: The First Large-scale Financial Credit Multimodal Benchmark for Real-world Applications
by: Yang, Yehui, et al.
Published: (2026)
Test-time adaptation for image compression with distribution regularization
by: Chen, Kecheng, et al.
Published: (2024)
BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
by: Wang, Yuhang, et al.
Published: (2026)
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
by: Zhang, Hanlei, et al.
Published: (2024)
MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
by: Guo, Zile, et al.
Published: (2026)
MangaUB: A Manga Understanding Benchmark for Large Multimodal Models
by: Ikuta, Hikaru, et al.
Published: (2024)