:: Library Catalog

Bibliographic Details
Main Authors:	Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning; Multimedia
Online Access:	https://arxiv.org/abs/2606.02449

Similar Items

LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)

Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective
by: Su, Taoyu, et al.
Published: (2025)

IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)

Dual Knowledge-Enhanced Two-Stage Reasoner for Multimodal Dialog Systems
by: Chen, Xiaolin, et al.
Published: (2025)

Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
by: Ouyang, Kun, et al.
Published: (2024)

Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
by: Fu, Rong, et al.
Published: (2026)

Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
by: Liu, Zhiyuan, et al.
Published: (2023)

Rethinking Radiology Report Generation via Causal Inspired Counterfactual Augmentation
by: Song, Xiao, et al.
Published: (2023)

MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
by: Zhang, Lei, et al.
Published: (2025)

Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion
by: Zhao, Yu, et al.
Published: (2024)

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
by: Zhang, Hanlei, et al.
Published: (2024)

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition
by: Wang, Xinyu, et al.
Published: (2026)

Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?
by: Liu, Shuo, et al.
Published: (2025)

TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)

Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
by: Shi, Xiang, et al.
Published: (2024)

RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
by: Zhao, Fei, et al.
Published: (2025)

CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation
by: Zhu, Xiaofei, et al.
Published: (2024)

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction
by: Nguyen, Cam-Van Thi, et al.
Published: (2023)

Retrieval-Augmented Generation for Electrocardiogram-Language Models
by: Song, Xiaoyu, et al.
Published: (2025)

Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
by: Luo, Wen, et al.
Published: (2024)

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
by: Chow, Wei, et al.
Published: (2025)

Interpreting the linear structure of vision-language model embedding spaces
by: Papadimitriou, Isabel, et al.
Published: (2025)

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
by: Liu, Peipei, et al.
Published: (2023)

Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study
by: Ghorbanpour, Faeze, et al.
Published: (2025)

LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation
by: Wei, Jingxuan, et al.
Published: (2025)

ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific Inference
by: Chen, Qi, et al.
Published: (2025)

Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)

ChronusOmni: Improving Time Awareness of Omni Large Language Models
by: Chen, Yijing, et al.
Published: (2025)

Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
by: Chen, Xiaolin, et al.
Published: (2022)

SoMeLVLM: A Large Vision Language Model for Social Media Processing
by: Zhang, Xinnong, et al.
Published: (2024)

Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)

Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs
by: Luo, Yiyang, et al.
Published: (2024)

Priority prediction of Asian Hornet sighting report using machine learning methods
by: Liu, Yixin, et al.
Published: (2021)

Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data
by: Ghorbanpour, Faeze, et al.
Published: (2025)

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)

MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
by: Peng, Yuezhang, et al.
Published: (2025)

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)

Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis
by: Liu, Hao, et al.
Published: (2025)

Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck
by: Cui, Shiyao, et al.
Published: (2023)