| Main Authors: | Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.02449 |
Similar Items
LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)
Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective
by: Su, Taoyu, et al.
Published: (2025)
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)
Dual Knowledge-Enhanced Two-Stage Reasoner for Multimodal Dialog Systems
by: Chen, Xiaolin, et al.
Published: (2025)
Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
by: Ouyang, Kun, et al.
Published: (2024)
Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
by: Fu, Rong, et al.
Published: (2026)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
by: Liu, Zhiyuan, et al.
Published: (2023)
Rethinking Radiology Report Generation via Causal Inspired Counterfactual Augmentation
by: Song, Xiao, et al.
Published: (2023)
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
by: Zhang, Lei, et al.
Published: (2025)
Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion
by: Zhao, Yu, et al.
Published: (2024)
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
by: Zhang, Hanlei, et al.
Published: (2024)
TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition
by: Wang, Xinyu, et al.
Published: (2026)
Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?
by: Liu, Shuo, et al.
Published: (2025)
TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
by: Shi, Xiang, et al.
Published: (2024)
RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
by: Zhao, Fei, et al.
Published: (2025)
CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation
by: Zhu, Xiaofei, et al.
Published: (2024)
Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction
by: Nguyen, Cam-Van Thi, et al.
Published: (2023)
Retrieval-Augmented Generation for Electrocardiogram-Language Models
by: Song, Xiaoyu, et al.
Published: (2025)
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
by: Luo, Wen, et al.
Published: (2024)
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
by: Chow, Wei, et al.
Published: (2025)
Interpreting the linear structure of vision-language model embedding spaces
by: Papadimitriou, Isabel, et al.
Published: (2025)
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
by: Liu, Peipei, et al.
Published: (2023)
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study
by: Ghorbanpour, Faeze, et al.
Published: (2025)
LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation
by: Wei, Jingxuan, et al.
Published: (2025)
ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific Inference
by: Chen, Qi, et al.
Published: (2025)
Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)
ChronusOmni: Improving Time Awareness of Omni Large Language Models
by: Chen, Yijing, et al.
Published: (2025)
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
by: Chen, Xiaolin, et al.
Published: (2022)
SoMeLVLM: A Large Vision Language Model for Social Media Processing
by: Zhang, Xinnong, et al.
Published: (2024)
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)
Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs
by: Luo, Yiyang, et al.
Published: (2024)
Priority prediction of Asian Hornet sighting report using machine learning methods
by: Liu, Yixin, et al.
Published: (2021)
Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data
by: Ghorbanpour, Faeze, et al.
Published: (2025)
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
by: Peng, Yuezhang, et al.
Published: (2025)
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)
Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis
by: Liu, Hao, et al.
Published: (2025)
Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck
by: Cui, Shiyao, et al.
Published: (2023)