| Main Authors: | Conde, Javier; Cheung, Tobias; Martínez, Gonzalo; Reviriego, Pedro; Sarkar, Rik |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.16297 |
Similar Items
OpenVNA: A Framework for Analyzing the Behavior of Multimodal Language Understanding System under Noisy Scenarios
by: Yuan, Ziqi, et al.
Published: (2024)
Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review
by: Cohn, Clayton, et al.
Published: (2024)
Hyperbolic Multimodal Generative Representation Learning for Generalized Zero-Shot Multimodal Information Extraction
by: Zhou, Baohang, et al.
Published: (2026)
DIVA: Harnessing the Representation Divergence in Unified Multimodal Models for Mutual Reinforcement
by: Lu, Renjie, et al.
Published: (2026)
PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology
by: Sun, Yuxuan, et al.
Published: (2023)
Understanding the Impact of Artificial Intelligence in Academic Writing: Metadata to the Rescue
by: Conde, Javier, et al.
Published: (2025)
Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
by: Zhang, Juan, et al.
Published: (2024)
Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)
A Shift In Artistic Practices through Artificial Intelligence
by: Tatar, Kıvanç, et al.
Published: (2023)
GalleryGPT: Analyzing Paintings with Large Multimodal Models
by: Bin, Yi, et al.
Published: (2024)
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)
Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation
by: Cao, Jiajun, et al.
Published: (2025)
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
by: Lin, Qika, et al.
Published: (2024)
Intelligent Carrier Allocation: A Cross-Modal Reasoning Framework for Adaptive Multimodal Steganography
by: Das, Abhirup, et al.
Published: (2025)
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
by: Sun, Shilin, et al.
Published: (2024)
SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation
by: Huynh, Ngoc Dung, et al.
Published: (2025)
Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
by: Macchiavello, Bruno, et al.
Published: (2013)
Multimodal Classification and Out-of-distribution Detection for Multimodal Intent Understanding
by: Zhang, Hanlei, et al.
Published: (2024)
Artificial Intelligence and Misinformation in Art: Can Vision Language Models Judge the Hand or the Machine Behind the Canvas?
by: Fu, Tarian, et al.
Published: (2025)
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark
by: Zhang, Han, et al.
Published: (2025)
MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement
by: Dai, Weichen, et al.
Published: (2024)
Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning
by: Zhao, Yu, et al.
Published: (2025)
MULTI-CASE: A Transformer-based Ethics-aware Multimodal Investigative Intelligence Framework
by: Fischer, Maximilian T., et al.
Published: (2024)
Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey
by: Liu, Qijiong, et al.
Published: (2024)
Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction
by: Zhou, Baohang, et al.
Published: (2025)
MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks
by: Yang, Xiaocui, et al.
Published: (2024)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
Recursive InPainting (RIP): how much information is lost under recursive inferences?
by: Conde, Javier, et al.
Published: (2024)
MCAD: Multimodal Context-Aware Audio Description Generation For Soccer
by: Chaudhary, Lipisha, et al.
Published: (2025)
Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads
by: Zhang, Kunpeng, et al.
Published: (2026)
Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)
Exploring the Role of Audio in Multimodal Misinformation Detection
by: Liu, Moyang, et al.
Published: (2024)
Towards Multimodal Emotional Support Conversation Systems
by: Chu, Yuqi, et al.
Published: (2024)
Multimodal Emotion Recognition with Large Language Models
by: Zhang, Hongrui, et al.
Published: (2026)
Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)
Contrastive Knowledge Distillation for Robust Multimodal Sentiment Analysis
by: Sang, Zhongyi, et al.
Published: (2024)
Multimodal LLM-based Query Paraphrasing for Video Search
by: Wu, Jiaxin, et al.
Published: (2024)
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
by: Jin, Zeyu, et al.
Published: (2026)
MV-Crafter: An Intelligent System for Music-guided Video Generation
by: Chen, Chuer, et al.
Published: (2025)
Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space
by: Wang, Zihang, et al.
Published: (2026)