:: Library Catalog

Bibliographic Details
Main Authors:	Conde, Javier; Cheung, Tobias; Martínez, Gonzalo; Reviriego, Pedro; Sarkar, Rik
Format:	Preprint
Published:	2024
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2409.16297

Similar Items

OpenVNA: A Framework for Analyzing the Behavior of Multimodal Language Understanding System under Noisy Scenarios
by: Yuan, Ziqi, et al.
Published: (2024)

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review
by: Cohn, Clayton, et al.
Published: (2024)

Hyperbolic Multimodal Generative Representation Learning for Generalized Zero-Shot Multimodal Information Extraction
by: Zhou, Baohang, et al.
Published: (2026)

DIVA: Harnessing the Representation Divergence in Unified Multimodal Models for Mutual Reinforcement
by: Lu, Renjie, et al.
Published: (2026)

PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology
by: Sun, Yuxuan, et al.
Published: (2023)

Understanding the Impact of Artificial Intelligence in Academic Writing: Metadata to the Rescue
by: Conde, Javier, et al.
Published: (2025)

Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
by: Zhang, Juan, et al.
Published: (2024)

Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)

A Shift In Artistic Practices through Artificial Intelligence
by: Tatar, Kıvanç, et al.
Published: (2023)

GalleryGPT: Analyzing Paintings with Large Multimodal Models
by: Bin, Yi, et al.
Published: (2024)

UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)

Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation
by: Cao, Jiajun, et al.
Published: (2025)

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
by: Lin, Qika, et al.
Published: (2024)

Intelligent Carrier Allocation: A Cross-Modal Reasoning Framework for Adaptive Multimodal Steganography
by: Das, Abhirup, et al.
Published: (2025)

A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
by: Sun, Shilin, et al.
Published: (2024)

SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation
by: Huynh, Ngoc Dung, et al.
Published: (2025)

Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
by: Macchiavello, Bruno, et al.
Published: (2013)

Multimodal Classification and Out-of-distribution Detection for Multimodal Intent Understanding
by: Zhang, Hanlei, et al.
Published: (2024)

Artificial Intelligence and Misinformation in Art: Can Vision Language Models Judge the Hand or the Machine Behind the Canvas?
by: Fu, Tarian, et al.
Published: (2025)

Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark
by: Zhang, Han, et al.
Published: (2025)

MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement
by: Dai, Weichen, et al.
Published: (2024)

Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning
by: Zhao, Yu, et al.
Published: (2025)

MULTI-CASE: A Transformer-based Ethics-aware Multimodal Investigative Intelligence Framework
by: Fischer, Maximilian T., et al.
Published: (2024)

Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey
by: Liu, Qijiong, et al.
Published: (2024)

Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction
by: Zhou, Baohang, et al.
Published: (2025)

MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks
by: Yang, Xiaocui, et al.
Published: (2024)

Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)

Recursive InPainting (RIP): how much information is lost under recursive inferences?
by: Conde, Javier, et al.
Published: (2024)

MCAD: Multimodal Context-Aware Audio Description Generation For Soccer
by: Chaudhary, Lipisha, et al.
Published: (2025)

Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads
by: Zhang, Kunpeng, et al.
Published: (2026)

Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
by: Ferrando, Raquel, et al.
Published: (2025)

Exploring the Role of Audio in Multimodal Misinformation Detection
by: Liu, Moyang, et al.
Published: (2024)

Towards Multimodal Emotional Support Conversation Systems
by: Chu, Yuqi, et al.
Published: (2024)

Multimodal Emotion Recognition with Large Language Models
by: Zhang, Hongrui, et al.
Published: (2026)

Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)

Contrastive Knowledge Distillation for Robust Multimodal Sentiment Analysis
by: Sang, Zhongyi, et al.
Published: (2024)

Multimodal LLM-based Query Paraphrasing for Video Search
by: Wu, Jiaxin, et al.
Published: (2024)

From Natural Alignment to Conditional Controllability in Multimodal Dialogue
by: Jin, Zeyu, et al.
Published: (2026)

MV-Crafter: An Intelligent System for Music-guided Video Generation
by: Chen, Chuer, et al.
Published: (2025)

Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space
by: Wang, Zihang, et al.
Published: (2026)