Library Catalog

Bibliographic Details
Main Authors:	Pellegrain, Victor; Tami, Myriam; Batteux, Michel; Hudelot, Céline
Format:	Preprint
Published:	2021
Subjects:	Machine Learning; Computation and Language; Multimedia
Online Access:	https://arxiv.org/abs/2110.08021

Similar Items

Training Data Efficiency in Multimodal Process Reward Models
by: Li, Jinyuan, et al.
Published: (2026)

'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
by: Gao, Rena, et al.
Published: (2024)

ChemDFM-X: Towards Large Multimodal Model for Chemistry
by: Zhao, Zihan, et al.
Published: (2024)

MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
by: Shah, Siddhant Bikram, et al.
Published: (2024)

Multimodal Large Language Models for Medicine: A Comprehensive Survey
by: Ye, Jiarui, et al.
Published: (2025)

Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
by: He, Ziyi, et al.
Published: (2026)

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context
by: Lee, Nahyun, et al.
Published: (2026)

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
by: Wang, Pan, et al.
Published: (2024)

Multimodal Multi-loss Fusion Network for Sentiment Analysis
by: Wu, Zehui, et al.
Published: (2023)

Multimodal Long Video Modeling Based on Temporal Dynamic Context
by: Hao, Haoran, et al.
Published: (2025)

MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions
by: Ngo, Tung-Lam, et al.
Published: (2025)

Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI
by: Xue, Dong, et al.
Published: (2025)

Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads
by: Zhang, Kunpeng, et al.
Published: (2026)

MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
by: Saha, Anisha, et al.
Published: (2026)

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
by: Wang, Qingni, et al.
Published: (2024)

Multi-level Mixture of Experts for Multimodal Entity Linking
by: Hu, Zhiwei, et al.
Published: (2025)

Continual Multimodal Knowledge Graph Construction
by: Chen, Xiang, et al.
Published: (2023)

Calibrating Multimodal Consensus for Emotion Recognition
by: Zhong, Guowei, et al.
Published: (2025)

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
by: Geng, Tiantian, et al.
Published: (2024)

Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
by: Fu, Rong, et al.
Published: (2026)

K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling
by: Kim, Haven, et al.
Published: (2023)

Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
by: Paz-Argaman, Tzuf, et al.
Published: (2024)

Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU
by: Mahfuz, Rehana, et al.
Published: (2026)

Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
by: Ren, Jie, et al.
Published: (2024)

Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)

ModalImmune: Immunity Driven Unlearning via Self Destructive Training
by: Fu, Rong, et al.
Published: (2026)

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
by: Hebert, Liam, et al.
Published: (2023)

Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts
by: Zhong, Guowei, et al.
Published: (2025)

MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data
by: Driggers-Ellis, Christopher, et al.
Published: (2025)

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
by: Wu, Qiong, et al.
Published: (2024)

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
by: Henschel, Roberto, et al.
Published: (2024)

Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
by: Wang, Xiao, et al.
Published: (2024)

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
by: Cai, Zhongang, et al.
Published: (2025)

Automating Steering for Safe Multimodal Large Language Models
by: Wu, Lyucheng, et al.
Published: (2025)

Unified Hallucination Detection for Multimodal Large Language Models
by: Chen, Xiang, et al.
Published: (2024)

Large Language Models for Computer-Aided Design: A Survey
by: Zhang, Licheng, et al.
Published: (2025)

Bridging the Data Provenance Gap Across Text, Speech and Video
by: Longpre, Shayne, et al.
Published: (2024)

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)

Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
by: Kim, Wongyu, et al.
Published: (2025)

NVLM: Open Frontier-Class Multimodal LLMs
by: Dai, Wenliang, et al.
Published: (2024)