| Main Authors: | Pellegrain, Victor, Tami, Myriam, Batteux, Michel, Hudelot, Céline |
|---|---|
| Format: | Preprint |
| Published: | 2021 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2110.08021 |
Similar Items
Training Data Efficiency in Multimodal Process Reward Models
by: Li, Jinyuan, et al.
Published: (2026)
'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
by: Gao, Rena, et al.
Published: (2024)
ChemDFM-X: Towards Large Multimodal Model for Chemistry
by: Zhao, Zihan, et al.
Published: (2024)
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
by: Shah, Siddhant Bikram, et al.
Published: (2024)
Multimodal Large Language Models for Medicine: A Comprehensive Survey
by: Ye, Jiarui, et al.
Published: (2025)
Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
by: He, Ziyi, et al.
Published: (2026)
KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context
by: Lee, Nahyun, et al.
Published: (2026)
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
by: Wang, Pan, et al.
Published: (2024)
Multimodal Multi-loss Fusion Network for Sentiment Analysis
by: Wu, Zehui, et al.
Published: (2023)
Multimodal Long Video Modeling Based on Temporal Dynamic Context
by: Hao, Haoran, et al.
Published: (2025)
MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions
by: Ngo, Tung-Lam, et al.
Published: (2025)
Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI
by: Xue, Dong, et al.
Published: (2025)
Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads
by: Zhang, Kunpeng, et al.
Published: (2026)
MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
by: Saha, Anisha, et al.
Published: (2026)
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
by: Wang, Qingni, et al.
Published: (2024)
Multi-level Mixture of Experts for Multimodal Entity Linking
by: Hu, Zhiwei, et al.
Published: (2025)
Continual Multimodal Knowledge Graph Construction
by: Chen, Xiang, et al.
Published: (2023)
Calibrating Multimodal Consensus for Emotion Recognition
by: Zhong, Guowei, et al.
Published: (2025)
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
by: Geng, Tiantian, et al.
Published: (2024)
Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
by: Fu, Rong, et al.
Published: (2026)
K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling
by: Kim, Haven, et al.
Published: (2023)
Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
by: Paz-Argaman, Tzuf, et al.
Published: (2024)
Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU
by: Mahfuz, Rehana, et al.
Published: (2026)
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
by: Ren, Jie, et al.
Published: (2024)
Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)
ModalImmune: Immunity Driven Unlearning via Self Destructive Training
by: Fu, Rong, et al.
Published: (2026)
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
by: Hebert, Liam, et al.
Published: (2023)
Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts
by: Zhong, Guowei, et al.
Published: (2025)
MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data
by: Driggers-Ellis, Christopher, et al.
Published: (2025)
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
by: Wu, Qiong, et al.
Published: (2024)
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
by: Henschel, Roberto, et al.
Published: (2024)
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
by: Wang, Xiao, et al.
Published: (2024)
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
by: Cai, Zhongang, et al.
Published: (2025)
Automating Steering for Safe Multimodal Large Language Models
by: Wu, Lyucheng, et al.
Published: (2025)
Unified Hallucination Detection for Multimodal Large Language Models
by: Chen, Xiang, et al.
Published: (2024)
Large Language Models for Computer-Aided Design: A Survey
by: Zhang, Licheng, et al.
Published: (2025)
Bridging the Data Provenance Gap Across Text, Speech and Video
by: Longpre, Shayne, et al.
Published: (2024)
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)
Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
by: Kim, Wongyu, et al.
Published: (2025)
NVLM: Open Frontier-Class Multimodal LLMs
by: Dai, Wenliang, et al.
Published: (2024)