| Main Authors: | Bai, Tianyi, Liang, Hao, Wan, Binwang, Xu, Yanran, Li, Xi, Li, Shiyu, Yang, Ling, Li, Bozhou, Wang, Yifan, Cui, Bin, Huang, Ping, Shan, Jiulong, He, Conghui, Yuan, Binhang, Zhang, Wentao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2405.16640 |
Similar Items
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
by: Liang, Hao, et al.
Published: (2024)
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)
Deepfake Detection: A Comprehensive Survey from the Reliability Perspective
by: Wang, Tianyi, et al.
Published: (2022)
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
by: Song, Shezheng, et al.
Published: (2023)
Dynamic Multimodal Expression Generation for LLM-Driven Pedagogical Agents: From User Experience Perspective
by: Wan, Ninghao, et al.
Published: (2026)
SpecFLASH: A Latent-Guided Semi-autoregressive Speculative Decoding Framework for Efficient Multimodal Generation
by: Wang, Zihua, et al.
Published: (2025)
A Survey of Information Disorder on Video-Sharing Platforms
by: Li, Meiyu, et al.
Published: (2025)
Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck
by: Cui, Shiyao, et al.
Published: (2023)
Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective
by: Wang, Bing, et al.
Published: (2025)
A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)
A Survey on Image-text Multimodal Models
by: Guo, Ruifeng, et al.
Published: (2023)
MIND Your Reasoning: A Meta-Cognitive Intuitive-Reflective Network for Dual-Reasoning in Multimodal Stance Detection
by: Wang, Bingbing, et al.
Published: (2025)
SACRED: A Faithful Annotated Multimedia Multimodal Multilingual Dataset for Classifying Connectedness Types in Online Spirituality
by: Guan, Qinghao, et al.
Published: (2026)
ViMo: Generating Motions from Casual Videos
by: Qiu, Liangdong, et al.
Published: (2024)
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
by: An, Wenbin, et al.
Published: (2025)
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)
THE WASTIVE: An Interactive Ebb and Flow of Digital Fabrication Waste
by: Shan, Yifan, et al.
Published: (2025)
Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors
by: Wang, Bing, et al.
Published: (2025)
Retrieval-Augmented Multimodal Model for Fake News Detection
by: Li, Yiheng, et al.
Published: (2026)
Exploring the latent space of diffusion models directly through singular value decomposition
by: Wang, Li, et al.
Published: (2025)
Multimodal Large Language Models for Medicine: A Comprehensive Survey
by: Ye, Jiarui, et al.
Published: (2025)
A Survey of Generative Categories and Techniques in Multimodal Generative Models
by: Han, Longzhen, et al.
Published: (2025)
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)
Explainable Multimodal Emotion Recognition
by: Lian, Zheng, et al.
Published: (2023)
Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding
by: Li, Zhu, et al.
Published: (2025)
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
by: Zhao, Ruixiang, et al.
Published: (2024)
ChemDFM-X: Towards Large Multimodal Model for Chemistry
by: Zhao, Zihan, et al.
Published: (2024)
End-to-end Semantic-centric Video-based Multimodal Affective Computing
by: Lin, Ronghao, et al.
Published: (2024)
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
by: Liu, Peipei, et al.
Published: (2023)
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
by: Wang, Bing, et al.
Published: (2024)
Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences
by: Qian, Fan, et al.
Published: (2024)
Efficient Object-centric Representation Learning with Pre-trained Geometric Prior
by: Khac, Phúc H. Le, et al.
Published: (2024)
The Revolution of Multimodal Large Language Models: A Survey
by: Caffagni, Davide, et al.
Published: (2024)
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
by: Wang, Xiao, et al.
Published: (2025)
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
by: Wang, Xiao, et al.
Published: (2024)
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
by: Zhang, Qintong, et al.
Published: (2024)
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
by: Zhang, Jun, et al.
Published: (2025)
Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment
by: Li, Jia, et al.
Published: (2025)
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
by: Zhang, Hanlei, et al.
Published: (2024)