Library Catalog

Bibliographic Details
Main Authors:	Bai, Tianyi; Liang, Hao; Wan, Binwang; Xu, Yanran; Li, Xi; Li, Shiyu; Yang, Ling; Li, Bozhou; Wang, Yifan; Cui, Bin; Huang, Ping; Shan, Jiulong; He, Conghui; Yuan, Binhang; Zhang, Wentao
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Multimedia
Online Access:	https://arxiv.org/abs/2405.16640

Similar Items

KeyVideoLLM: Towards Large-scale Video Keyframe Selection
by: Liang, Hao, et al.
Published: (2024)

PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)

Deepfake Detection: A Comprehensive Survey from the Reliability Perspective
by: Wang, Tianyi, et al.
Published: (2022)

Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)

How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
by: Song, Shezheng, et al.
Published: (2023)

Dynamic Multimodal Expression Generation for LLM-Driven Pedagogical Agents: From User Experience Perspective
by: Wan, Ninghao, et al.
Published: (2026)

SpecFLASH: A Latent-Guided Semi-autoregressive Speculative Decoding Framework for Efficient Multimodal Generation
by: Wang, Zihua, et al.
Published: (2025)

A Survey of Information Disorder on Video-Sharing Platforms
by: Li, Meiyu, et al.
Published: (2025)

Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck
by: Cui, Shiyao, et al.
Published: (2023)

Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective
by: Wang, Bing, et al.
Published: (2025)

A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)

A Survey on Image-text Multimodal Models
by: Guo, Ruifeng, et al.
Published: (2023)

MIND Your Reasoning: A Meta-Cognitive Intuitive-Reflective Network for Dual-Reasoning in Multimodal Stance Detection
by: Wang, Bingbing, et al.
Published: (2025)

SACRED: A Faithful Annotated Multimedia Multimodal Multilingual Dataset for Classifying Connectedness Types in Online Spirituality
by: Guan, Qinghao, et al.
Published: (2026)

ViMo: Generating Motions from Casual Videos
by: Qiu, Liangdong, et al.
Published: (2024)

Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
by: An, Wenbin, et al.
Published: (2025)

Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)

THE WASTIVE: An Interactive Ebb and Flow of Digital Fabrication Waste
by: Shan, Yifan, et al.
Published: (2025)

Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors
by: Wang, Bing, et al.
Published: (2025)

Retrieval-Augmented Multimodal Model for Fake News Detection
by: Li, Yiheng, et al.
Published: (2026)

Exploring the latent space of diffusion models directly through singular value decomposition
by: Wang, Li, et al.
Published: (2025)

Multimodal Large Language Models for Medicine: A Comprehensive Survey
by: Ye, Jiarui, et al.
Published: (2025)

A Survey of Generative Categories and Techniques in Multimodal Generative Models
by: Han, Longzhen, et al.
Published: (2025)

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)

Explainable Multimodal Emotion Recognition
by: Lian, Zheng, et al.
Published: (2023)

Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding
by: Li, Zhu, et al.
Published: (2025)

ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
by: Zhao, Ruixiang, et al.
Published: (2024)

ChemDFM-X: Towards Large Multimodal Model for Chemistry
by: Zhao, Zihan, et al.
Published: (2024)

End-to-end Semantic-centric Video-based Multimodal Affective Computing
by: Lin, Ronghao, et al.
Published: (2024)

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
by: Liu, Peipei, et al.
Published: (2023)

Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
by: Wang, Bing, et al.
Published: (2024)

Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences
by: Qian, Fan, et al.
Published: (2024)

Efficient Object-centric Representation Learning with Pre-trained Geometric Prior
by: Khac, Phúc H. Le, et al.
Published: (2024)

The Revolution of Multimodal Large Language Models: A Survey
by: Caffagni, Davide, et al.
Published: (2024)

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
by: Wang, Xiao, et al.
Published: (2025)

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
by: Wang, Xiao, et al.
Published: (2024)

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
by: Zhang, Qintong, et al.
Published: (2024)

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
by: Zhang, Jun, et al.
Published: (2025)

Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment
by: Li, Jia, et al.
Published: (2025)

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
by: Zhang, Hanlei, et al.
Published: (2024)