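| Main Authors: | Wang, Jiaqi; Jiang, Hanqi; Liu, Yiheng; Ma, Chong; Zhang, Xu; Pan, Yi; Liu, Mengyuan; Gu, Peiran; Xia, Sichen; Li, Wenjun; Zhang, Yutong; Wu, Zihao; Liu, Zhengliang; Zhong, Tianyang; Ge, Bao; Zhang, Tuo; Qiang, Ning; Hu, Xintao; Jiang, Xi; Zhang, Xin; Zhang, Wei; Shen, Dinggang; Liu, Tianming; Zhang, Shu |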
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2408.01319 |
| _version_ | 1866916344548556800 |
|---|---|
| author | Wang, Jiaqi; Jiang, Hanqi; Liu, Yiheng; Ma, Chong; Zhang, Xu; Pan, Yi; Liu, Mengyuan; Gu, Peiran; Xia, Sichen; Li, Wenjun; Zhang, Yutong; Wu, Zihao; Liu, Zhengliang; Zhong, Tianyang; Ge, Bao; Zhang, Tuo; Qiang, Ning; Hu, Xintao; Jiang, Xi; Zhang, Xin; Zhang, Wei; Shen, Dinggang; Liu, Tianming; Zhang, Shu |
| contents | In an era defined by explosive data growth and rapid technological advancement, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically review the applications of MLLMs in multimodal tasks spanning natural language, vision, and audio. We also compare the focus of different MLLMs across these tasks, discuss the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, we hope to provide valuable insights for the further development and application of MLLMs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2408_01319 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2408.01319 |