| Main Authors: | Koska, Ben; Horváth, Mojmír |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
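| Subjects: | Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing |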
| Online Access: | https://arxiv.org/abs/2411.05903 |
| _version_ | 1866917832647770112 |
|---|---|
| author | Koska, Ben; Horváth, Mojmír |
| author_facet | Koska, Ben; Horváth, Mojmír |
| contents | We present a novel 4.5B parameter small language model that can handle multiple input and output modalities, including text, images, videos, and audio. Despite its small size, the model achieves near state-of-the-art performance on a variety of tasks, demonstrating the potential of multi-modal models to tackle complex real-world problems. Our approach leverages recent advancements in language modeling and multi-task learning to create a versatile and high-performing model that can even be deployed for edge inference. Experimental results show the model's strong performance across multiple benchmarks, paving the way for further progress in multi-modal artificial intelligence. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2411_05903 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Towards Multi-Modal Mastery: A 4.5B Parameter Truly Multi-Modal Small Language Model |
| topic | Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing |
| url | https://arxiv.org/abs/2411.05903 |