Staff View: Towards Multi-Modal Mastery: A 4.5B Parameter Truly Multi-Modal Small Language Model :: Library Catalog

Bibliographic Details
Main Authors:	Koska, Ben; Horváth, Mojmír
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2411.05903

_version_	1866917832647770112
author	Koska, Ben; Horváth, Mojmír
author_facet	Koska, Ben; Horváth, Mojmír
contents	We present a novel 4.5B parameter small language model that can handle multiple input and output modalities, including text, images, videos, and audio. Despite its small size, the model achieves near state-of-the-art performance on a variety of tasks, demonstrating the potential of multi-modal models to tackle complex real-world problems. Our approach leverages recent advancements in language modeling and multi-task learning to create a versatile and high-performing model that can even be deployed for edge inference. Experimental results show the model's strong performance across multiple benchmarks, paving the way for further progress in multi-modal artificial intelligence.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_05903
institution	arXiv
publishDate	2024
record_format	arxiv
title	Towards Multi-Modal Mastery: A 4.5B Parameter Truly Multi-Modal Small Language Model
topic	Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2411.05903

Similar Items