Bibliographic Details
Main Authors: Koska, Ben; Horváth, Mojmír
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2411.05903
Table of Contents:
  • We present a novel 4.5B-parameter small language model that handles multiple input and output modalities, including text, images, video, and audio. Despite its small size, the model achieves near state-of-the-art performance on a variety of tasks, demonstrating the potential of multimodal models to tackle complex real-world problems. Our approach builds on recent advances in language modeling and multi-task learning to produce a versatile, high-performing model that can also be deployed for edge inference. Experimental results show strong performance across multiple benchmarks, pointing the way to further progress in multimodal artificial intelligence.