Bibliographic Details
Main Authors: Reka Team; Ormazabal, Aitor; Zheng, Che; d'Autume, Cyprien de Masson; Yogatama, Dani; Fu, Deyu; Ong, Donovan; Chen, Eric; Lamprecht, Eugenie; Pham, Hai; Ong, Isaac; Aleksiev, Kaloyan; Li, Lei; Henderson, Matthew; Bain, Max; Artetxe, Mikel; Relan, Nishant; Padlewski, Piotr; Liu, Qi; Chen, Ren; Phua, Samuel; Yang, Yazheng; Tay, Yi; Wang, Yuqi; Zhu, Zhongkai; Xie, Zhihui
Format: Preprint
Published: 2024
Subjects: Computation and Language; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2404.12387
author Reka Team
Ormazabal, Aitor
Zheng, Che
d'Autume, Cyprien de Masson
Yogatama, Dani
Fu, Deyu
Ong, Donovan
Chen, Eric
Lamprecht, Eugenie
Pham, Hai
Ong, Isaac
Aleksiev, Kaloyan
Li, Lei
Henderson, Matthew
Bain, Max
Artetxe, Mikel
Relan, Nishant
Padlewski, Piotr
Liu, Qi
Chen, Ren
Phua, Samuel
Yang, Yazheng
Tay, Yi
Wang, Yuqi
Zhu, Zhongkai
Xie, Zhihui
contents We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute classes. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core performs competitively with GPT4-V. Meanwhile, on multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai . A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai .
format Preprint
id arxiv_https___arxiv_org_abs_2404_12387
institution arXiv
publishDate 2024
record_format arxiv
title Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
topic Computation and Language
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2404.12387