Staff View: Maya: An Instruction Finetuned Multilingual Multimodal Model :: Library Catalog

Bibliographic Details
Main Authors:	Alam, Nahid; Kanjula, Karthik Reddy; Guthikonda, Surya; Chung, Timothy; Vegesna, Bala Krishna S; Das, Abhipsha; Susevski, Anthony; Chan, Ryan Sze-Yin; Uddin, S M Iftekhar; Islam, Shayekh Bin; Santhosh, Roshan; A, Snegha; Sharma, Drishti; Liu, Chen; Chaturvedi, Isha; Winata, Genta Indra; S, Ashvanth.; Mukherjee, Snehanshu; Aji, Alham Fikri
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2412.07112

_version_	1866913604262952960
author	Alam, Nahid; Kanjula, Karthik Reddy; Guthikonda, Surya; Chung, Timothy; Vegesna, Bala Krishna S; Das, Abhipsha; Susevski, Anthony; Chan, Ryan Sze-Yin; Uddin, S M Iftekhar; Islam, Shayekh Bin; Santhosh, Roshan; A, Snegha; Sharma, Drishti; Liu, Chen; Chaturvedi, Isha; Winata, Genta Indra; S, Ashvanth.; Mukherjee, Snehanshu; Aji, Alham Fikri
author_facet	Alam, Nahid; Kanjula, Karthik Reddy; Guthikonda, Surya; Chung, Timothy; Vegesna, Bala Krishna S; Das, Abhipsha; Susevski, Anthony; Chan, Ryan Sze-Yin; Uddin, S M Iftekhar; Islam, Shayekh Bin; Santhosh, Roshan; A, Snegha; Sharma, Drishti; Liu, Chen; Chaturvedi, Isha; Winata, Genta Indra; S, Ashvanth.; Mukherjee, Snehanshu; Aji, Alham Fikri
contents	The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances in a manner free from toxicity. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_07112
institution	arXiv
publishDate	2024
record_format	arxiv
title	Maya: An Instruction Finetuned Multilingual Multimodal Model
topic	Computer Vision and Pattern Recognition; Computation and Language
url	https://arxiv.org/abs/2412.07112

Similar Items