Staff View: Automatic Normalization of Word Variations in Code-Mixed Social Media Text :: Library Catalog

Bibliographic Details
Main Authors:	Singh, Rajat; Choudhary, Nurendra; Shrivastava, Manish
Format:	Preprint
Published:	2018
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/1804.00804

_version_	1866909130641375232
author	Singh, Rajat; Choudhary, Nurendra; Shrivastava, Manish
contents	Social media platforms such as Twitter and Facebook are becoming popular in multilingual societies. This trend encourages the mixing of South Asian languages with English. Such code-mixed data, which blend multiple languages, have recently attracted attention in research communities for various NLP tasks. Code-mixed data contain anomalies such as grammatical errors and spelling variations. In this paper, we leverage the contextual property of words: different spelling variations of a word share similar contexts in large, noisy social media text. We capture different variations of words that belong to the same context in an unsupervised manner using distributed representations of words. Our experiments show that preprocessing the code-mixed dataset with our approach improves performance on state-of-the-art part-of-speech (POS) tagging and sentiment analysis tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_1804_00804
institution	arXiv
publishDate	2018
record_format	arxiv
title	Automatic Normalization of Word Variations in Code-Mixed Social Media Text
topic	Computation and Language
url	https://arxiv.org/abs/1804.00804
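The abstract describes grouping spelling variants that occur in similar contexts, using distributed word representations, and normalizing the text before downstream tagging. The Python sketch below illustrates that general idea under stated assumptions only: the word vectors and frequencies are hand-made toy values standing in for embeddings trained on a large code-mixed corpus, and the greedy grouping, the cosine threshold, and the extra spelling-similarity filter (difflib's SequenceMatcher ratio) are illustrative choices, not the method described in the paper. The helper names build_normalization_map and normalize are hypothetical.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_normalization_map(vectors, freq, sim_threshold=0.8, str_threshold=0.5):
    """Hypothetical helper: map each word to a canonical spelling variant.

    Words are visited from most to least frequent. Each word not yet assigned
    becomes a canonical form and absorbs every unassigned word that is close to
    it both in embedding space (shared context) and in spelling.
    The thresholds are illustrative assumptions.
    """
    words = sorted(vectors, key=lambda w: (-freq.get(w, 0), w))
    mapping = {}
    for head in words:
        if head in mapping:
            continue
        mapping[head] = head
        for other in words:
            if other in mapping:
                continue
            similar_context = cosine(vectors[head], vectors[other]) >= sim_threshold
            similar_spelling = SequenceMatcher(None, head, other).ratio() >= str_threshold
            if similar_context and similar_spelling:
                mapping[other] = head
    return mapping


def normalize(tokens, mapping):
    """Replace each token by its canonical form; unknown tokens are kept as-is."""
    return [mapping.get(t, t) for t in tokens]


# Toy 3-dimensional vectors (assumption): real embeddings would be learned
# from a large code-mixed corpus, e.g. word2vec-style vectors.
vectors = {
    "bahut": [0.90, 0.10, 0.00], "bohot": [0.88, 0.12, 0.05], "bhot": [0.85, 0.15, 0.02],
    "accha": [0.10, 0.90, 0.10], "acha": [0.12, 0.88, 0.08], "achha": [0.08, 0.92, 0.12],
    "movie": [0.00, 0.20, 0.95],
}
freq = {"bahut": 50, "bohot": 30, "bhot": 10, "accha": 40, "acha": 35, "achha": 5, "movie": 60}

mapping = build_normalization_map(vectors, freq)
print(normalize(["movie", "bohot", "acha", "thi"], mapping))
# -> ['movie', 'bahut', 'accha', 'thi']
```

Visiting words from most to least frequent makes the most common spelling the canonical form of its group; requiring both contextual and spelling similarity keeps words that merely appear in similar contexts (for example, synonyms) from being merged.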

Similar Items