Sentiment Analysis of Multilingual Tweets Based on Natural Language Processing (NLP)

Abhijit Bera, Mrinal Kanti Ghose, Dibyendu Kumar Pal
Copyright: © 2021 | Pages: 12
DOI: 10.4018/IJSDA.20211001.oa16

Abstract

Multilingual sentiment analysis plays an important role in a country like India, where many languages are spoken and the style of expression varies from one language to another. People in India speak 22 different languages in total, and with the help of the Google Indic keyboard they can express their sentiments, i.e. reviews about anything, on social media in their native language from their own smartphones. Machine learning approaches have been found to overcome the limitations of other approaches. In this paper, a detailed study based on Natural Language Processing (NLP) has been carried out using a Simple Neural Network (SNN), a Convolutional Neural Network (CNN), and a Long Short Term Memory (LSTM) neural network, followed by a combined model that adds a CNN layer on top of the LSTM, without special handling for the diversity of languages. Around 4000 samples of reviews in English, Hindi and Bengali are used to generate outputs for the above models, and these outputs are analyzed. The experimental results on these realistic reviews are found to be effective as a basis for further research work.

1. Introduction

Sentiment Analysis is a Natural Language Processing and Information Extraction task that aims to evaluate writers’ feelings expressed in positive or negative comments, questions and requests, by analyzing a large number of documents. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall tonality of a document.

A large variety of machine learning models that perform NLP applications in different ways are available in the literature. Recently, machine learning approaches, especially deep learning, have achieved very high accuracy across many different NLP tasks. These models can often be trained as a single end-to-end model and do not require traditional task-specific feature engineering. Neural networks have been used extensively in diverse fields of research such as process modeling and control (Sana Bouzaida & Anis Sakly, 2018), medical diagnosis (Majhi, 2018), targeted marketing (Milton et al., 2019), intelligent searching (Olga Tikhomirova, 2020) and various other applications. They have also attracted many researchers by obtaining superior results on various language-related tasks compared with traditional machine learning models such as SVM, logistic regression or Multinomial Naïve Bayes (Sarkar & Bhowmick, 2017). Hasan et al. (2018) presented a hybrid sentiment analyzer that applies supervised machine learning algorithms such as Naïve Bayes and support vector machines (SVM).

Because native Indian languages are resource-scarce, proper datasets and sentiment lexicons for them are not sufficiently developed. Sentiment analysis of multilingual tweets or blogs in the Indian context is therefore very difficult with the traditional lexical analysis used by supervised models of sentiment polarity. India has a population of 130 crore (1.3 billion) speaking about 22 different languages. As per the 2011 census, it has been reported that 295 million of the 455 million people in urban India already use the internet, and that about 186 million of the 918 million people in rural India are internet users. India is the second largest online market after China. Accordingly, the reviews that Indian people post in their respective languages on platforms such as Facebook, Twitter, YouTube, Amazon and Flipkart have very important consequences for international promotion. Analyzing the polarity of these multilingual reviews with machine learning methods has proven easier than with traditional lexical analysis models, as it is independent of the grammar of the individual languages.
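
One reason a learned model can sidestep language-specific grammar is that its input is only a sequence of integer token ids, whatever the script. The following is a minimal sketch of that encoding step, not the authors' pipeline: the sample reviews, the whitespace tokenizer, and the names `tokenize`, `build_vocab` and `encode` are all assumptions made for illustration.

```python
import string
from collections import Counter

# Hypothetical sample reviews with polarity labels (1 = positive, 0 = negative).
# These are illustrative only and are not taken from the paper's dataset.
reviews = [
    ("The movie was great", 1),
    ("फिल्म बहुत अच्छी थी", 1),   # Hindi: "The film was very good"
    ("খাবার খুব খারাপ ছিল", 0),   # Bengali: "The food was very bad"
]

# ASCII punctuation plus the danda used as a full stop in Hindi and Bengali.
PUNCT = string.punctuation + "।"


def tokenize(text):
    """Split on whitespace and strip punctuation from token edges.

    Whitespace splitting is used instead of a regex such as \\w+ because
    Indic vowel signs are combining marks and would split words apart.
    """
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocab(texts, max_words=5000):
    """Map the most frequent tokens to ids; 0 is padding, 1 is unknown."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common(max_words):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=6):
    """Convert text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# One shared vocabulary covers all three languages.
vocab = build_vocab(text for text, _ in reviews)
encoded = [encode(text, vocab) for text, _ in reviews]
```

Each review, regardless of language, becomes a list of six integers that an embedding layer could consume. A single shared vocabulary is one possible design choice; per-language vocabularies would be another.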

Much of the research on sentiment analysis has been carried out in English, while work in native or regional languages has not yet been explored to a great extent. In this paper, an attempt has been made to study the features of different neural network algorithms and to analyze their outcomes in terms of achieving the best accuracy in minimum time. Datasets from CoRR (Hassan et al., 2016) and from Twitter API v1.1 have been combined for three languages popular in India: English, Bengali and Hindi. A Simple Neural Network (SNN) model has been designed and tested on the English, Hindi and Bengali datasets of reviews of Amazon, IMDB movies, restaurants and cricket matches. The same dataset has then been tested on a second model built with a Convolutional Neural Network (CNN), on a third model built with a Long Short Term Memory (LSTM) neural network, and subsequently on a fourth model consisting of an LSTM followed by CNN layers. A comparative study of the outputs of these models is presented. The results show that the LSTM followed by CNN achieves better accuracy and lower time complexity in Twitter sentiment classification than some traditional methods such as SVM and Naïve Bayes.

This paper describes related work on multilingual text analysis and details the methodology and comparison of SNN, CNN and LSTM. A later part of the paper gives background on the application of Convolutional Neural Networks in NLP and on Recurrent Neural Networks using the Long Short Term Memory model. The methodology is presented as algorithms, and results are reported for the different models on around 4000 samples of tweet texts in English, Hindi and Bengali with different training batch sizes.
