Fusion of XLNet and BiLSTM-TextCNN for Weibo Sentiment Analysis in Spark Big Data Environment

Aichuan Li, Tian Li
Copyright: © 2023 | Pages: 18
DOI: 10.4018/IJACI.331744

Abstract

This article proposes a Weibo sentiment analysis method that improves on the efficiency and accuracy of traditional algorithms by applying deep learning in the Spark big data environment. First, the input data are converted into dynamic word vector representations using the Chinese version of the XLNet model. Then, TextCNN and BiLSTM perform dual-channel feature extraction on the data. An attention mechanism allocates computing resources efficiently, and the model then fuses the extracted features and classifies the data. Comparative experiments are conducted on two public datasets under identical experimental conditions. On the NLPCC2014 and NLPCC2015 datasets, the proposed model improves precision and F1 by at least 4.26% and 2.64%, respectively. On the weibo_senti_100k dataset, it improves precision and F1 by at least 4.66% and 2.69%, respectively. The results indicate that the proposed method has better sentiment analysis and prediction ability than existing methods.
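
The abstract reports its gains in precision and F1. As a reference point, the following minimal sketch shows how these metrics are computed for one class of a binary positive/negative task. The labels are invented for illustration, and the sketch computes only the per-class version; this excerpt does not say whether the paper's scores are per-class or macro-averaged, or whether the reported gains are absolute points or relative improvements.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correct positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (1 = positive, 0 = negative), made up for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=0.750 recall=0.750 f1=0.750
```

F1 is the harmonic mean of precision and recall, so a model cannot raise it by trading one metric away entirely for the other.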

Introduction

With the rise of social media, microblogging has become a popular platform that attracts the attention and participation of many users. Users freely express their opinions and share news and moments from their lives on microblogs, so a large amount of information appears on such sites. This surge in information volume makes it significantly harder for people to access and process information efficiently (Jia & Han, 2020; Chen et al., 2023).

Information overload has become a common problem for microblog users. Faced with a large volume of microblog posts, users often find it difficult to filter and understand the content quickly and effectively (Banik et al., 2023). Data mining techniques can help users extract valuable information from massive microblog data. By analyzing patterns and association rules in microblogging data, these techniques can surface topics of interest, people worth following, and popular events, thus reducing the pressure caused by information overload (Dina et al., 2021; Pham et al., 2021).

Building on data mining, sentiment analysis techniques can help users understand the emotional tendencies and attitudes expressed in microblogs. Because microblogging gives users a platform for free self-expression, microblog text is rich in emotional information. Using natural language processing and machine learning, sentiment analysis can classify microblog texts as positive, negative, or neutral. The results help users understand the emotions conveyed in microblogs and gain more precise insight into the attitudes of fellow users. Al-Shabi (2020) combined supervised machine learning with lexicon-based methods for sentiment classification; however, this approach requires substantial computation time and is unsuitable for complex tasks. Kumar et al. (2019) classified comment sentiment using hybrid features that combine machine-learned features with dictionary features; however, this approach is limited by the accuracy of the a priori weights assigned to those features. Meanwhile, growth in computing power has paved the way for the widespread adoption of distributed computing frameworks such as Spark. These frameworks provide efficient and reliable computation for sentiment analysis (Chebbi et al., 2018; Farhan et al., 2018) and have given significant impetus to the field.
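
The weakness attributed to Kumar et al. (2019), dependence on a priori feature weights, can be seen in a toy example. The sketch below is a deliberate simplification and not their method. It fuses a single hypothetical classifier score with a lexicon polarity feature at the score level using a fixed weight `w`. The lexicon, tokens, classifier score, and neutral margin are all invented for illustration. The same post ("this phone is not bad") changes label as `w` changes, partly because the toy lexicon ignores the negation 不.

```python
# Hypothetical polarity lexicon; entries and scores are illustrative only.
LEXICON = {"好": 1.0, "喜欢": 1.0, "开心": 1.0, "差": -1.0, "讨厌": -1.0, "失望": -1.0}


def lexicon_feature(tokens):
    """Dictionary feature: mean polarity of the tokens found in the lexicon (0.0 if none)."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def hybrid_score(model_score, tokens, w):
    """Weighted combination of a learned score in [-1, 1] and the dictionary feature,
    using an a priori weight w in [0, 1] (hypothetical score-level fusion)."""
    return (1 - w) * model_score + w * lexicon_feature(tokens)


def label(score, margin=0.1):
    """Map a score to positive / negative / neutral with a hypothetical neutral band."""
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


tokens = ["这个", "手机", "不", "差"]  # "this phone is not bad", pre-segmented
model_score = 0.6                      # hypothetical classifier output
for w in (0.2, 0.5, 0.8):
    s = hybrid_score(model_score, tokens, w)
    print(w, round(s, 2), label(s))    # the label flips between w=0.2 and w=0.5
```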

To address the limitations of traditional sentiment analysis methods, this paper proposes a text sentiment classification method built on the Spark framework for Weibo sentiment analysis on large datasets. Compared with traditional deep learning-based text sentiment analysis methods, the proposed method makes the following contributions:

  1. To fully learn the semantic information in microblog text, this paper fuses the features extracted by TextCNN and BiLSTM. This approach improves the extraction of both local and global semantic features and can handle texts of varying lengths, addressing the difficulty that traditional text classification methods have with inputs of different lengths. Dropout regularization in the model helps mitigate overfitting.

  2. To highlight the emotional tendencies embedded in microblog text, an attention mechanism assigns weights to different sentence features, improving the efficiency of sentiment tendency recognition.

  3. Considering the limitations of single-machine serial processing for big data, this paper deploys the hybrid-learning-based sentiment classification algorithm on the Spark platform. This in-memory iterative computing framework improves computational efficiency while offering good versatility and integrability.
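
The first contribution relies on the TextCNN channel producing a fixed-size vector from texts of any length. The sketch below illustrates that mechanism in plain Python under stated assumptions: toy 2-dimensional embeddings stand in for the XLNet word vectors, the filter weights are hand-picked rather than learned, and biases are omitted. Max-over-time pooling makes the output length depend only on the number of filters. In the full model this vector would be concatenated with the BiLSTM channel's output, which is not shown.

```python
from typing import List

Vector = List[float]


def textcnn_channel(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter (a window of k weight vectors, one per token position) over
    the token embeddings, apply ReLU, and keep the maximum over all positions."""
    pooled: Vector = []
    for filt in filters:
        k = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        # If the text is shorter than the window, no position fits and 0.0 is kept;
        # real implementations usually pad the sequence instead.
        for start in range(len(embeddings) - k + 1):
            window = embeddings[start:start + k]
            s = sum(w * x for w_row, x_row in zip(filt, window)
                    for w, x in zip(w_row, x_row))
            best = max(best, max(0.0, s))  # ReLU, then max-over-time pooling
        pooled.append(best)
    return pooled


# Hypothetical filters with window sizes 2 and 3 over 2-dimensional embeddings.
FILTERS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0], [-1.0, 0.0]],
]

short_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]              # 3 tokens
long_text = short_text + [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]]  # 6 tokens

print(textcnn_channel(short_text, FILTERS))  # [2.0, 0.0]
print(textcnn_channel(long_text, FILTERS))   # [2.0, 1.0]
```

Both texts yield one value per filter, which is what allows a downstream fusion layer to accept posts of any length.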
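
The second contribution uses attention to weight sentence features. This excerpt does not specify the exact attention form, so the sketch below uses one common formulation as an assumption. Each time step's hidden vector is scored by a dot product with a context vector, the scores are normalized with a softmax, and the hidden vectors are summed with those weights. The hidden states and context vector are hypothetical; in a trained model the context vector would be learned.

```python
import math
from typing import List, Tuple


def attention_pool(hidden: List[List[float]],
                   context: List[float]) -> Tuple[List[float], List[float]]:
    """Score each time step's hidden vector by its dot product with a context vector,
    turn the scores into weights with a numerically stable softmax, and return
    (weights, weighted sum of hidden vectors)."""
    scores = [sum(h * c for h, c in zip(vec, context)) for vec in hidden]
    m = max(scores)                          # subtract the max before exp() for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


# Hypothetical BiLSTM outputs for a 3-token post (forward and backward states assumed
# already concatenated, 2 dimensions for brevity) and a hypothetical context vector.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
context_vector = [1.0, 1.0]

weights, sentence_vec = attention_pool(hidden_states, context_vector)
print([round(w, 3) for w in weights], [round(v, 3) for v in sentence_vec])
```

The middle token, whose hidden state aligns best with the context vector, receives most of the weight and therefore dominates the pooled sentence vector.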
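
The first contribution also mentions dropout regularization. The following is a minimal sketch of inverted dropout, the variant most deep learning libraries use; the paper's dropout rate and placement are not given in this excerpt, so `p=0.5` and the feature values below are assumptions. During training each activation is zeroed with probability `p` and survivors are scaled by `1/(1 - p)` so the expected value is unchanged; at inference time the input passes through untouched.

```python
import random
from typing import List, Optional


def inverted_dropout(x: List[float], p: float, training: bool = True,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each activation with probability p during training and scale survivors by
    1/(1-p) so the expected activation is unchanged; identity at inference time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


features = [0.3, 1.2, -0.7, 0.5]  # hypothetical fused feature vector
print(inverted_dropout(features, p=0.5, rng=random.Random(42)))  # each entry is 0.0 or doubled
print(inverted_dropout(features, p=0.5, training=False))        # [0.3, 1.2, -0.7, 0.5]
```

Scaling at training time rather than at inference time keeps the inference path a plain identity, which is convenient when the trained model is later deployed for batch prediction.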
