Performance Analysis of Machine Learning Algorithms for Big Data Classification: ML and AI-Based Algorithms for Big Data Analysis

Sanjeev Kumar Punia, Manoj Kumar, Thompson Stephan, Ganesh Gopal Deverajan, Rizwan Patan
Copyright: © 2021 | Pages: 16
DOI: 10.4018/IJEHMC.20210701.oa4

Abstract

Broadly, three categories of machine learning classification algorithms are used to discover correlations, hidden patterns, and other useful information in the varied data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks serve as sources of unstructured data. Converting unstructured data into structured data or meaningful information is a tedious task, and machine learning classification algorithms are used to perform this conversion. In this paper, the authors first collect unstructured research data from a frequently used social media network (i.e., Twitter) by using a Twitter application programming interface (API) stream. Second, they apply different machine learning classification algorithms (supervised, unsupervised, and reinforcement), such as decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN), to the collected research data set. The paper concludes with a comparison of these classification algorithms.
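The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses no Twitter API, covers only two of the listed algorithms (naive Bayes and k-nearest neighbor), and runs on a tiny hypothetical corpus invented here purely for illustration. The function names, the Jaccard similarity used for k-NN, and the Laplace smoothing in naive Bayes are all assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for labeled tweets (NOT data from the paper).
TRAIN = [
    ("great phone love the camera", "pos"),
    ("love this great service", "pos"),
    ("happy with the fast delivery", "pos"),
    ("terrible battery hate it", "neg"),
    ("hate the slow service", "neg"),
    ("awful screen terrible quality", "neg"),
]
TEST = [
    ("love the great camera", "pos"),
    ("terrible slow battery", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need more cleaning."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def predict_knn(samples, text, k=3):
    """Majority vote among the k training texts with the highest Jaccard overlap."""
    query = set(tokenize(text))

    def jaccard(other_text):
        other = set(tokenize(other_text))
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    ranked = sorted(samples, key=lambda s: jaccard(s[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test texts whose predicted label matches the given label."""
    correct = sum(predict(text) == label for text, label in test)
    return correct / len(test)


NB_MODEL = train_naive_bayes(TRAIN)
nb_acc = accuracy(lambda t: predict_naive_bayes(NB_MODEL, t), TEST)
knn_acc = accuracy(lambda t: predict_knn(TRAIN, t, k=3), TEST)
print(f"naive Bayes accuracy: {nb_acc:.2f}")
print(f"k-NN accuracy:        {knn_acc:.2f}")
```

Accuracy on such a small toy set says nothing about real performance; the sketch only shows the shape of a comparison in which several classifiers are trained on the same labeled texts and scored on the same held-out texts.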

1. Introduction

In the current digital era, data is growing exponentially. This growing body of data, known as Big Data, marks the beginning of a revolution in many fields of human life. In general, the five main characteristics of Big Data are (i) volume, (ii) variety, (iii) velocity, (iv) veracity, and (v) value. Together, these five characteristics are called the 5 Vs and are represented in Figure 1. "Volume" represents the collection of all generated data sets. "Variety" indicates the different formats of data from various sources. "Velocity" denotes the high speed at which data accumulates in the data set. "Veracity" represents the accuracy or trustworthiness of the generated data set. "Value" represents all types of attributes in the generated data set. Big data analysis is growing rapidly in every field and industry. In medical science, big data analysis is used in the prevention and treatment of diseases such as cancer, and it benefits hospitals by improving patient satisfaction. In agriculture, the analysis of big data helps to increase the value of agricultural products. In space-related research, big data analysis provides many opportunities for exploration. In pattern recognition, big data analysis plays a vital role in remote sensing. The use of big data has already raised several questions, including how data can be collected and used in ethical and socially sensitive ways.

Figure 1. The five Vs of Big Data

In this paper, Section II describes different classification techniques. Section III presents a literature survey on classification. Section IV describes the experimental setup. Section V presents the result analysis, and Section VI concludes the paper and notes its limitations.
