Classification and Recommendation With Data Streams

Bruno Veloso, João Gama, Benedita Malheiro
Copyright: © 2021 |Pages: 10
DOI: 10.4018/978-1-7998-3479-3.ch047

Abstract

Nowadays, with the exponential growth of data stream sources (e.g., Internet of Things [IoT], social networks, crowdsourcing platforms, and personal mobile devices), data stream processing has become indispensable for online classification, recommendation, and evaluation. Its main goal is to maintain dynamic models updated, holding the captured patterns, to make accurate predictions. The foundations of data streams algorithms are incremental processing, in order to reduce the computational resources required to process large quantities of data, and relevance model updating. This article addresses data stream knowledge processing, covering classification, recommendation, and evaluation; describing existing algorithms/techniques; and identifying open challenges.

Background

Data stream processing typically includes classification, recommendation, and evaluation. Data stream classifiers work incrementally and attempt to classify events as they occur, i.e., in near real time rather than a posteriori. Micro-clusters (Aggarwal, Han, Wang, & Yu, 2004), decision trees (Rutkowski, Pietruczuk, Duda, & Jaworski, 2013), ensemble classifiers (Osojnik, Panov, & Dzeroski, 2017), and adaptive model rules (Duarte, Gama, & Bifet, 2016) are well-known classification techniques that have been adapted to work with data streams. However, they suffer from concept drift, data outliers, and missing data. Concept drifts can occur over time, e.g., when user interests change, and may arise from different situations, e.g., changes in the properties of the data or inappropriate hyper-parameter tuning; they are identifiable by alterations in the statistical properties of the data. In such cases, past observations become irrelevant to the current state, and the algorithm needs to forget them to improve its accuracy. Outliers are observations spatially distant from the trend of most of the data; they can act as hidden variables, induce large statistical errors, and may indicate experimental error, unexpected measurement variability, or noise. Missing values result from sensor malfunction or communication problems.
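To illustrate the idea of drift detection by monitoring the statistical properties of the stream, the following is a minimal, simplified sketch (not an algorithm from this chapter): it flags drift when the error rate over a recent window deviates from the long-run error rate by more than a threshold. The window size and threshold values are arbitrary illustrative choices.

```python
from collections import deque

class WindowDriftDetector:
    """Simplified drift detector: signals drift when the recent
    error rate exceeds the long-run error rate by a fixed margin."""

    def __init__(self, window=50, threshold=0.2):
        self.window = deque(maxlen=window)  # most recent errors
        self.total_errors = 0
        self.total_seen = 0
        self.threshold = threshold

    def update(self, error):
        """error: 1 if the model misclassified the event, else 0.
        Returns True when drift is detected."""
        self.window.append(error)
        self.total_errors += error
        self.total_seen += 1
        if len(self.window) < self.window.maxlen:
            return False  # not enough recent observations yet
        recent = sum(self.window) / len(self.window)
        overall = self.total_errors / self.total_seen
        return recent - overall > self.threshold
```

When drift is signalled, a stream learner would typically reset or down-weight its model, i.e., "forget" past observations that no longer describe the current concept.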

Key Terms in this Chapter

Hyper-Parameter: A parameter that an algorithm/model requires to be set before learning in order to work.

Evaluation: The task of judging and assessing whether the predictions generated by a classification or recommendation algorithm match the target values.

Classification: The task of assigning items to a set of classes.

Data Mining: The process of identifying and collecting patterns in data sets.

Automatic Machine Learning: The automation of complex data mining tasks, for example, hyper-parameter optimization.

Data Streams: A continuous flow of data to be processed by an algorithm.

Recommendation: An algorithm that tries to identify patterns and make predictions based on past user behaviour.
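Evaluation of stream learners is commonly performed prequentially (test-then-train): each incoming event is first used to test the current model and only then to update it, so accuracy is measured online rather than on a held-out set. A minimal sketch, using a hypothetical majority-class baseline learner for illustration:

```python
from collections import Counter

class MajorityClassLearner:
    """Trivial incremental classifier: always predicts the most
    frequent label seen so far (a common stream baseline)."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(stream, learner):
    """Test-then-train: predict each event's label before learning
    from it, and accumulate the online accuracy."""
    correct = seen = 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)
        learner.learn(x, y)
        seen += 1
    return correct / seen
```

For example, on a stream of nine events labelled "a" followed by one labelled "b", the baseline misses the first event (no model yet) and the final "b", yielding a prequential accuracy of 0.8.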
