Big Data Analytics: Educational Data Classification Using Hadoop-Inspired MapReduce Framework

Pratiyush Guleria, Manu Sood

Source Title: Predictive Intelligence Using Big Data and the Internet of Things

DOI: 10.4018/978-1-5225-6210-8.ch004


Abstract

Due to an increase in the number of digital transactions and data sources, a huge amount of unstructured data is generated by every interaction. In such a scenario, the concepts of data mining assume great significance, as useful information, trends, and predictions can be retrieved from this large amount of data, known as big data. Big data predictive analytics is making inroads into the educational field because, with the adoption of new technologies, new academic trends are being introduced into educational systems. This accumulation of large and varied data poses a new set of challenges to learners as well as educational institutions in ensuring the quality of education by improving strategic and operational decision-making capabilities. Therefore, the authors address this issue by proposing a support system that can guide students to choose and focus on the right course(s) based on their personal preferences. This chapter provides readers with the requisite information about educational frameworks and related data mining.

Introduction

Educational data mining is one of the most promising areas for gaining new insight into the trends and predictions in our educational systems, which continuously integrate newer technologies and undergo corresponding transformations. The inclusion of various modes of e-learning and other online educational resources in the teacher-taught paradigm, in the formal as well as informal education sectors, results in the collection of huge volumes of data. For this structured, semi-structured, or unstructured data to make reasonable sense to the stakeholders of these systems, emerging data mining techniques need to be explored for processing it on distributed systems with parallel computation.

With the adoption of these new mining techniques, the educational sector benefits from faster decision-making supported by analyses of data gathered from students. The data from students and other stakeholders may include: a) preferences for courses, course outcomes, trainings (especially vocational trainings), industry, industry-oriented courses as optional subjects, job profiles, etc.; b) choices among the appropriate existing subjects; c) available options at the national and international levels; and d) in-house training needs for employees and management, and so on.

Big data is an emerging field that uses data mining to address the problems that arise from the accumulation of large amounts of data obtained from academia. The term refers to collections of datasets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. Big data work includes gathering data for storage and analysis, and it covers operations such as searching, sharing, visualization, query processing, updating, and maintaining the privacy of information. These extremely large datasets are analyzed computationally to reveal patterns, trends, and associations. Much of the data is unstructured, for example Microsoft Office files, PDFs, and plain text, whereas structured data is typically relational.

In big data, mining techniques split the data into small chunks, distribute the chunks across multiple machines for processing, and finally aggregate the partial results. These operations are carried out with the help of Hadoop, a framework for the distributed processing of large datasets whose programming paradigm is MapReduce. Hadoop is one big data technology and an answer to the problems of handling massive, unstructured data.
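The chunk, distribute, and aggregate pattern described above can be illustrated with a small single-machine sketch in Python. It is not the chapter's implementation and it does not use Hadoop; the function names, the chunk size, and the sample (student, course) records are all hypothetical, chosen only to show how map, shuffle, and reduce steps fit together when counting course preferences.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Split the input into small chunks, as a cluster would before distribution."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_phase(chunk):
    """Emit one (course, 1) pair per record; each chunk is processed independently."""
    return [(course, 1) for _student, course in chunk]


def shuffle(mapped_pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def course_preference_counts(records, chunk_size=2):
    """Run the map, shuffle, and reduce steps in sequence on one machine."""
    chunks = split_into_chunks(records, chunk_size)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: (student id, preferred course).
records = [
    ("s1", "Data Mining"),
    ("s2", "Networks"),
    ("s3", "Data Mining"),
    ("s4", "Databases"),
    ("s5", "Data Mining"),
]
counts = course_preference_counts(records)
print(counts)
```

On a real cluster the chunks would be processed on different machines and the shuffle would move data across the network; here every step runs in one process so that the data flow is easy to follow.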

In the educational sector, data mining offers new insight into the educational system. As technologies advance and classroom teaching shifts to online learning and other educational resources, huge volumes of data are collected, which may be structured, semi-structured, or completely unstructured. With data-processing and decision-driven technologies such as big data, the educational sector will benefit because decision-making becomes faster when it draws on analyses of data gathered from students, such as their feedback on courses and syllabi, their preference for industry-oriented courses as optional subjects, and in-house training needs for staff members.

Educational System Design

With the rapid development of technologies (Begona Gros, 2016), flexible and efficient learning methods are being developed for learners. Students usually acquire basic knowledge and core skills in the classroom. In a traditional classroom, the learning goals and processes are always the same for each student, but students with different backgrounds have different needs (Tomlinson & McTighe, 2006). Classroom interactions should therefore be differentiated and responsive enough to accommodate variations in the learners' readiness levels, interests, and learning profiles (Tomlinson & Kalbfleisch, 1998). In a traditional classroom, the teacher is the main source of information, and students are required to stay in the same place and participate simultaneously in the same set of activities, whereas in ubiquitous learning, activities can be conducted in a different space and at a different time for each student. In addition, integrated teaching aids are available to students at all times and are accessible from any device (Begona Gros, 2016). The paradigm shift in the educational system from traditional classroom teaching to a smart learning environment is shown in Figure 1(a) and 1(b).

Figure 1. (a), (b) Educational system design

Key Terms in this Chapter

YARN: Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. It acts as a resource manager that allocates cluster resources to the applications running on Hadoop and monitors operations on the cluster nodes. In the original Hadoop MapReduce design, these duties belonged to the TaskTrackers on the worker nodes, which reported regularly to the JobTracker on the master node; under YARN, a ResourceManager and per-node NodeManagers fill these roles.

HDFS: HDFS (Hadoop Distributed File System) provides extensible, fault-tolerant, and cost-efficient storage for large datasets.

SPSS: SPSS (Statistical Package for the Social Sciences) is a software package used for statistical analysis; it was acquired by IBM in 2009.

Clustering: Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

KEEL: KEEL (Knowledge Extraction based on Evolutionary Learning) is a data mining and learning analytics tool.

Classification: Classification is the allocation, categorization, and analysis of data according to its similarities.

Analytics: Analytics is the statistical process of discovering, interpreting, and communicating meaningful patterns in data.

Weka: Weka (Waikato Environment for Knowledge Analysis) is machine learning software written in Java and developed at the University of Waikato, New Zealand.

MOOCs: Massive open online courses (MOOCs) are free online courses in which anyone can enroll.

Correlation: Correlation is a statistical analysis technique used to measure the association between two continuous variables, which may be an independent and a dependent variable or two independent variables.
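As an illustration of this definition, the following minimal Python sketch computes the Pearson correlation coefficient, one common measure of linear association between two continuous variables. The choice of Pearson's coefficient, the function name, and the sample study-hours and exam-score values are assumptions made for this example and do not come from the chapter.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return Pearson's r for two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: weekly study hours and exam scores for five students.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 61, 70, 74]
r = pearson_correlation(study_hours, exam_scores)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear association, a value close to -1 a strong negative one, and a value near 0 little or no linear association.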

Regression: Regression is a statistical analysis technique used to examine the relationship between a dependent variable and an independent variable. Regression predicts the value of the dependent variable when the independent variable is known.
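The prediction step in this definition can be sketched with ordinary least squares for one independent variable. This is a minimal Python illustration under stated assumptions: the least-squares fitting method, the function names, and the attendance and grade figures are hypothetical and are not taken from the chapter.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict the dependent variable for a known independent value."""
    return intercept + slope * x


# Hypothetical data: classes attended per month and final grade.
classes_attended = [1, 2, 3, 4, 5]
final_grades = [52, 58, 61, 70, 74]
slope, intercept = fit_simple_linear_regression(classes_attended, final_grades)
predicted_grade = predict(slope, intercept, 6)
print(slope, intercept, predicted_grade)
```

Here the independent variable is the number of classes attended and the dependent variable is the final grade; once the line is fitted, `predict` estimates the grade for an attendance value that was not in the data.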
