Hybrid Approach Using Deep Autoencoder and Machine Learning Techniques for Cyber-Attack Detection

Vikash Kumar, Ditipriya Sinha
Copyright: © 2022 | Pages: 21
DOI: 10.4018/IJACI.293098

Abstract

Reducing features from the vast amount of data collected from the Internet is challenging and labor-intensive. Data imbalance is another problem in decision-making analysis: it leads to a biased model that favors classes with more samples. This paper proposes a hybrid model that combines an autoencoder (AE) with machine learning models. It handles imbalanced attack classes by balancing the dataset with the SMOTE method, and then trains the AE for feature reduction. The bottleneck code of the AE is stacked with different classifiers, and the proposed method is evaluated on the NSL-KDD, UNSW-NB15 and BoT-IoT datasets. The proposed approach improves on attack detection without the AE. The most noticeable change occurs for SVM on the NSL-KDD dataset, which shows a twofold improvement in accuracy. For UNSW-NB15, the results vary, with an improvement for the LR model. The BoT-IoT dataset shows the lowest performance variation, i.e., 0%-6%.

1. Introduction

The world is moving towards digitization, involving more and more automated intelligent machines in tasks that range from simple to highly complex and crucial. This gives rise to malicious attempts to obtain financial and other benefits. As a result, the global infrastructure is highly exposed to cyber threats. Several laws have been proposed to penalize malicious actors and to provide protection against those threats, but law alone is not sufficient to ensure security in critical infrastructures. The current trend of cyber-attacks (Embroker, n.d.) against various global infrastructure sectors requires highly robust and dynamic solutions and laws that provide a protective wall against intruders or attackers. The main challenge in this field is handling the vast amount of traffic data generated by cyber platforms in order to design security frameworks for cyber-attack detection. Machine learning (ML) and deep learning (DL) are two popular approaches for handling high volumes of network traffic data. They assist in designing an intelligent framework that can be either static or dynamic; with proper implementation, a dynamic framework can adapt to future attack trends. Compared to traditional machine learning techniques, deep learning can work on unorganized raw data without separate data preprocessing and feature engineering steps.

An intrusion detection system (IDS) is one of the techniques used to defend against known and unknown cyber-attacks. It can be defined as a method or tool that passively monitors a copy of real-time traffic to detect any intrusive or malicious traffic in the network. Most IDS work falls into three broad categories: a) signature-based, b) anomaly-based and c) hybrid. The signature-based approach relies solely on extracting attack patterns to find intrusive traffic, whereas the anomaly-based approach looks for deviations of incoming traffic from the normal traffic profile on which it was trained. Most of the existing works on IDS (Rawat et al., n.d.) apply traditional machine learning approaches. Further, an IDS can be subcategorized as network-based or host-based.
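The difference between the first two categories can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields (proto, dport, bytes), the signature set, the byte-count profile and the threshold of 3 standard deviations are hypothetical and do not come from the paper.

```python
import statistics

# Hypothetical signatures: (protocol, destination port) pairs treated as known-bad.
SIGNATURES = {("tcp", 31337), ("udp", 1900)}

def signature_alert(record):
    """Signature-based check: flag traffic that matches a known attack pattern."""
    return (record["proto"], record["dport"]) in SIGNATURES

def fit_profile(normal_sizes):
    """Learn a 'normal' profile (mean and standard deviation) from benign traffic."""
    return statistics.mean(normal_sizes), statistics.stdev(normal_sizes)

def anomaly_alert(record, profile, threshold=3.0):
    """Anomaly-based check: flag traffic that deviates strongly from the profile."""
    mean, std = profile
    return abs(record["bytes"] - mean) / std > threshold

# Made-up byte counts of benign flows used to train the profile.
profile = fit_profile([480, 500, 510, 495, 505, 490, 520, 500])

known_attack = {"proto": "tcp", "dport": 31337, "bytes": 500}
novel_attack = {"proto": "tcp", "dport": 443, "bytes": 9000}
benign = {"proto": "tcp", "dport": 443, "bytes": 505}

for name, rec in [("known", known_attack), ("novel", novel_attack), ("benign", benign)]:
    print(name, signature_alert(rec), anomaly_alert(rec, profile))
```

In this toy setup the signature check catches only the pattern it already knows, while the anomaly check catches only the flow whose size departs from the learned profile. A hybrid detector would combine both signals.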

Motivation: Tree-based algorithms have the advantage of generating human-readable decisions, which assist security experts in detecting malicious traffic. Most tree-based algorithms apply traditional feature selection methods for feature extraction, which is difficult in real-time traffic scenarios due to the high traffic volume. Deep learning has an advantage over traditional machine learning techniques in that it does not need separate feature selection and extraction steps. It also learns the parameters that best fit the training data, which improves classification performance.

This motivates the authors of this paper to propose a hybrid approach that combines a deep autoencoder and ML classifiers for attack detection and evaluation. The autoencoder (AE) acts as a feature optimization tool, in contrast to the traditional methods. The performance of the autoencoder is evaluated for different numbers of neurons in the bottleneck layer. The feature vector for which the reconstruction error of the AE is lowest is then applied, in a sequential fashion, to several ML classification and regression models (Decision Tree, Random Forest, XGBoost, Logistic Regression and SVM) on three datasets: NSL-KDD (Tavallaee et al., 2009), UNSW-NB15 (Moustafa & Slay, 2015) and BoT-IoT (Koroniotis et al., 2019). The accuracy of the proposed framework is 77.85% for NSL-KDD, 80.89% for UNSW-NB15 and 99.98% for BoT-IoT. The result analysis shows a promising performance of the proposed work on each model where AE is applied as a feature optimization tool.
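The bottleneck-size comparison described above can be sketched as follows. This is not the authors' deep autoencoder, whose architecture is not given in this preview. It is a deliberately tiny linear autoencoder in pure Python, and the function name train_linear_ae, the toy data, the candidate sizes and the learning rate are all illustrative assumptions.

```python
import random

def train_linear_ae(data, n_hidden, epochs=200, lr=0.01, seed=0):
    """Train a tiny linear autoencoder with per-sample gradient descent and
    return its mean squared reconstruction error on the same data."""
    rng = random.Random(seed)
    d = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]

    def forward(x):
        # Bottleneck code z, then reconstruction x_hat.
        z = [sum(x[i] * enc[i][j] for i in range(d)) for j in range(n_hidden)]
        x_hat = [sum(z[j] * dec[j][i] for j in range(n_hidden)) for i in range(d)]
        return z, x_hat

    for _ in range(epochs):
        for x in data:
            z, x_hat = forward(x)
            err = [x_hat[i] - x[i] for i in range(d)]
            # Gradient of the squared error with respect to the code z.
            grad_z = [sum(err[i] * dec[j][i] for i in range(d)) for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(d):
                    dec[j][i] -= lr * z[j] * err[i]
                    enc[i][j] -= lr * x[i] * grad_z[j]

    total = sum(sum((xh - xi) ** 2 for xh, xi in zip(forward(x)[1], x)) for x in data)
    return total / len(data)

# Made-up records with 4 features that depend on 2 underlying factors.
rng = random.Random(42)
data = []
for _ in range(40):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b + rng.gauss(0, 0.01), a - b + rng.gauss(0, 0.01)])

candidates = [1, 2, 3]  # hypothetical bottleneck sizes to compare
errors = {h: train_linear_ae(data, h) for h in candidates}
best = min(errors, key=errors.get)
print(errors, "-> selected bottleneck size:", best)
```

On training data the reconstruction error usually shrinks as the bottleneck widens, so a comparison of this kind is better made on held-out records, or balanced against how much feature reduction is wanted.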

The contribution of the paper is summarized below:

  • Deals with class imbalance problems using the SMOTE technique.

  • Proposes the autoencoder (AE) as a feature selection technique, in contrast to traditional methods such as filter and wrapper methods.

  • Uses different traditional datasets (NSL-KDD, UNSW-NB15 and BoT-IoT) to show the effectiveness of AE as a feature selection technique.

  • Designs an IDS by applying several ML classification and regression models (Decision Tree, Random Forest, XGBoost, Logistic Regression and SVM). These models use the learned code of the bottleneck layer of the AE, which is trained and tested against each of the aforesaid traditional datasets.
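The first contribution relies on SMOTE, which balances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and its nearest minority neighbours. The sketch below shows only that interpolation idea in pure Python. The function name smote, the value of k and the toy records are assumptions for illustration; in practice a library implementation (for example the SMOTE class in imbalanced-learn) would normally be used, and this preview does not say which implementation the authors chose.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base inside the minority class, excluding base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Made-up data: 8 "normal" records versus 3 "attack" records, 2 features each.
normal = [[0.1 * i, 0.2 * i] for i in range(8)]
attack = [[5.0, 5.0], [5.5, 4.5], [6.0, 5.5]]

new_attack = smote(attack, n_new=len(normal) - len(attack), k=2)
balanced_attack = attack + new_attack
print(len(normal), len(balanced_attack))
```

Because every synthetic point lies between two real attack records, the oversampled class stays inside the region already occupied by that class instead of simply duplicating existing rows.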
