Software Vulnerability Prediction Using Grey Wolf-Optimized Random Forest on the Unbalanced Data Sets

Wasiur Rhmann
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJAMC.292508

Abstract

Any vulnerability in software creates a security threat and helps hackers gain unauthorized access to resources. Vulnerability prediction models help software engineers allocate their resources effectively to find vulnerable classes in the software before it is delivered to customers. Vulnerable classes must be carefully reviewed by security experts and tested to identify potential threats that may arise in the future. In the present work, a novel technique based on the Grey Wolf algorithm and Random Forest is proposed for software vulnerability prediction. The Grey Wolf algorithm is a metaheuristic technique, and it is used here to select the best subset of features. The proposed technique is compared with other machine learning techniques. Experiments were performed on three publicly available datasets. The proposed technique (GW-RF) was observed to outperform all the other techniques for software vulnerability prediction.

1. Introduction

A vulnerability is a weakness in the software that, when exploited, causes a security failure. Because of time constraints, software developers usually pay little attention to security in the initial stages of software development, which results in security failures in the operational stages. Vulnerabilities are difficult to detect until they hinder the normal operation of the software, so predicting software vulnerability early in the life cycle is a promising approach. Software organizations perform security checks to avoid software failures, since the presence of vulnerabilities in the software may lead to such failures. A fault in the software specification, development, or configuration is a vulnerability if its execution results in a violation of security policy (McGraw & Potter, 2004). If a fault in the software system is accidentally executed, the software may fail to perform its required or expected function (Shin & Williams, 2008). Software faults are defects or bugs in the software system, and a vulnerability is a software fault that leads to a security failure if exploited. Software metrics are heavily used in the literature to predict software maintainability and change (Bansal, 2017) and defect proneness (Gyimothy et al., 2005). Numerous studies have shown the relation between software architecture and structural software metrics such as complexity, coupling, and cohesion (CCC). CCC metrics are very efficient in measuring the quality of software architecture (QSA), and QSA influences the quality of software. Although these metrics are heavily used, no proper guideline is available on how to use them to predict software vulnerability. The use of these structural metrics in vulnerability prediction may lead to more secure and reliable software (Walden et al., 2014).
Early-stage detection of security vulnerabilities in the software development life cycle may mitigate the risk of software security failures. In recent years, a number of metaheuristic algorithms, such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Firefly Algorithm (FA), have been applied to feature selection and hyper-parameter optimization of machine learning and deep learning algorithms. Neggaz et al. (2020a) used a novel Henry gas solubility optimization for feature selection. The proposed Henry gas solubility optimization was compared with six other algorithms on 12 datasets and showed improved accuracy with fewer features. Neggaz et al. (2020b) proposed an improved salp swarm algorithm for feature selection. They used the Sine Cosine algorithm and a Disrupt Operator, compared the accuracy with other swarm intelligence algorithms, and found better results. In our previous work (Rhmann et al., 2021), the metaheuristic firefly, genetic, and black hole optimization algorithms were used to optimize software effort estimation with ensemble techniques. In this study, a metaheuristic-based random forest technique is applied to software vulnerability prediction.
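
To make the idea of metaheuristic feature selection concrete, the following is a minimal sketch of a grey wolf optimizer searching for a feature subset. It is an illustration under stated assumptions, not the GW-RF implementation of this paper: wolf positions are kept in [0, 1] and thresholded at 0.5 to obtain a feature mask (one common binarization choice), the data are synthetic, and a simple nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library. All function names and parameter values are hypothetical.

```python
import random

def make_data(n=120, n_features=8, informative=(0, 3), seed=0):
    """Synthetic two-class data; only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features.
    Assumption: this is a lightweight stand-in for the random forest."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    half = len(X) // 2
    cents = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cents[c]))
                for c in (0, 1)}
        hits += int(min(dist, key=dist.get) == y[i])
    return hits / (len(X) - half)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (hypothetical weighting)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def to_mask(position):
    """Threshold a continuous position in [0, 1] into a 0/1 feature mask."""
    return [1 if p > 0.5 else 0 for p in position]

def grey_wolf_select(X, y, n_wolves=10, n_iter=30, seed=1):
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_mask, best_fit = None, float("-inf")
    for t in range(n_iter + 1):
        ranked = sorted(wolves, key=lambda w: fitness(X, y, to_mask(w)),
                        reverse=True)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta (copies)
        top_fit = fitness(X, y, to_mask(leaders[0]))
        if top_fit > best_fit:
            best_fit, best_mask = top_fit, to_mask(leaders[0])
        if t == n_iter:
            break  # final pass only evaluates the last population
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                pulled = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    pulled += lead[d] - A * abs(C * lead[d] - w[d])
                # average of the three leader-guided moves, clipped to [0, 1]
                w[d] = min(1.0, max(0.0, pulled / 3.0))
    return best_mask, best_fit

X, y = make_data()
mask, score = grey_wolf_select(X, y)
print("selected feature mask:", mask, "fitness:", round(score, 3))
```

In a wrapper-style setup such as the one described in this study, the stand-in classifier would be replaced by the accuracy (or another performance measure) of a random forest trained on the selected features.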

The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 describes the datasets used in the study. Section 4 describes the proposed software vulnerability prediction model. Section 5 describes the performance evaluation measures. Section 6 presents the experimental setup and the results of the prediction models. Section 7 presents a comparative analysis of the work, and Section 8 describes its application. Section 9 describes the threats to validity, and finally, Section 10 concludes the work.
