A Hybrid Whale Genetic Algorithm for Feature Selection in Biomedical Dataset

Tarushi Agrawal, Priya Bist, Nimit Jain, Parul Agarwal
Copyright: © 2022 | Pages: 18
DOI: 10.4018/IJSIR.302613

Abstract

One of the major concerns with biomedical datasets is high dimensionality. These dimensions may include irrelevant and redundant features that adversely affect the performance of classification algorithms. Extensive research has been done in the area of machine learning to handle high dimensionality, and feature selection algorithms have been developed in the literature for this purpose. In this paper, a hybrid nature-inspired algorithm is proposed that combines the whale optimization algorithm and the genetic algorithm for feature selection. The proposed algorithm is applied to four microarray datasets and one DNA sequence dataset and compared with classical feature selection methods; a decision tree classifier is employed in all the algorithms. To reach an approximately optimal solution and escape local optima, the exploration and exploitation phases are balanced efficiently in the proposed algorithm, and adaptive mechanisms accelerate its convergence. Overall, the results show better performance on the majority of datasets.
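As context for the wrapper setup the abstract describes, the minimal sketch below shows one way a decision-tree-based fitness evaluation for a binary feature mask could look. This is a hedged illustration, not the authors' implementation: the function name `fitness`, the cross-validation setup, and the weighting `alpha` are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Score a binary feature mask (1 = keep feature)."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0  # an empty subset cannot classify anything
    clf = DecisionTreeClassifier(random_state=0)
    acc = cross_val_score(clf, X[:, selected], y, cv=5).mean()
    # Reward accuracy and lightly penalise subset size; alpha is an
    # assumed weighting, not a value taken from the paper.
    return alpha * acc + (1 - alpha) * (1 - selected.size / X.shape[1])
```

A hybrid search such as the proposed whale–genetic algorithm would then evolve candidate masks and rank them by this kind of score.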
Article Preview

1. Introduction

Classification is the process of segregating samples into their respective labels based on the information provided by their features. Features in data may be relevant or irrelevant. Irrelevant and redundant features do not contribute to the classification task and reduce performance by enlarging the search space (Miao & Niu, 2016). This is referred to as the curse of dimensionality. The problem can be tackled with the help of feature selection techniques. Feature selection helps to identify and select the features that contribute most to the classification process. It is distinct from dimensionality reduction and feature extraction: all three methods reduce the number of features in a dataset, but each uses a different technique. Dimensionality reduction combines different features to form new attributes, feature extraction forms new features along the directions of maximum variation, and feature selection simply includes or excludes the existing features without altering them, as illustrated in the sketch below.
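To make that distinction concrete, here is a small sketch (not from the paper; the data and the chosen column indices are arbitrary) contrasting feature selection, which keeps original columns unchanged, with a dimensionality reduction method such as PCA, which builds new combined attributes:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))        # 100 samples, 6 features

# Feature selection: keep a subset of the original columns unchanged.
keep = [0, 2, 5]                     # e.g. chosen by a selection algorithm
X_selected = X[:, keep]

# Dimensionality reduction: PCA builds new features as linear
# combinations of all original columns (directions of max variance).
X_reduced = PCA(n_components=3).fit_transform(X)

print(X_selected.shape, X_reduced.shape)   # (100, 3) (100, 3)
```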

An efficient feature selection algorithm determines the subset of features that helps in the precise determination of the class. The aim of feature selection is three-fold: improving the performance of classifiers, reducing computational complexity, and gaining better insight into the data. Feature selection techniques help to select the features that boost classification accuracy while requiring less data. Fewer attributes are preferred because they reduce the complexity of the model and make it easier to learn. Feature selection techniques are broadly divided into three categories: filter, wrapper, and embedded; a code sketch of all three follows below.

The filter approach (Bosin et al., 2007) applies mathematical measures and calculates a score for each feature; the features are then ranked by this score. This approach is independent of any supervised learning algorithm and hence achieves more generality than the other approaches. Such methods are called filters because they filter the features before training. They are also computationally inexpensive and therefore well suited to high-dimensional spaces.

The wrapper approach (Nnamoko et al., 2014) is based on a greedy search and evaluates combinations of features using a particular supervised learning algorithm. It selects a subset of features and trains the algorithm on them; the scoring function is the accuracy of the algorithm itself. Because it evaluates feature subsets with a specific learning algorithm, it lacks the generality of the filter method, and its high computational cost can be an obstacle on high-dimensional datasets. Filters are faster but less effective than wrappers because they do not account for the fact that different learning algorithms may achieve good results with different feature subsets.

The embedded approach (Fu et al., 2009) uses a part of the learning algorithm itself to generate subsets and has less generality than a filter. A decision tree is one example of an embedded technique: at each step of the tree, a feature is selected recursively. Ridge regression with an L2 penalty and LASSO with an L1 penalty are further examples of the embedded approach.
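The three families can be sketched with standard scikit-learn utilities. This is an illustrative sketch only; the dataset, estimators, and parameter values below are assumptions chosen for demonstration, not the paper's experimental setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, RFE, SelectFromModel, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score each feature with a statistical test, independent of
# any classifier, and keep the top-ranked ones.
X_filter = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Wrapper: repeatedly train a chosen classifier, dropping the weakest
# features each round (recursive feature elimination).
X_wrapper = RFE(DecisionTreeClassifier(random_state=0),
                n_features_to_select=10).fit_transform(X, y)

# Embedded: selection happens inside training, here via an L1
# (LASSO-style) penalty that drives some coefficients to zero.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_embedded = SelectFromModel(l1_model.fit(X, y), prefit=True).transform(X)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```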
