1. Introduction
Feature selection is a crucial research area in the development of efficient classification algorithms for high dimensional datasets (Khaire and Dhanalakshmi, 2019). A proper feature selection technique should enhance the performance of the classification model. High dimensional datasets suffer from the well-known problem of the “curse of dimensionality” (Xue et al., 2016). The original dataset contains irrelevant and redundant features that degrade the performance of the algorithm and increase computational overhead. Feature selection methods aim to reduce the feature space by removing superfluous and insignificant features. These methods can be broadly classified into wrapper and filter methods. In the wrapper approach, a classifier is used to evaluate the quality of the selected features, whereas filter methods use concepts from information theory to obtain a relevant set of features. Wrapper methods are usually slower than filter methods; however, they can achieve better classification performance (Grande et al., 2007).
Ideally, a feature selection technique should search every possible subset of the original feature set to obtain the optimum feature subset. However, this exhaustive search is impractical owing to its high computational cost, so in practice a sub-optimum feature subset is selected. Nowadays, evolutionary algorithms and metaheuristic techniques are popular for solving the feature selection problem. Metaheuristic approaches do not guarantee the best result; however, they can produce good solutions within reasonable time bounds (Talbi et al., 2009). Some of the most popular metaheuristics are the genetic algorithm (GA) (Holland et al., 1992), differential evolution (DE) (Storn and Price, 1997), tabu search (TS) (Hedar et al., 2006), simulated annealing (SA) (Kirkpatrick et al., 1983), particle swarm optimization (PSO) (Kennedy and Eberhart, 1995), ant colony optimization (ACO) (Dorigo et al., 2006), artificial bee colony (ABC) (Yi and He, 2014), cuckoo search (CS) (Yang and Deb, 2009), ant lion optimization (ALO) (Emary et al., 2016) and the sine-cosine algorithm (SCA) (Mirjalili, 2016).
A good metaheuristic should be capable of maintaining a balance between exploration and exploitation strategies during the search process (Talbi et al., 2009). Population-based methods such as PSO and ACO are better at exploring the search space, while single solution-based methods are better at exploitation. To achieve a balance between the two, hybrid methods can be used that combine the merits of two or more metaheuristics to achieve better optimization results.
The Butterfly Optimization Algorithm (BOA) is a recent nature-inspired swarm algorithm that takes inspiration from the food searching behavior of butterflies in their natural environment (Arora and Singh, 2018). BOA has produced superior results for continuous optimization problems when compared with other recent metaheuristics. A binary version of BOA, called bBOA, was proposed by Arora and Anand (2019) for feature selection problems in wrapper mode. bBOA has the same structure as the native BOA except that it utilizes two transfer functions for generating binary solutions.
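The idea of a transfer function can be illustrated with a minimal sketch. Note that the specific pair of transfer functions used in bBOA (Arora and Anand, 2019) is not reproduced here; the S-shaped sigmoid below is a common, illustrative choice for mapping a continuous position to a binary feature mask, and the function and variable names are ours.

```python
import math
import random

def s_shaped_transfer(x):
    """S-shaped (sigmoid) transfer function: maps a continuous
    position component to a selection probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.random):
    """Convert a continuous position vector into a binary feature mask:
    bit i is 1 (feature selected) with probability given by the
    transfer function applied to position[i]."""
    return [1 if rng() < s_shaped_transfer(x) else 0 for x in position]

# Example: a continuous candidate solution over five features.
random.seed(0)
mask = binarize([2.5, -1.0, 0.0, 3.0, -2.0])
print(mask)
```

Large positive position values make selection of the corresponding feature very likely, large negative values make it unlikely, so the continuous search dynamics of the underlying algorithm carry over to the binary feature-subset space.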
A metaheuristic based on hill-climbing called simulated annealing was proposed by Kirkpatrick et al. (1983). The method tries to improve the current solution at each iteration by generating a trial solution in its vicinity. A trial solution with a better fitness value is always accepted, while a worse solution is accepted with a certain probability to avoid getting trapped in local optima. The probability of accepting a worse solution depends on the temperature parameter, which decreases with every iteration at a fixed rate known as the cooling schedule.
Despite producing superior results, the weak exploitation ability of BOA causes it to converge prematurely to local optima. Moreover, the random transition between the exploration and exploitation phases in BOA is determined by the value of the switching probability, which sometimes prevents BOA from attaining the global optimum (Arora et al., 2018). The SA algorithm, on the other hand, possesses excellent exploitation properties.