1. Introduction
Ensemble learning (EL) has been successfully used as a desirable learning scheme for many regression and classification problems because of its potential to greatly increase predictive accuracy (Idris, Khan, & Lee, 2013; Nadig, Potter, Hoogenboom, & Mcclendon, 2013; P. Sun & Lee, 2014). An ensemble refers to a group of base learners whose decisions are aggregated with the goal of achieving better performance than any of its constituent members (Caruana, Munson, & Niculescu-Mizil, 2006). Typically, an EL algorithm consists of two main stages: first, the production of multiple base classifiers for one specific task, and then the combination of these classifiers to obtain a final predictive decision (Banfield, Hall, Bowyer, & Kegelmeyer, 2005; Li, Yu, & Zhou, 2012; Tsoumakas, Partalas, & Vlahavas, 2009).
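The two stages above can be illustrated with a minimal sketch: stage one trains several base classifiers on bootstrap replicates of the training data, and stage two aggregates their decisions by majority vote. The dataset and the one-feature threshold learners below are toy illustrations, not taken from any of the cited works.

```python
# Minimal sketch of the two-stage ensemble scheme: produce base
# classifiers (stage 1), then combine their votes (stage 2).
import random
from collections import Counter


def train_stump(sample):
    """Fit a one-feature threshold 'classifier': choose the
    threshold/orientation with the fewest training errors."""
    best = None
    for thr in sorted({x for x, _ in sample}):
        for pos_label in (0, 1):
            errs = sum(y != (pos_label if x >= thr else 1 - pos_label)
                       for x, y in sample)
            if best is None or errs < best[0]:
                best = (errs, thr, pos_label)
    _, thr, pos = best
    return lambda x: pos if x >= thr else 1 - pos


def build_ensemble(data, n_learners=11, seed=0):
    # Stage 1: train each base classifier on a bootstrap replicate.
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_learners)]


def predict(ensemble, x):
    # Stage 2: aggregate the base decisions by majority vote.
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]


# Toy 1-D binary dataset: (feature, label) pairs.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
ens = build_ensemble(data)
```

Because the toy data are linearly separable, every non-degenerate bootstrap stump places its threshold between the two classes, and the majority vote classifies both regions correctly.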
However, combining all of the classifiers in an ensemble incurs considerable computational overhead (Tsoumakas et al., 2009; Polikar, 2009). Both theoretical and empirical studies have shown that, instead of using the whole ensemble, a subset of the ensemble can achieve equivalent or even better generalization performance (Lu, Wu, Zhu, & Bongard, 2010; Y. Zhang, Burer, & Street, 2006). Therefore, an additional intermediate stage that selects an appropriate sub-ensemble prior to combination has to be considered (Dai, 2013a; Dai & Liu, 2013). This stage is generally termed ensemble pruning (EP) (Lu et al., 2010), selective ensemble (C. X. Zhang & Zhang, 2011), ensemble selection (Maskouni, Hosseini, Abachi, Kangavari, & Zhou, 2018), or ensemble thinning (Banfield et al., 2005).
The problem of pruning an ensemble of classifiers has been proven to be NP-complete (Tamon & Xiang, 2000). Finding the best sub-ensemble through exhaustive search is therefore infeasible for original ensembles of large or even moderate size (Partalas, Tsoumakas, & Vlahavas, 2008).
EP methods can be further categorized into two types, namely static pruning (SP) and dynamic pruning (DP) methods (Soto, García-Moratilla, Martínez-Muñoz, Hernández-Lobato, & Suárez, 2017). SP methods select a fixed subset of predictors from the initial pool and use it for all test patterns, whereas DP methods select predictors individually for each test pattern. Among SP methods, the most commonly used are search-based (Jiang, Liu, Fu, & Wu, 2017), clustering-based (Fan, Tao, Zhou, & Han, 2017), optimization-based (Lessmann, Caserta, & Arango, 2011), and ordered-aggregation methods (L. Guo & Boukir, 2013), together with methods that combine these categories or employ more elaborate pruning strategies (C. X. Zhang & Zhang, 2011).
Greedy algorithms (Hernández-Lobato, Martínez-Muñoz, & Suárez, 2011; Partalas, Tsoumakas, & Vlahavas, 2012; Martínez-Muñoz, Hernández-Lobato, & Suárez, 2008), also known as hill-climbing algorithms, reduce the search space appropriately and therefore seem to be a good choice. Various greedy ensemble selection (GES) algorithms have been proposed, as in Refs. (Banfield et al., 2005; Martínez-Muñoz & Suárez, 2004; Abdesslem, Marwa, & Maroua Bencheikh, 2013; Dai, 2013b; Gevezes & Pitsoulis, 2015; Pérez-Gállego, Castaño, Quevedo, & Coz, 2018; Partalas, Tsoumakas, & Vlahavas, 2010).
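A common form of such greedy selection can be sketched as forward hill climbing: start from an empty sub-ensemble and repeatedly add the base classifier whose inclusion most improves majority-vote accuracy on a validation set, stopping when no candidate helps. This is a hedged illustration of the general idea only; the function names and the toy threshold classifiers are ours, not from the cited GES papers, each of which uses its own selection criterion.

```python
# Sketch of greedy (hill-climbing) forward ensemble selection.
from collections import Counter


def vote(classifiers, x):
    # Majority vote of the current sub-ensemble (0 if empty).
    if not classifiers:
        return 0
    return Counter(h(x) for h in classifiers).most_common(1)[0][0]


def accuracy(classifiers, val):
    return sum(vote(classifiers, x) == y for x, y in val) / len(val)


def greedy_select(pool, val):
    selected, remaining = [], list(pool)
    best_acc = 0.0
    while remaining:
        # Score every candidate added to the current sub-ensemble.
        scored = [(accuracy(selected + [h], val), h) for h in remaining]
        acc, h = max(scored, key=lambda t: t[0])
        if selected and acc <= best_acc:
            break  # hill climbing: stop when no candidate improves
        selected.append(h)
        remaining.remove(h)
        best_acc = acc
    return selected


# Toy pool: 1-D threshold classifiers of varying quality.
pool = [lambda x, t=t: int(x >= t) for t in (0.2, 0.5, 0.5, 0.9)]
val = [(0.1, 0), (0.3, 0), (0.6, 1), (0.8, 1)]
sub = greedy_select(pool, val)
```

On this toy pool the procedure keeps only one of the perfectly accurate threshold-0.5 classifiers and discards the rest, illustrating how greedy selection explores a single improving path instead of the exponential space of all sub-ensembles.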