1. Introduction
Advances in optimization techniques, algorithm design, and computer technology have enabled the effective solution of increasingly complex problems. In the field of combinatorial optimization, heuristic solution methods are often required to tackle large-scale instances of NP-hard problems. A common objective is to find the best trade-off between solution quality and computation time. Because the performance of heuristic algorithms typically varies with the size and structure of the problem instance, researchers and practitioners face the challenge of selecting the potentially best algorithm(s) for a particular class of instances. From a theoretical perspective, this requirement is supported by the so-called no-free-lunch theorem (Wolpert and Macready 1997), which states, roughly, that no single optimization algorithm is best across all problems.
In general, heuristic algorithms can be compared on a theoretical or empirical basis. In cases where a theoretical bound on the quality of the heuristic solution can be devised, algorithms may be compared in terms of average-case or worst-case performance. However, because these results hold only in the limit, they do not indicate how algorithms perform on specific instances. Furthermore, devising provable bounds for combinatorial optimization problems is usually difficult; hence, deducing properties that yield performance guarantees for general (often sophisticated) heuristics is currently out of scientific reach. Thus, empirical analysis is necessary to provide insights into the selection of algorithms that best fit a recognized problem structure. Moreover, an overall analysis may be relevant to assess the robustness of alternative algorithms across different classes of instances. Hooker (1994) argued that traditional comparative analysis of heuristic algorithms lacks both scientific rigor and minimum reproducibility standards. Consequently, he advocates for the development of an empirical science of algorithms based on rigorous experimental design and analysis, and on empirically based explanatory theories. By the same token, Črepinšek et al. (2014) stress that the replicability of computational experiments is essential for a comparative analysis of algorithms to carry scientific merit and produce valid results. As shown in Črepinšek et al. (2016), even small errors can render experiments unequal and conclusions invalid. We refer to their original papers for a discussion of common pitfalls and general design guidelines on setting up and conducting computational experiments, including those involving non-deterministic algorithms. This paper builds on this empirically grounded line of work by presenting a non-parametric framework for assessing algorithms under multiple performance metrics.
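To illustrate the kind of instance-paired, non-parametric comparison that such an empirical analysis involves, the sketch below applies a two-sided sign test to the objective values of two heuristics run on a common set of benchmark instances. The instance data, the algorithm labels, and the choice of the sign test are illustrative assumptions for this sketch only, not the framework proposed in this paper.

```python
# Illustrative sketch (not the paper's framework): comparing two heuristic
# algorithms, A and B, on paired benchmark instances with a two-sided sign
# test -- a simple non-parametric test that needs no distributional assumptions.
from math import comb

def sign_test(costs_a, costs_b):
    """Two-sided sign test on paired objective values (lower is better).

    Returns (#instances where A wins, #instances where B wins, p-value).
    Ties are discarded, as is standard for the sign test.
    """
    wins_a = sum(a < b for a, b in zip(costs_a, costs_b))
    wins_b = sum(b < a for a, b in zip(costs_a, costs_b))
    n = wins_a + wins_b          # effective sample size after dropping ties
    k = min(wins_a, wins_b)
    # Exact two-sided binomial p-value under H0: P(A beats B) = 0.5
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return wins_a, wins_b, p

# Hypothetical objective values of A and B on ten shared instances
a = [102, 98, 110, 95, 101, 99, 97, 105, 100, 96]
b = [105, 99, 112, 97, 100, 103, 98, 108, 104, 97]

wins_a, wins_b, p = sign_test(a, b)
print(wins_a, wins_b, p)  # A wins on 9 of 10 instances; p ~ 0.021
```

In practice one would typically prefer a test that also uses the magnitudes of the paired differences, such as the Wilcoxon signed-rank test, and would repeat the comparison across several performance metrics; the sign test is used here only because it is self-contained and easy to verify by hand.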