A Modified Cuckoo Search Algorithm for Data Clustering

Preeti Pragyan Mohanty, Subrat Kumar Nayak
Copyright: © 2022 | Pages: 32
DOI: 10.4018/IJAMC.2022010101

Abstract

Clustering of data is one of the essential data mining techniques, in which similar objects are grouped into the same cluster. In recent years, many nature-inspired clustering techniques have been proposed and have led to encouraging results. This paper proposes a Modified Cuckoo Search (MoCS) algorithm. In the proposed work, an attempt has been made to balance the exploration of the Cuckoo Search (CS) algorithm and to increase its exploration potential in order to avoid premature convergence. The proposed algorithm is tested on fifteen benchmark test functions and is shown to be more efficient than the standard CS algorithm. Further, the method is compared with well-known nature-inspired algorithms such as Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), Particle Swarm Optimization with Age Group topology (PSOAG) and the CS algorithm for data clustering on six real datasets. The experimental results indicate that the MoCS algorithm achieves better results than the other algorithms in finding optimal cluster centers.

1. Introduction

Clustering is a method of grouping an enormous amount of data into different groups. The data in the same group exhibit similar properties, while the data in different groups exhibit different properties. This technique is used for finding patterns among the data in each group. Over the past few years, clustering has played a key role in various fields of research such as image analysis, machine learning, data mining, pattern recognition, information retrieval, statistics, biology, medical sciences, market research, etc. (Ahalya & Pandey, 2015).

Traditional clustering algorithms are mostly classified into two main types, i.e., hierarchical clustering and partitional clustering (Leung et al., 2000; Xu & Tian, 2015). The outcome of the hierarchical clustering approach is a tree-like structure that represents the clustering process, in which the dataset is partitioned into different groups. In this type of clustering, data objects assigned to one group cannot be reassigned to another group (Armano & Farmani, 2016). Moreover, this clustering can be performed even if the number of groups is not known. The major disadvantage of the technique is that it fails to separate overlapping groups, since no information about the shape and size of the groups is available. The partitional clustering approach partitions the dataset into a set of groups such that objects within a group are similar to one another, while objects in different groups are dissimilar. In partitional clustering, the number of clusters is either known prior to the partitioning or unknown and must be estimated for the dataset. This paper deals with the partitional clustering problem in which the number of clusters is known beforehand. The partitional clustering approach aims to optimize a dissimilarity criterion, typically by minimizing the intra-cluster distance between objects in the same group and maximizing the inter-cluster distance between different groups, as sketched below.
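
As a concrete illustration of this objective, the following minimal Python sketch computes the intra-cluster criterion (the within-cluster sum of squared Euclidean distances) for a given set of cluster centers. The function and variable names are illustrative only and are not taken from the paper, and Euclidean distance is assumed as the dissimilarity measure.

    import numpy as np

    def intra_cluster_sse(data, centers):
        """Assign each object to its nearest center and return the total
        within-cluster sum of squared errors together with the assignment."""
        # distances[i, j] = squared Euclidean distance from object i to center j
        distances = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)        # nearest-center assignment
        sse = distances[np.arange(len(data)), labels].sum()
        return sse, labels

Lower values of this criterion correspond to more compact clusters, so it can serve directly as the fitness function that a partitional clustering algorithm tries to minimize.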

Generally, partitional clustering algorithms are centroid-based. One of the most popular centroid-based algorithms is the k-means algorithm, proposed by Stuart Lloyd in 1957 (Lloyd, 1982). The main aim of the k-means clustering algorithm is to partition the objects into k groups, starting from k randomly chosen data objects as the initial centroids. Although the algorithm is easy to implement, it often gets stuck in a local minimum because it is sensitive to the initial positions of the centroids.
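
The procedure described above can be sketched as follows. This is a minimal illustration of Lloyd's k-means, reusing the hypothetical intra_cluster_sse helper from the previous sketch; the random choice of initial centroids is exactly the step that makes the final result depend on initialization.

    import numpy as np

    def k_means(data, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly choose k data objects as the initial centroids; this choice
        # is what makes the algorithm sensitive to its initialization.
        centers = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(max_iter):
            _, labels = intra_cluster_sse(data, centers)   # assignment step
            # Update step: move each centroid to the mean of its assigned objects.
            new_centers = np.array([
                data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break              # converged, possibly only to a local minimum
            centers = new_centers
        return centers, labels

Running such a sketch from several different seeds and keeping the solution with the lowest intra_cluster_sse value is the usual workaround for this sensitivity, and it is precisely this weakness that motivates the metaheuristic approaches discussed next.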

Nature-inspired metaheuristic algorithms such as the Firefly Algorithm (FA) (Yang, 2008), Ant Colony Optimization (ACO) (Dorigo, 1992), Cuckoo Search (CS) (Yang & Deb, 2009), Particle Swarm Optimization (PSO) (Kennedy & Eberhart, 1995), Glowworm Swarm Optimization (GSO) (Krishnanand & Ghose, 2005), and Artificial Bee Colony (ABC) (Karaboga et al., 2005) have been able to overcome several of these drawbacks of the k-means algorithm. These algorithms are applied to optimization problems for which no exact solution method is practical, and they produce near-optimal results (Nesmachnow, 2014). In these algorithms, a population of candidate solutions is generated randomly, and the solutions carried into the next generation are selected according to their fitness values. The candidate solutions are improved simultaneously over successive generations in order to approach a globally optimal solution (Jiang et al., 2013).
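
The population loop these algorithms share can be illustrated with the standard Cuckoo Search of Yang and Deb (2009), which combines Lévy-flight moves around the current best nest with abandonment of a fraction of the worst nests. The sketch below is only a hedged illustration of that standard algorithm, not of the MoCS modification proposed in this paper; all parameter values and helper names are assumptions made for the example.

    import numpy as np
    from math import gamma

    def levy_step(size, beta=1.5, rng=None):
        """Draw Levy-distributed step lengths via Mantegna's algorithm."""
        rng = rng or np.random.default_rng()
        sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
                 (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
        u = rng.normal(0.0, sigma, size)
        v = rng.normal(0.0, 1.0, size)
        return u / np.abs(v) ** (1 / beta)

    def cuckoo_search(fitness, dim, n_nests=25, p_a=0.25, alpha=0.01,
                      bounds=(-5.0, 5.0), max_gen=200, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = bounds
        nests = rng.uniform(lo, hi, (n_nests, dim))   # population of candidate solutions
        fit = np.apply_along_axis(fitness, 1, nests)
        for _ in range(max_gen):
            best = nests[fit.argmin()]
            # Levy-flight move around the current best nest (exploration).
            new = np.clip(nests + alpha * levy_step((n_nests, dim), rng=rng) * (nests - best),
                          lo, hi)
            new_fit = np.apply_along_axis(fitness, 1, new)
            better = new_fit < fit
            nests[better], fit[better] = new[better], new_fit[better]
            # Abandon a fraction p_a of the nests and replace them with random ones.
            abandon = rng.random(n_nests) < p_a
            if abandon.any():
                nests[abandon] = rng.uniform(lo, hi, (int(abandon.sum()), dim))
                fit[abandon] = np.apply_along_axis(fitness, 1, nests[abandon])
        best_idx = fit.argmin()
        return nests[best_idx], fit[best_idx]

For example, cuckoo_search(lambda x: float(np.sum(x ** 2)), dim=5) minimizes the sphere function, a typical benchmark test function of the kind used to evaluate such algorithms; for clustering, the fitness would instead be the intra-cluster criterion sketched earlier, with each nest encoding a candidate set of cluster centers.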
