Clique Size and Centrality Metrics for Analysis of Real-World Network Graphs

Clique Size and Centrality Metrics for Analysis of Real-World Network Graphs

Copyright: © 2018 |Pages: 15
DOI: 10.4018/978-1-5225-2255-3.ch565
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

We present correlation analysis between the centrality values observed for nodes (a computationally lightweight metric) and the maximal clique size (a computationally hard metric) that each node is part of in complex real-world network graphs. We consider the four common centrality metrics: degree centrality (DegC), eigenvector centrality (EVC), closeness centrality (ClC) and betweenness centrality (BWC). We define the maximal clique size for a node as the size of the largest clique (in terms of the number of constituent nodes) the node is part of. The real-world network graphs studied range from regular random network graphs to scale-free network graphs. We observe that the correlation between the centrality value and the maximal clique size for a node increases with increase in the spectral radius ratio for node degree, which is a measure of the variation of the node degree in the network. We observe the degree-based centrality metrics (DegC and EVC) to be relatively better correlated with the maximal clique size compared to the shortest path-based centrality metrics (ClC and BWC).
Chapter Preview
Top

Introduction

Network Science is a fast-growing discipline in academics and industry. It is the science of analyzing and visualizing complex real-world networks using graph theoretic principles. Several metrics are used to analyze the characteristics of the real-world network graphs; among them “centrality” is a commonly used metric. The centrality of a node is a measure of the topological importance of the node with respect to the other nodes in the network (Newman, 2010). It is purely a link-statistics based measure and not based on any offline information (such as reputation of the node, cost of the node, etc). The commonly used centrality metrics are degree centrality, eigenvector centrality, closeness centrality and betweenness centrality. Degree centrality (DegC) of a node is simply the number of immediate neighbors for the node in the network. The eigenvector centrality (EVC) of a node is a measure of the degree of the node as well as the degree of its neighbor nodes. We refer to DegC and EVC as degree-based centrality metrics. Closeness centrality (ClC) of a node is the inverse of the sum of the shortest path distances of the node to every other node in the network. Betweenness centrality (BWC) of a node is the ratio of the number of shortest paths the node is part of for any source-destination node pair in the network, summed over all possible source-destination pairs that do not involve the particular node. We refer to ClC and BWC as shortest path-based centrality metrics. Computationally efficient polynomial-time algorithms have been proposed in the literature (Brandes, 2001; Strang, 2005; Cormen et. al., 2009; Newman, 2010) to determine exact values for each of the above centrality metrics; hence we categorize centrality as a computationally lightweight metric.

A “clique” is a complete sub graph of a graph (i.e., all the nodes that are part of the sub graph are directly connected to each other). Cliques are used as the basis to identify closely-knit communities in a network as part of studies on homophily and diffusion. Unfortunately, the problem of finding the maximum-sized clique in a graph is an NP-hard problem (Cormen et. al., 2009), prompting several exact algorithms and heuristics to be proposed in the literature (Pattabiraman et. al., 2013; Fortunato, 2010; Palla et. al., 2005; Sadi et. al., 2010; Tomita & Seki, 2003). In this chapter, we choose a recently proposed exact algorithm (Pattabiraman et. al., 2013) to determine the size of the maximum clique for large-scale complex network graphs and extend it to determine the size of the maximal clique that a particular node is part of. We define the maximal clique size for a node as the size of the largest clique (in terms of the number of constituent nodes) the node is part of. Note that the maximal clique for a node need not be the maximum clique for the entire network graph; but, the maximum clique for the entire graph could be the maximal clique for one or more nodes in the network.

Since the maximal clique size problem is a computationally hard problem and exact algorithms run significantly slower on large network graphs, our goal in this chapter is to explore whether the maximal clique size correlates well to one of the commonly studied computationally lightweight metrics, viz., centrality of the vertices, for complex real-world network graphs: if we observe a high positive correlation between maximal clique size and one or more centrality metrics, we could then infer the ranking of the vertices based on the centrality values as the ranking of the vertices based on the maximal clique size in real-world network graphs. Ours will be the first chapter to conduct a correlation study between centrality and maximal clique size for real-world network graphs. To the best of our knowledge, we did not come across such a work that has done correlation study between these two metrics (and in general, a computationally hard metric vis-a-vis a computationally lightweight metric) for real-world network graphs. Throughout the chapter, we use the terms 'node' and 'vertex' as well as 'link' and 'edge' interchangeably. They mean the same.

Key Terms in this Chapter

Random Network: A network whose degree distribution follows a Poisson pattern wherein the degree of all the vertices lies close to the average degree and the probability of finding a vertex with a degree much farther away from the average degree is zero.

Algorithm: A sequence of steps to perform a particular task.

Correlation: A statistical dependence between two variables or sets of data.

Centrality: A quantitative measure of ranking the vertices of a graph based on its topological structure.

Clique: A sub graph of a graph such that there is an edge between any two vertices.

Spectral Radius: The largest Eigenvalue of the adjacency matrix of a graph.

Small-World Property: A characteristic property of complex networks wherein the number of hops on the shortest paths between any two vertices is at most the logarithm of the number of vertices in the network.

Betweenness Centrality: A measure of the fraction of shortest paths between any two vertices that go through a vertex, considered over all pairs of vertices excluding the vertex.

Topology: An arrangement of the nodes in the network that can be used for communication between any two nodes in the network.

Closeness Centrality: A measure of the sum of the lengths of the shortest paths from a vertex to the rest of the vertices.

Degree Centrality: A measure of the number of adjacent neighbors of a vertex.

Computationally Lightweight Metric: A network metric whose computation is not time consuming.

Scale-Free Network: A network whose degree distribution follows a power-law pattern wherein a majority of the vertices have a smaller degree, but there exists a non-zero probability for finding a vertex with a relatively much larger degree.

Clique Size: The size (number of vertices) of the largest clique a vertex is part of.

Eigenvector Centrality: A measure of the degree of a vertex as well as the degree of its neighbors.

Branch and Bound: An algorithmic strategy that prunes the search space and explores only candidate solutions that have the potential to be better than the currently known solution.

Complete Chapter List

Search this Book:
Reset