Article Preview
Top1 Introduction
The wealth of the network information have been stored in several public collections of databases, including KEGG (Kanehisa et al., 2006) and BioCyc (2010). These databases provide tools for pathway visualization and for queries on pathway components such as substrates, products and reactions. However, computational tools are called for transferring the well studied knowledge in databases to unknown one (e.g. for searching for homologues to a query pathway in a collection of known pathways) and for aligning two pathways to locate conserved pathway fragments.
Most of the high-throughput genomic, proteomic data are not catalogued or processed directly by hand but by computational technologies. It means that the high-throughput datasets may be filled with technical and biological noise and have different technical biases and coverage (Han, 2008) or the usage of inconsistent data/tool versions. All of them requires data curation. However not all data are curated. For BioCyc (2010), only a part of pathway data have received person-decades of literature-based curation and are the most accurate. However the logical interpretation of the whole network is definitely not easily comprehensible to the human brain and therefore it is impossible to curate all the more and more increasing data only by hand. The computational tool is required for inferring the missing/inconsistent information in genome and pathway databases.
All of them require efficient computational methods for analyzing and comparing networks. In this paper, we focus on network alignment for comparing, exploring, and predicting these networks.
Let the pattern be a pathway for which one is searching for homologous pathways in the text, i.e., the known metabolic network of a different species. Alignment of two networks, pattern and text, is a basic task which can meet the requirement of a series of open questions such as network evolution and critical target search. This is a challenging research topic from both biological and computational perspective.
Existing approaches to subgraph iso- and homeomorphism restrict the size (Sharan et al., 2005; Yang & Sze, 2007) or topology of the pattern (Chen & Hofestaedt, 2004, 2005; Pinter, Rokhlenko, Yeger-Lotem, & Ziv-Ukelson, 2005; Cheng, Harrison, & Zelikovsky, 2007; Cheng, Kaur, Harrison, & Zelikovsky, 2007) or use hueristics and approximation algorithms. GraphMatch (Yang & Sze, 2007) allows to delete disassociated vertices or induced subnetwork in query network and then align its remainder to target network by subgraph isomorphism.
However, the widespread evolutionary machinery of gene duplication results in vertex copying (Sharan & Ideker, 2006). The results of network alignments can be enhanced when gene duplication and divergence are taken into account. If two enzymes in the pattern species are evolutionarily related, the corresponding enzymes can be mapped into a single enzyme. The mapping explores the characteristics of graph homomorphism.
Based on the property, we have formulated the network alignment problem which asks the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertions. In this paper we present combinatorial algorithms, which take into account the similarity of both the enzymes' functions arbitrary network topologies such as trees and arbitrary graphs. The proposed algorithm is fixed parameter tractable in the liner or square of the size of feedback vertex set respectively for the case of disallowing or allowing the deletions.