A Semi-Supervised Approach to GRN Inference Using Learning and Optimization

Meroua Daoudi, Souham Meshoul, Samia Boucherkha
Copyright: © 2021 | Pages: 22
DOI: 10.4018/IJAMC.2021100109

Abstract

Gene regulatory network (GRN) inference is a challenging problem that can be framed as a learning task. Supervised and semi-supervised learning both require positive and negative examples. However, GRN datasets include only positive examples, unlabeled examples, or both. Recently, growing interest has been devoted to generating negative examples from unlabeled data. Within this context, the authors propose to generate potential negative examples from the set of unlabeled ones and keep those that lead to the best classification accuracy when used with positive examples. For this purpose, a newly proposed genetic algorithm for fixed-size subset selection is combined with a support vector machine model. The authors assessed the performance of the proposed approach using simulated and experimental datasets. On simulated datasets, the proposed approach outperforms the other methods in most cases and improves the performance metrics when using balanced data. On experimental datasets, the proposed approach finds the optimal solution for each transcription factor considered in this study.

Introduction

Understanding and modelling regulation in biological systems is a challenging task in bioinformatics, as it requires identifying the relationships between the various components of these systems and inferring the potential influences that some components may have on others. With advances in high-throughput technologies and the generation of massive biological data, several computational biology techniques and models have been proposed to support knowledge discovery and a better understanding of the regulations between biological components and of how these regulations give rise to the functions and behaviors of biological systems. At the cell level, a Gene Regulatory Network (GRN) is a set of genes and regulators that interact with each other to govern gene expression levels. Modeling these interactions is a powerful abstraction of biological systems that can serve as a tool to understand and analyze gene interactions and functions within a cell.

The first step in a GRN inference process is to identify the primary regulations between regulators, known as transcription factors (TFs), and their target genes, which helps in understanding genetic processes and genetic modifications (Desai et al., 2017). Using experimental methods to determine regulation between TFs and target genes is costly and time-consuming (Patel & Wang, 2015). Furthermore, high-throughput technologies generate massive expression data, offering a great opportunity to infer GRNs using computational methods. Inferring GRNs from expression data can be cast as a machine learning problem, and several approaches have already been proposed using supervised, unsupervised and semi-supervised techniques. A large variety of unsupervised methods were proposed first. Subsequently, a large number of transcription factors and their target genes were identified with experimental methods and came to be considered as known regulations. These newly available labeled datasets have led to the use of supervised and semi-supervised techniques, which exploit the new data to infer more reliable networks. One of the most challenging tasks when inferring GRNs from gene expression data using supervised and semi-supervised learning is extracting reliable negative examples. The challenge stems from the difficulty of experimentally verifying the absence of any regulation between a TF and a target gene (Gillani et al., 2014) and from the high computational complexity of GRN inference. The performance of supervised methods depends on the quality of the available data, and most proposed approaches treat unknown regulations as negative, which could affect the performance of the classifier (Turki & Rajikhan, 2016).

To overcome the aforementioned limitations, we propose in this paper a generative approach to select reliable negative examples. The main idea is to generate potential negative examples using a search in the space of unlabeled examples and keep those that lead to the best classification accuracy when used with positive examples. This can be cast as an optimization process where the decision variables are related to the unlabeled examples and the objective function is related to the classification accuracy. To achieve this generation task, we propose the joint use of an optimizer and a classification model as shown in Figure 1.
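
The loop described above can be illustrated with a minimal sketch, offered under several assumptions rather than as the authors' implementation. This preview does not specify the genetic operators or the classifier settings, so the operators below (truncation selection with elitism, union-based crossover, swap mutation) are illustrative choices, and a nearest-centroid classifier scored by leave-one-out accuracy stands in for the support vector machine so that the example needs only the Python standard library. All function and parameter names (`loo_accuracy`, `select_negatives`, `k`, `pop_size`, `generations`) are hypothetical.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(positives, negatives):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier is a stand-in for the SVM.
    Requires at least two positives and two negatives.
    """
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        pos = [v for v, lab in rest if lab == 1]
        neg = [v for v, lab in rest if lab == 0]
        pred = 1 if sq_dist(x, centroid(pos)) <= sq_dist(x, centroid(neg)) else 0
        correct += pred == y
    return correct / len(data)

def select_negatives(positives, unlabeled, k, pop_size=20, generations=30, seed=0):
    """Search for k unlabeled examples that work best as negatives.

    Decision variables: a fixed-size subset of indices into `unlabeled`.
    Objective: classification accuracy when the subset is used as negatives.
    """
    rng = random.Random(seed)
    pool = range(len(unlabeled))

    def fitness(subset):
        return loo_accuracy(positives, [unlabeled[i] for i in subset])

    def crossover(a, b):
        # Draw k distinct indices from the parents' union, so size stays k.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(s):
        # Swap one selected index for one unselected index (size stays k).
        outside = [i for i in pool if i not in s]
        if not outside:
            return s
        return (s - {rng.choice(sorted(s))}) | {rng.choice(outside)}

    population = [frozenset(rng.sample(pool, k)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    best = max(population, key=fitness)
    return sorted(best), fitness(best)

if __name__ == "__main__":
    # Toy data (invented for illustration): unlabeled items mix both groups.
    positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
    unlabeled = [[1.0, 1.1], [-1.0, -1.0], [0.9, 0.8], [-1.2, -0.9],
                 [-0.8, -1.1], [1.1, 1.0], [-1.1, -1.2]]
    print(select_negatives(positives, unlabeled, k=4))
```

Representing each candidate as a set of exactly `k` indices, and using operators that preserve that size, keeps every individual feasible without a repair step; choosing `k` equal to the number of positives yields a balanced training set. In practice the stand-in classifier would be replaced by an SVM evaluated with cross-validation.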

Figure 1. Outline of the proposed generative approach
