Hepatitis C Prediction Using Feature Selection by Machine Learning Technique

Jeet Majumder, Suman Ghosh, Alex Khang, Tridibesh Debnath, Avijit Kumar Chaudhuri
DOI: 10.4018/979-8-3693-2105-8.ch013

Abstract

This study proposes a machine learning-based prediction framework for the hepatitis C virus. The authors used a dataset available on Kaggle that contains 564 patients described by 12 distinct features. To assess the robustness and reliability of the proposed framework, they tested two cases: one without feature selection and one with feature selection based on gain ratio attribute evaluation (GRAE). The feature subset chosen by GRAE is also evaluated. For model evaluation, induction methods and classifiers such as logistic regression (LR), naive Bayes (NB), decision tree (DT), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) are used. According to the experimental findings, the framework with GRAE selection outperformed the one without feature selection on all accuracy metrics.

1. Introduction

The hepatitis C virus (HCV), which belongs to the Flaviviridae family, is one of the main viruses that cause liver disease. Approximately 175 million individuals, or 3% of the global population, are infected with HCV. Parenteral transmission is the primary mode of HCV transmission, with 90% of injecting drug users at highest risk. Conventional interferon and ribavirin, which have 38–43% sustained virological response rates, are still the gold standard for treating chronic HCV (Munir et al., 2010). Approximately 58 million people worldwide live with chronic hepatitis C infection, and 1.5 million new cases are reported each year. Nearly 3.2 million children and adolescents have chronic hepatitis C (WHO, 2023). The infection is asymptomatic at first, but as it worsens it can cause chronic illnesses such as hepatocellular cancer and liver cirrhosis. Several non-invasive serum biochemical indicators are used to diagnose the illness (Nandipati et al., 2020).

To determine the disease's stage, a variety of non-invasive blood biochemical indicators and patient medical information have been employed. Machine learning techniques have proven to be a helpful alternative for determining the stage of this chronic liver disease, avoiding the drawbacks of a biopsy (Butt et al., 2021). Medical research relies heavily on the prediction and classification of diseases to stop their spread and identify affected areas early. Machine learning (ML) techniques are frequently employed to predict and classify diseases accurately, serving as a useful tool for medical professionals (Mamdouh Farghaly et al., 2023). ML techniques are especially good at capturing and analysing the complex, non-linear correlations found in clinical data.

Machine learning algorithms, including classification approaches, can be used to build a model for HCV diagnosis that identifies HCV-positive individuals. However, irrelevant features in the attribute set can degrade the classifier's effectiveness (John et al., 1994). Many learning algorithms perform better and produce more accurate results when the data consist of relevant, non-redundant attributes. Because clinical datasets contain a large amount of redundant and irrelevant information, an effective feature selection strategy is required to extract the features that are informative about the condition (Jain & Singh, 2018).
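The preview does not show how the gain ratio is computed, so the following is only a minimal sketch of gain-ratio feature ranking under stated assumptions: features are already discrete (continuous laboratory values would need binning first), and the function names `entropy`, `gain_ratio`, and `rank_features`, as well as the toy columns, are hypothetical and not taken from the chapter.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to the class labels."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Information gain = H(class) - H(class | feature).
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information = entropy of the feature's own value distribution.
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels):
    """Return feature names sorted by decreasing gain ratio."""
    scores = {name: gain_ratio(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: "marker_a" separates the classes, "marker_b" does not.
labels = [1, 1, 0, 0]
columns = {
    "marker_b": ["x", "y", "x", "y"],
    "marker_a": ["high", "high", "low", "low"],
}
ranking = rank_features(columns, labels)
```

Keeping only the top-ranked names from `ranking` gives a reduced feature subset that can then be passed to any of the classifiers listed in the abstract.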

HCV infection is still an important contributor to liver cirrhosis and liver transplants, and it remains a global health problem today. But because of decades of remarkable progress, HCV is now the first chronic viral infection that can be cured (Manns & Maasoumy, 2022). Table 1 compares the performance reported by earlier HCV prediction studies.

Table 1. Performance comparison of Hepatitis C virus prediction studies

| Author | Method | Performance Metric | Percentage |
|---|---|---|---|
| Orooji and Kermani (2021) | MLP, Bayesian network, and DT | Specificity | 100% |
| | | F-measure | 99.90% |
| | | Accuracy | 99.90% |
| Alotaibi et al. (2023) | RF, Gradient Boosting Machine, DT | Recall | 96.00% |
| | | Precision | 99.81% |
| | | AUC/ROC | 96.00% |
| | | Accuracy | 96.92% |
| Abd El-Salam et al. (2019) | Bayesian Network | Accuracy | 68.90% |
| | | ROC | 74.80% |
| KayvanJoo et al. (2014) | NB | Accuracy | 89.17% |
| Eliyahu et al. (2018) | RF | Accuracy | 91% |
| Hashem et al. (2020) | DT, LR | Accuracy | 95.60% |
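Most of the metrics named in Table 1 are derived from the same four confusion-matrix counts. The sketch below shows the standard definitions for a binary classifier; the function names and the toy labels are hypothetical and do not reproduce any result from the cited studies. AUC/ROC is omitted because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, specificity, and F-measure."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 1 = HCV-positive, 0 = HCV-negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting several of these together matters for clinical data, because accuracy alone can look high on an imbalanced dataset even when recall for the positive class is poor.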
