Classification Prediction of Lung Cancer Based on Machine Learning Method

Dantong Li, Guixin Li, Shuang Li, Ashley Bang
DOI: 10.4018/IJHISI.333631

Abstract

K-nearest neighbor (KNN) imputation was used to fill in missing values for five indicators (coronary heart disease, diabetes, total cholesterol, triglycerides, and albumin), and the SMOTE algorithm was used to balance the number of samples in each class. The Relief-F algorithm removed 18 of the 60 variable indicators and retained 42, while LASSO and ridge regression removed eight and retained 52. Among the support vector machine (SVM) models, the linear-kernel SVMs built on features selected by Relief-F and by LASSO achieved high accuracy, recall, and AUC values and gave the best SVM predictions. The random forest models built on the same Relief-F and LASSO feature sets performed better on the test set than the SVM models, and the random forest model using Relief-F-selected features is concluded to be the better predictor of lung cancer type. These results provide data support and a theoretical basis for predicting lung cancer classification with machine learning methods.
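
The two preprocessing steps named first above, KNN imputation and SMOTE oversampling, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `knn_impute` and `smote_balance`, the default `k=5`, the NaN-aware Euclidean distance, and oversampling every class up to the size of the largest class are illustrative assumptions. Maintained implementations exist as scikit-learn's `KNNImputer` and imbalanced-learn's `SMOTE`.

```python
import numpy as np


def knn_impute(X, k=5):
    """Replace each NaN with the mean of that column over the k nearest rows
    that have the column observed. Distances use only mutually observed columns,
    rescaled by (total columns / shared columns), i.e. a NaN-aware Euclidean distance."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    n_rows, n_cols = X.shape
    for i in np.where(~observed.all(axis=1))[0]:
        ranked = []
        for j in range(n_rows):
            shared = observed[i] & observed[j]
            if j == i or not shared.any():
                continue
            sq = np.sum((X[i, shared] - X[j, shared]) ** 2)
            ranked.append((np.sqrt(sq * n_cols / shared.sum()), j))
        ranked.sort()  # nearest first; ties broken by row index
        for col in np.where(~observed[i])[0]:
            donors = [j for _, j in ranked if observed[j, col]][:k]
            if donors:  # if no row has this column observed, the NaN stays
                filled[i, col] = X[donors, col].mean()
    return filled


def smote_balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class by
    interpolating between a class sample and one of its k nearest same-class neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        n_needed = int(target - count)
        X_c = X[y == label]
        if n_needed == 0 or len(X_c) < 2:
            continue  # nothing to add, or too few samples to interpolate between
        d = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        k_c = min(k, len(X_c) - 1)
        neighbours = np.argsort(d, axis=1)[:, :k_c]
        synthetic = np.empty((n_needed, X.shape[1]))
        for s in range(n_needed):
            i = rng.integers(len(X_c))
            j = neighbours[i, rng.integers(k_c)]
            # New point lies on the segment between sample i and neighbour j
            synthetic[s] = X_c[i] + rng.random() * (X_c[j] - X_c[i])
        X_parts.append(synthetic)
        y_parts.append(np.full(n_needed, label))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

A typical order is imputation first, then SMOTE on the training split only, so that synthetic patients are never generated from, or mixed into, the test data.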

Introduction

With rapid economic and social development, human lifestyles and eating habits have changed significantly. Because of long-term unhealthy living conditions, ionizing radiation, poor environmental conditions, and other adverse factors, the incidence of cancer in China is rising year by year, and the number of cancer types is also increasing. According to the International Agency for Research on Cancer, the number of cancer deaths worldwide is growing exponentially. In 2019, lung cancer caused 1.8 million deaths, colorectal cancer 870,000, gastric cancer 780,000, liver cancer 780,000, and breast cancer 630,000 (Wang & Yuan, 2019) (see Figure 1). Among these, lung cancer has the highest incidence and is particularly prominent among men. As the most common fatal cancer worldwide, lung cancer is influenced by multiple factors. Smoking has been identified as its main risk factor: smokers are more than 10 times as likely as nonsmokers to develop lung cancer. Airborne pollutants such as PM2.5, sulfur dioxide, and carbon monoxide also raise the risk, as does long-term exposure to harmful substances in occupations such as mining, welding, and painting. According to the latest cancer burden data, more than 4 million people worldwide are diagnosed with lung cancer each year, and nearly half of them die of the disease.

Figure 1. Statistics of cancer death cases

Lung cancer poses a major threat to human health and survival. Confirmed cases are mainly adenocarcinoma, small cell lung cancer, and squamous cell carcinoma of the lung, and treatment differs greatly between these types (Abdullah et al., 2021). Clinicians must also attend to the patient's psychological state and prescribe appropriate drugs before treatment. Tumor markers and imaging are widely used to diagnose lung cancer in clinical practice, but some markers, such as carcinoembryonic antigen, are not specific enough and can lead to diagnostic errors. Imaging (chest X-ray, CT, magnetic resonance imaging, etc.) has diagnostic value, but small pulmonary nodules or lymph node metastases may be missed when image quality is poor. The main treatments for lung cancer are surgery, radiotherapy, chemotherapy, and targeted therapy. For patients with early-stage lung cancer, surgery is currently the most effective treatment. Radiotherapy kills cancer cells to alleviate symptoms and is mainly used for patients whose tumors cannot be surgically removed or who have residual cancer cells after surgery. Chemotherapy, delivered by intravenous injection or oral medication, mainly targets patients with advanced lung cancer. Targeted therapy kills lung cancer cells selectively by acting on their identified molecular targets. As screening technology develops, more lung cancers can be detected at an early stage. At the same time, with the rapid growth of medical data, a large amount of diagnostic information has been digitized. Building lung cancer prediction models to assist diagnosis and treatment therefore has important research value.

Today, the incidence and mortality of lung cancer have increased rapidly, and it has become the cancer with the highest mortality rate in the world. By analyzing lung cancer medical data with machine learning, a lung cancer prediction model can be built to support prevention, diagnosis, and treatment. This paper uses clinical diagnosis, treatment, and experimental data for lung cancer patients from the database of the US National Center for Biotechnology Information (NCBI). K-nearest neighbor (KNN) imputation is used to fill in missing values, and the synthetic minority over-sampling technique (SMOTE) is used to address class imbalance. The Relief-F filter method and the least absolute shrinkage and selection operator (LASSO) embedded method are used to select features from the patient indicators, and prediction models are then built with support vector machines and random forests. Finally, the models' predictive performance is compared using accuracy, recall, and the area under the ROC curve (AUC).
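
The Relief-F filter step described above scores each indicator by how well it separates nearby patients of different classes. The sketch below is a minimal NumPy version of the multi-class Relief-F update: for each reference patient, features that differ from its nearest same-class neighbours ("hits") lose weight, and features that differ from its nearest neighbours in each other class ("misses") gain weight, scaled by that class's prior. It is a hedged illustration, not the authors' code: the names `relief_f` and `select_top_features`, using every patient as a reference sample, the Manhattan distance on min-max-scaled features, and keeping a fixed number of top-ranked indicators are assumptions; the paper's neighbour count and cutoff rule are not stated here.

```python
import numpy as np


def relief_f(X, y, n_neighbors=10):
    """Multi-class Relief-F weights, using every instance as a reference sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features get zero difference instead of 0/0
    Xs = (X - lo) / span  # on this scale diff(A, a, b) = |a - b| lies in [0, 1]
    labels, counts = np.unique(y, return_counts=True)
    prior = dict(zip(labels, counts / n))
    weights = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every row
        others = np.arange(n) != i
        for c in labels:
            candidates = np.where(others & (y == c))[0]
            if len(candidates) == 0:
                continue
            nearest = candidates[np.argsort(dist[candidates])[:n_neighbors]]
            mean_diff = np.abs(Xs[nearest] - Xs[i]).mean(axis=0)
            if c == y[i]:
                # Nearest hits: features that differ within a class are penalized
                weights -= mean_diff / n
            else:
                # Nearest misses: features that differ across classes are rewarded,
                # weighted by this class's share among the non-reference classes
                weights += prior[c] / (1.0 - prior[y[i]]) * mean_diff / n
    return weights


def select_top_features(weights, n_keep):
    """Indices of the n_keep highest-weighted features, best first."""
    return np.argsort(-weights)[:n_keep]
```

With 60 candidate indicators, `select_top_features(relief_f(X, y), 42)` would mirror the 42 retained indicators reported in the abstract, although the authors may have used a weight threshold rather than a fixed count.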
