Machine learning-based meta-analysis of colorectal cancer and inflammatory bowel disease

Sardari, Aria (2024) Machine learning-based meta-analysis of colorectal cancer and inflammatory bowel disease. Masters thesis, Memorial University of Newfoundland.

PDF - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Despite extensive research efforts, the mechanism of CRC remains poorly understood, and the genetic biomarkers discovered thus far have not provided sufficient insight into the dynamics of CRC. One reason might be that most analysis methods are univariate and do not investigate the combinations of genes that lead to disease. To fill this gap, we employ SVFS (Singular-Vectors Feature Selection), as well as several other machine learning algorithms, to identify genes associated with CRC. To validate our findings, we developed an ensemble classifier that uses the identified genes to distinguish CRC tumour samples from adjacent normal samples. We validated our findings on 13 independent datasets and achieved significant results on all of them (correctly diagnosing 1755 cases out of 1807 and 115 controls out of 119). Several of the genes identified by our methodology have previously been reported to be associated with CRC, while others are novel and should be researched further. Furthermore, the same pipeline was applied to Inflammatory Bowel Disease (IBD), since patients with IBD are at substantial risk of developing CRC. Following significant results on IBD validation sets using the identified genes (correctly diagnosing 212 IBD cases out of 231 and 51 healthy controls out of 54), we examined IBD-related genes in conjunction with CRC-related genes to gain better insight into suspected genes. A Python implementation of our pipeline can be accessed publicly at https://github.com/AriaSar/CRCIBD-ML.
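
The validation counts quoted in the abstract can be converted to rates with a short calculation. The sketch below is illustrative only and is not taken from the thesis or its repository. It assumes that the fraction of correctly diagnosed cases corresponds to sensitivity and the fraction of correctly identified controls to specificity; the names REPORTED and summarize are hypothetical.

```python
# Counts as quoted in the abstract: (correctly classified, total).
# Assumption: "cases" are the positive class and "controls" the negative class.
REPORTED = {
    "CRC": {"cases": (1755, 1807), "controls": (115, 119)},
    "IBD": {"cases": (212, 231), "controls": (51, 54)},
}


def summarize(counts):
    """Return sensitivity, specificity and overall accuracy for one disease."""
    case_ok, case_n = counts["cases"]
    ctrl_ok, ctrl_n = counts["controls"]
    return {
        "sensitivity": case_ok / case_n,   # correctly diagnosed cases
        "specificity": ctrl_ok / ctrl_n,   # correctly identified controls
        "accuracy": (case_ok + ctrl_ok) / (case_n + ctrl_n),
    }


if __name__ == "__main__":
    for disease, counts in REPORTED.items():
        s = summarize(counts)
        print(
            f"{disease}: sensitivity={s['sensitivity']:.3f}, "
            f"specificity={s['specificity']:.3f}, "
            f"accuracy={s['accuracy']:.3f}"
        )
```

Under these assumptions the CRC counts correspond to roughly 97.1% sensitivity and 96.6% specificity, and the IBD counts to roughly 91.8% and 94.4%. These are pooled figures over all validation sets combined; per-dataset rates cannot be recovered from the abstract alone.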

Item Type: Thesis (Masters)
URI: http://research.library.mun.ca/id/eprint/16564
Item ID: 16564
Additional Information: Includes bibliographical references (pages 44-63)
Keywords: machine learning, AI, colorectal cancer, CRC, inflammatory bowel disease
Department(s): Science, Faculty of > Computer Science
Date: June 2024
Date Type: Submission
Digital Object Identifier (DOI): https://doi.org/10.48336/1Y7A-F665
Library of Congress Subject Heading: Machine learning; Artificial intelligence; Colon (Anatomy)--Cancer; Rectum--Cancer; Inflammatory bowel diseases; Genetic markers; Bioinformatics
