Deep learning for genome-wide association studies and the impact of SNP locations

Ji, Songyuan (2019) Deep learning for genome-wide association studies and the impact of SNP locations. Masters thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



The study of single nucleotide polymorphisms (SNPs) associated with human diseases is important for identifying pathogenic genetic variants and illuminating the genetic architecture of complex diseases. A genome-wide association study (GWAS) examines genetic variation across individuals to detect disease-related SNPs. Traditional machine learning methods typically treat SNP data as a one-dimensional sequence and may therefore overlook the complex interactions among multiple genetic factors. In this thesis, we propose a new hybrid deep learning approach to identify susceptibility SNPs associated with colorectal cancer. A set of SNP variants was first selected by a hybrid feature selection algorithm and then organized into 3D images using several space-filling curve models. A multi-layer deep convolutional neural network (CNN) was constructed and trained on these images. We found that images generated with the space-filling curve model that preserves the original SNP locations in the genome yield the best classification performance. We also report a set of high-risk SNPs associated with colorectal cancer, identified by the deep neural network model.
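To illustrate how a space-filling curve turns a list of SNPs into an image, here is a minimal sketch. The abstract does not say which curve model performed best; the keywords name the Hilbert curve, so the sketch assumes it. It also assumes (hypothetically) that the selected SNPs are already sorted by genomic position and encoded as minor-allele counts 0/1/2. The function names `hilbert_d2xy` and `snps_to_hilbert_image` are illustrative and not taken from the thesis. The sketch builds a single 2D channel only; how the thesis assembles its 3D images is not reproduced here.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant of side s so each sub-curve joins its neighbours."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n-by-n Hilbert curve (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def snps_to_hilbert_image(genotypes, order):
    """Place genotypes (assumed 0/1/2 minor-allele counts, sorted by genomic
    position) along a Hilbert curve on a 2**order x 2**order grid, scaled to [0, 1]."""
    n = 2 ** order
    if len(genotypes) > n * n:
        raise ValueError("grid too small for the number of SNPs")
    # Unused cells stay 0.0, which is indistinguishable from genotype 0;
    # a real pipeline might add a separate mask channel.
    grid = [[0.0] * n for _ in range(n)]
    for d, g in enumerate(genotypes):
        x, y = hilbert_d2xy(n, d)
        grid[y][x] = g / 2.0  # row index = y, column index = x
    return grid
```

For example, `snps_to_hilbert_image([0, 1, 2, 2], order=1)` returns `[[0.0, 1.0], [0.5, 1.0]]`. Consecutive indices along a Hilbert curve always land in adjacent cells, and nearby indices tend to stay close in both dimensions. That locality is what makes a position-preserving layout plausible as CNN input: SNPs that neighbour each other in the genome remain within the same convolutional receptive field.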

Item Type: Thesis (Masters)
Item ID: 14272
Additional Information: Includes bibliographical references (pages 88-99).
Keywords: Computer Vision, Artificial Intelligence, Deep Learning, Bioinformatics, Data visualization, Space-filling curve, Hilbert curve, TensorFlow
Department(s): Science, Faculty of > Computer Science
Date: September 2019
Date Type: Submission
Library of Congress Subject Heading: Machine learning; Single nucleotide polymorphisms.
