Siu, Gastonguay (2025) Decoding DNA methylation: insights into age-related diseases and transcription factor dynamics. Masters thesis, Memorial University of Newfoundland.
[English]
PDF
- Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
This thesis investigates the role of DNA methylation in age-related diseases by improving patient classification and by exploring transcription factor interactions in the context of methylation. We developed an enhanced k-means clustering algorithm, integrated with an expectation-maximization (EM) framework, that filters irrelevant CpG sites and reduces noise in patient classification. Its performance is benchmarked against traditional k-means clustering to assess how well it stratifies patients and yields meaningful insights. Applied to Kidney Renal Papillary Cell Carcinoma (KIRP) data, the method improved patient stratification based on survival outcomes. It revealed hypermethylated profiles resembling the CpG Island Methylator Phenotype (CIMP), which is associated with poor prognosis. After grouping with the EM k-means algorithm, clinical features better represent the population, and tumour stages are more realistically linked to survival outcomes. An analysis of the Fraction Genome Altered (FGA) and mutation rates suggests that survival after cancer development is more closely tied to large-scale genomic instabilities induced by aberrant methylation than to point mutations. Gene Ontology (GO) and KEGG pathway analyses identified critical pathways in cell adhesion, signal transduction, and BMP signalling that influence tumour behaviour and metastasis. With slight modifications, the EM k-means algorithm was used to predict patient survival from methylation patterns. Although the training and testing populations differed substantially owing to sampling variability, the results were promising and indicate potential as a diagnostic method for informing treatment plans.
Extending our approach to Alzheimer's disease (AD), we uncovered genes associated with differentially methylated regions (GADMR) in AD patients not identifiable through traditional Braak staging or simple k-means clustering. The algorithm classified patients into pseudo-intermediate and pseudo-advanced groups, with significant overlaps in known AD-associated genes and pathways. An age analysis demonstrated that machine learning-classified patients exhibited increased chronological and genetic ages (DNA methylation aberrations) correlated with AD progression risk.
Furthermore, we explored the molecular interplay between the transcription factors Yin Yang 1 (YY1) and TATA-box binding protein associated factor 1 (TAF1). Using TFregulomeR for motif distribution, genomic location, and Gene Ontology (GO) analyses of the YY1-TAF1 pair, we investigated methylation changes by comparing three scenarios: both TFs bound together, and each TF alone after excluding the partner's peaks. From 152 conditions showing significant methylation variations, we focused on the YY1-TAF1 pair in the GM12878, H1-hESC, and SK-N-SH cell lines. We found that TAF1 does not co-bind with YY1 when the YY1 binding motif is methylated at the third residue (cytosine or guanine) or when methylation is impossible because the third residue does not support it. Although the motif for co-binding peaks is cell-specific, stronger cytosine conservation at the third residue is observed where YY1-TAF1 co-binding occurs, and TAF1 binding to YY1 depends on an unmethylated state at this site. Our GO analysis reveals that co-binding of YY1 and TAF1 expands the set of GO terms compared with YY1 alone, indicating a synergistic effect in regulating cellular processes. The extent of YY1-TAF1 co-binding at promoter-TSS sites varies by cell type and is more extensive in cells capable of differentiation. Specifically, co-binding in GM12878 cells correlates with functions related to protein synthesis and RNA processing, while in H1-hESC and SK-N-SH cells it is associated with a broader range of enrichments. Conversely, TAF1 alone shows the opposite pattern, suggesting that cellular needs and differentiation potential influence binding patterns.
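The three binding scenarios compared above (both TFs together, and each TF alone after excluding the partner's peaks) amount to partitioning peak sets by overlap. The thesis performs this step with TFregulomeR; the Python sketch below only illustrates the idea and does not reproduce that package's interface. The `Peak` tuple layout, the helper names `overlaps` and `split_peaks`, and the toy coordinates are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open interval.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_peaks(tf_peaks: List[Peak], partner_peaks: List[Peak]):
    """Partition tf_peaks into those overlapping a partner peak and those that do not."""
    co_bound, alone = [], []
    for peak in tf_peaks:
        if any(overlaps(peak, other) for other in partner_peaks):
            co_bound.append(peak)
        else:
            alone.append(peak)
    return co_bound, alone


# Toy coordinates, invented purely for illustration.
yy1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
taf1 = [("chr1", 150, 250), ("chr2", 400, 500)]

yy1_with_taf1, yy1_alone = split_peaks(yy1, taf1)  # scenario 1 and scenario 2
_, taf1_alone = split_peaks(taf1, yy1)             # scenario 3
```

Methylation levels within each motif could then be summarised separately for the three peak sets. The quadratic overlap scan is adequate for a toy example; genome-scale peak sets would normally use an interval index.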
In conclusion, this thesis presents a novel computational framework for improving patient classification based on DNA methylation patterns. It sheds light on the molecular mechanisms of transcription factors in the context of methylation specificity. The enhanced EM k-means algorithm demonstrates potential as a diagnostic tool for personalized treatment plans, while the insights into YY1 and TAF1 interactions contribute to understanding gene regulation in disease progression. These findings have significant implications for developing targeted therapies and highlight the importance of methylation dynamics in age-related disease mechanisms.
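The abstract does not spell out how the EM framework filters CpG sites, so the following is only a minimal sketch of one plausible alternating scheme, not the thesis's algorithm. It assumes a patients-by-sites matrix of methylation beta values, clusters the patients, scores each site by how much of its variance lies between clusters, keeps the top-scoring sites, and repeats until the kept set stops changing. The function names, the scoring rule, the `n_keep` parameter, the farthest-point initialisation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every row to its nearest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def separation_score(X, labels, k):
    """Per-site ratio of between-cluster to total sum of squares (assumed filter rule)."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    for j in range(k):
        members = X[labels == j]
        if len(members):
            between += len(members) * (members.mean(axis=0) - overall) ** 2
    total = ((X - overall) ** 2).sum(axis=0) + 1e-12  # guard against division by zero
    return between / total


def filtered_kmeans(X, k=2, n_keep=5, max_rounds=10):
    """Alternate between clustering patients and re-selecting informative sites."""
    selected = np.arange(X.shape[1])  # start from every site
    labels = kmeans(X, k)
    for _ in range(max_rounds):
        scores = separation_score(X, labels, k)
        new_selected = np.sort(np.argsort(scores)[-n_keep:])
        labels = kmeans(X[:, new_selected], k)
        if np.array_equal(new_selected, selected):
            break  # the kept set is stable
        selected = new_selected
    return labels, selected


# Synthetic beta values: 40 patients, 20 sites, only the first 5 sites informative.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
X = rng.normal(0.5, 0.05, size=(40, 20))
X[:20, :5] = rng.normal(0.2, 0.05, size=(20, 5))
X[20:, :5] = rng.normal(0.8, 0.05, size=(20, 5))
X = np.clip(X, 0.0, 1.0)

labels, selected = filtered_kmeans(X, k=2, n_keep=5)
```

On this deliberately easy synthetic data the loop keeps the five informative sites and separates the two groups. Real methylation arrays are far noisier, and a fixed `n_keep` is a simplification; a threshold on the score or a probabilistic site-relevance weight would be closer to a genuine EM formulation.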
Item Type: | Thesis (Masters) |
---|---|
URI: | http://research.library.mun.ca/id/eprint/16845 |
Item ID: | 16845 |
Additional Information: | Includes bibliographical references (pages 188-195) |
Keywords: | methylation, transcription factor, machine learning, kidney renal papillary cell carcinoma, Yin Yang 1 |
Department(s): | Medicine, Faculty of > Biomedical Sciences |
Date: | February 2025 |
Date Type: | Submission |
Medical Subject Heading: | DNA Methylation; Transcription Factors; Carcinoma, Renal Cell; Machine Learning; Aging |