TopAffy: predicting transcription factors DNA-binding specificities using a general topological method

Zier-Vogel, Ryan (2021) TopAffy: predicting transcription factors DNA-binding specificities using a general topological method. Doctoral (PhD) thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Transcription factors (TFs) recognize and bind to specific DNA sequences. Knowing the binding specificity of TFs is crucial to understanding gene regulation and how genetic differences in the DNA sequence of TF binding sites affect TF DNA-binding activity. However, the binding preferences of only 1% of all eukaryotic TFs are known. Computational prediction of TF binding preferences is an affordable and efficient way to increase the number of known binding preferences. Most bioinformatic tools for predicting the binding preferences of a TF require as input the binding preferences of related TFs. However, there are TF families for which very little experimental data is available. In this work, we present TopAffy, a new approach for predicting TF 8-mer binding profiles. TopAffy constructs a stochastic topological representation of DNA-binding domain sequences and learns a numerical representation of the binding preferences of neighbouring amino acid pairs. TopAffy's main contribution is a family-independent model that can be used to predict the 8-mer binding profile for TF families for which no experimental data is yet available. TopAffy's predictive performance is comparable to that of state-of-the-art family-specific approaches. Our results demonstrate that it is possible to learn a general model of binding specificities suitable for predicting binding preferences for a number of TF families.
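
For readers unfamiliar with the term, an 8-mer binding profile can be pictured as one score per DNA word of length 8. The following is a minimal sketch of that indexing only, not the thesis's implementation: the function names are hypothetical, the scores are placeholders, and collapsing each 8-mer with its reverse complement is an assumption (a common convention for double-stranded DNA) that this record does not confirm for TopAffy.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement.

    Assumption for illustration: the two strands are treated as equivalent,
    and the lexicographically smaller string is used as the key.
    """
    return min(kmer, reverse_complement(kmer))


def all_kmers(k=8):
    """Enumerate every DNA word of length k over the alphabet A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


kmers = all_kmers(8)
canonical_kmers = sorted({canonical(m) for m in kmers})

# A profile is then a mapping from each (canonical) 8-mer to a score.
# The zeros are placeholders, not predictions.
profile = {m: 0.0 for m in canonical_kmers}

print(len(kmers))            # 4**8 = 65536 possible 8-mers
print(len(canonical_kmers))  # (4**8 + 4**4) // 2 = 32896 after collapsing strands
```

The second count follows from the 4^4 = 256 8-mers that equal their own reverse complement: these stay single, while the remaining 8-mers pair up.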

Item Type: Thesis (Doctoral (PhD))
Item ID: 15252
Additional Information: Includes bibliographical references (pages 65-75).
Keywords: DNA-binding specificities
Department(s): Science, Faculty of > Computer Science
Date: September 2021
Date Type: Submission
Digital Object Identifier (DOI):
Library of Congress Subject Heading: DNA; Transcription factors; DNA-binding proteins; Computer science; Genetic regulation; Stochastic processes.
