Representing transcription factor dimers by using forked-position weight matrices

Khiavi, Aida Ghayour (2021) Representing transcription factor dimers by using forked-position weight matrices. Masters thesis, Memorial University of Newfoundland.

[English] PDF - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Position Weight Matrices (PWMs) and sequence logos are among the most popular tools for modelling and visualizing Transcription Factor (TF) Binding Sites (TFBSs). PWM-based models predict a single DNA sequence as the reference TFBS for a specific TF, based on experimentally determined sequence information. One of the standard assays for characterizing the TFBSs of a TF on a genome-wide scale is ChIP-seq. The Chromatin Immunoprecipitation (ChIP) step uses TF-specific antibodies to capture protein:DNA complexes, and is followed by high-throughput sequencing (seq) of the bound DNA. Each experiment is run in a controlled manner to target only one TF, and therefore describes the TFBSs of a single TF of interest. This approach has proven imprecise because many TFs (e.g. leucine zippers) tend to bind DNA as homodimers or heterodimers. The ChIP-seq assay therefore captures the entire set of dimer complexes of the target TF, both homodimers and heterodimers, and merges the captured information into a single PWM, which leads to an imprecise description of the TFBS. A TFBS model built from this mixture of homodimers and heterodimers has two halves: a conserved part (the binding site of the TF of interest) and a degenerate part (a mixture of the binding sites of the TF’s partners). Current PWMs and sequence logos seem inadequate for representing TF dimer binding sites, since they fail to capture the TF’s binding dynamics and disregard the change in sequence preference caused by the different dimer partners of a given TF. To tackle this problem, we introduce an R library named Forked Position Weight Matrix (FPWM), which provides the user with functions for generating a more precise PWM that adapts to TF dimers by forking it according to the co-factors of the main TF.
FPWM improves the power of TFBS prediction and gives biologists a more precise interpretation of cellular context by providing a more expressive model of TFBSs. It is less susceptible to false positives and represents dimer TFBSs more precisely, introducing a new standard for the analysis of dimers and their TFBSs.
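
The effect described in the abstract can be illustrated with a small sketch. The Python code below is not the FPWM library and does not reproduce its API; every function name, the pseudocount, and the toy sequences are assumptions made for illustration only. It builds one pooled frequency matrix from sites bound with two hypothetical partners, then builds one matrix per partner (the "forked" view), and compares the information content of the partner-specific half.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def frequency_matrix(sites, pseudocount=0.5):
    """Return per-position base frequencies for equal-length aligned sites."""
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        matrix.append(
            {base: (counts[base] + pseudocount) / total for base in ALPHABET}
        )
    return matrix


def information_content(column):
    """Information content of one column in bits: 2 minus Shannon entropy."""
    return 2.0 + sum(p * math.log2(p) for p in column.values())


def mean_information(matrix, positions):
    """Average information content over the given column indices."""
    return sum(information_content(matrix[i]) for i in positions) / len(positions)


# Invented toy data (assumption): every site shares the half "TGAC" bound by
# the TF of interest; the second half depends on the dimer partner.
sites_by_partner = {
    "partner_A": ["TGACGTCA"] * 4,
    "partner_B": ["TGACCAGT"] * 4,
}
shared_half = range(0, 4)
partner_half = range(4, 8)

# Pooled model: all sites merged into one matrix, as a single PWM would do.
pooled_sites = [s for sites in sites_by_partner.values() for s in sites]
pooled = frequency_matrix(pooled_sites)

# Forked model: one matrix per partner.
forked = {
    partner: frequency_matrix(sites)
    for partner, sites in sites_by_partner.items()
}

pooled_shared_ic = mean_information(pooled, shared_half)
pooled_partner_ic = mean_information(pooled, partner_half)
forked_partner_ic = {
    partner: mean_information(matrix, partner_half)
    for partner, matrix in forked.items()
}

print("pooled, shared half :", round(pooled_shared_ic, 3))
print("pooled, partner half:", round(pooled_partner_ic, 3))
for partner, ic in forked_partner_ic.items():
    print(f"{partner}, partner half:", round(ic, 3))
```

In this toy setting the pooled matrix is well conserved in the shared half and degenerate in the partner half, while each per-partner matrix recovers a sharper partner half. Real ChIP-seq data are far noisier, and the thesis itself should be consulted for how FPWM actually selects partners and builds the forked model.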

Item Type:	Thesis (Masters)
URI:	http://research.library.mun.ca/id/eprint/14942
Item ID:	14942
Additional Information:	Includes bibliographical references (pages 90-101).
Keywords:	Bioinformatics, Transcription Factors, Position Weight Matrix
Department(s):	Science, Faculty of > Computer Science
Date:	May 2021
Date Type:	Submission
Digital Object Identifier (DOI):	https://doi.org/10.48336/tfyp-ej37
Library of Congress Subject Heading:	Nucleotide sequence--Simulation method; Genetic regulation.
