BacTermFinder: bacteria-agnostic comprehensive terminator finder using a CNN ensemble

Taheri Ghahfarokhi, Seyed Mohammad Amin (2024) BacTermFinder: bacteria-agnostic comprehensive terminator finder using a CNN ensemble. Masters thesis, Memorial University of Newfoundland.

PDF - Accepted Version (English)
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

A terminator is a region in the DNA that ends the transcription process. Knowing the location of bacterial terminators will lead to a better understanding of how bacterial transcription works. This might facilitate bio-engineering and support bacterial genomic studies. Multiple tools are currently available for predicting bacterial terminators; however, most of them are specialized for certain bacteria or terminator types. In this work, we developed BacTermFinder, a tool that uses deep learning models, specifically an ensemble of Convolutional Neural Networks (CNNs) with four different genomic representations, trained on 46,386 bacterial terminators identified using RNA-seq technologies. In our diverse test set of five different bacteria, BacTermFinder’s average recall is significantly higher than that of the next best approach (0.56 ± 0.19 vs 0.45 ± 0.20), while also reducing the number of false positives. Moreover, BacTermFinder’s model identifies both types of terminators (intrinsic and factor-dependent) and even generalizes to archaea. BacTermFinder is publicly available at https://github.com/BioinformaticsLabAtMUN/BacTermFinder.
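
For readers unfamiliar with the two ideas the abstract relies on, an ensemble prediction and the recall metric, the following is a minimal illustrative sketch and not BacTermFinder's actual code. The abstract does not say how the four CNN outputs are combined, so plain averaging (soft voting) is an assumption here, as are the 0.5 threshold, the function names, and all of the scores, which are invented for illustration.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Average the per-model terminator probabilities (soft voting).

    Assumption: simple averaging; the thesis may combine models differently.
    """
    return mean(model_probs)


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) over binary labels (1 = terminator)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical scores: each row holds the outputs of four models
# (one per genomic representation) for a single sequence window.
window_scores = [
    [0.9, 0.8, 0.7, 0.6],
    [0.2, 0.4, 0.1, 0.3],
    [0.6, 0.3, 0.4, 0.3],
    [0.1, 0.2, 0.1, 0.2],
]
y_true = [1, 0, 1, 0]  # hypothetical ground-truth labels

THRESHOLD = 0.5  # assumed decision threshold
y_pred = [
    1 if ensemble_probability(scores) >= THRESHOLD else 0
    for scores in window_scores
]

print(y_pred, recall(y_true, y_pred))
```

In this toy example one of the two true terminators is recovered, so recall is 0.5. The values reported in the abstract (0.56 ± 0.19 vs 0.45 ± 0.20) are averages of this kind of per-organism recall over the five test bacteria.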

Item Type: Thesis (Masters)
URI: http://research.library.mun.ca/id/eprint/16453
Item ID: 16453
Additional Information: Includes bibliographical references (pages 67-85)
Keywords: bioinformatics, computational biology, bacterial terminator, deep learning, microbiology
Department(s): Science, Faculty of > Computer Science
Date: April 2024
Date Type: Submission
Library of Congress Subject Heading: Bioinformatics; Computational biology; Deep learning (Machine learning); Bacterial genetics
