Taheri Ghahfarokhi, Seyed Mohammad Amin (2024) BacTermFinder: bacteria-agnostic comprehensive terminator finder using a CNN ensemble. Masters thesis, Memorial University of Newfoundland.
PDF - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
A terminator is a region of DNA that ends the transcription process. Knowing the locations of bacterial terminators will lead to a better understanding of how bacterial transcription works. This might facilitate bio-engineering and support bacterial genomic studies. Currently, multiple tools are available for predicting bacterial terminators. However, most methods are specialized for certain bacteria or terminator types. In this work, we developed BacTermFinder, a tool that utilizes Deep Learning models, specifically an ensemble of Convolutional Neural Networks (CNNs), with four different genomic representations trained on 46,386 bacterial terminators identified using RNA-seq technologies. Based on our results, BacTermFinder’s average recall score is significantly higher than that of the next best approach (0.56 ± 0.19 vs 0.45 ± 0.20) in our diverse test set of five different bacteria while reducing the number of false positives. Moreover, BacTermFinder’s model identifies both types of terminators (intrinsic and factor-dependent) and even generalizes to Archaea. BacTermFinder is publicly available at https://github.com/BioinformaticsLabAtMUN/BacTermFinder.
Item Type: | Thesis (Masters) |
---|---|
URI: | http://research.library.mun.ca/id/eprint/16453 |
Item ID: | 16453 |
Additional Information: | Includes bibliographical references (pages 67-85) |
Keywords: | bioinformatics, computational biology, bacterial terminator, deep learning, microbiology |
Department(s): | Science, Faculty of > Computer Science |
Date: | April 2024 |
Date Type: | Submission |
Library of Congress Subject Heading: | Bioinformatics; Computational biology; Deep learning (Machine learning); Bacterial genetics |