Detecting operons from RNA-seq data using a convolutional and recurrent neural network architecture

Karaji, Rezvan (2023) Detecting operons from RNA-seq data using a convolutional and recurrent neural network architecture. Masters thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


An operon is a feature of prokaryotic genomes that enables the co-regulation of adjacent genes. Identifying which genes belong to the same operon helps in understanding bacterial gene function and regulation, which can in turn support, for instance, drug development and efforts to counter antibiotic resistance. Numerous experimental and computational approaches to operon detection exist; however, many of the computational approaches were developed for a specific target genome or require information that is available for only a limited number of bacterial genomes. Here, we develop a novel general method that uses RNA-seq reads directly as a signal over the nucleotide bases of the genome, making use of the full information in the RNA-seq data. This representation allows deep learning techniques to be applied without restriction to particular species. The final model, OpDetect, outperforms previous approaches in recall, F1-score, and area under the receiver operating characteristic curve (AUROC). It is also species-agnostic, successfully detecting operons even in Caenorhabditis elegans (C. elegans), the only eukaryotic organism known to have operons.
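
To make the idea of "RNA-seq reads as a signal over nucleotide bases" concrete, the following is a minimal sketch of one common way to build such a signal: per-base read depth computed from alignment intervals. It is not taken from the thesis and is not the OpDetect pipeline; the function names `coverage_signal` and `gene_pair_window`, the 0-based half-open coordinate convention, and the toy reads are all assumptions made for illustration.

```python
from itertools import accumulate


def coverage_signal(genome_length, reads):
    """Return per-base read depth for a genome of the given length.

    `reads` is an iterable of (start, end) alignment intervals, assumed
    to be 0-based and half-open. This is a hypothetical input format; a
    real pipeline would read alignments from a BAM/SAM file.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(start, 0)           # clip reads to the genome bounds
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the difference array is the depth at each base.
    return list(accumulate(diff[:-1]))


def gene_pair_window(signal, gene_a, gene_b):
    """Slice the signal over the region spanning two adjacent genes.

    Each gene is a (start, end) interval in the same coordinate system.
    A window like this could serve as input to a sequence model, which
    is an assumption here, not a description of the thesis's features.
    """
    start = min(gene_a[0], gene_b[0])
    end = max(gene_a[1], gene_b[1])
    return signal[start:end]


# Toy example: three reads over a 10-base genome.
toy_reads = [(0, 5), (3, 8), (4, 6)]
toy_signal = coverage_signal(10, toy_reads)
toy_window = gene_pair_window(toy_signal, (2, 4), (5, 8))
```

The difference-array approach keeps the cost linear in the number of reads plus the genome length, which matters when the signal is computed over whole bacterial genomes.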

Item Type: Thesis (Masters)
Item ID: 16142
Additional Information: Includes bibliographical references (pages 60-70)
Keywords: operon, convolutional neural network, recurrent neural network, RNA-seq
Department(s): Science, Faculty of > Computer Science
Date: September 2023
Date Type: Submission
Digital Object Identifier (DOI):
Library of Congress Subject Heading: Operons; Nucleotide sequence; Convolutions (Mathematics); Neural networks (Computer science)
