Qi, Jianlong (2007) Gene ontology driven feature selection from microarray gene expression data. Masters thesis, Memorial University of Newfoundland.
- Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Structural and functional data from analysis of the human genome have increased manyfold in recent years, presenting enormous opportunities and challenges for machine learning. In particular, gene expression microarrays are a rapidly maturing technology that makes it possible to assay the expression levels of thousands or tens of thousands of genes in a single experiment.

In the analysis of microarray gene expression data, one of the main challenges is the small sample size compared with the large number of genes. Among these thousands of genes, only a small number are relevant. To cope with this issue, feature selection, which is the process of removing features not relevant to the labeling, is an essential step in the analysis of microarray data. In this thesis, we present work in this area.

In the literature, most feature selection methods are based solely on gene expression values. However, due to the intrinsic limitations of microarray technology and the small number of samples, some expression levels may not be accurately measured, or they may not be a good estimate of the underlying distribution. This can reduce the effectiveness of feature selection. To resolve this deficiency, we explore the possibility of integrating Gene Ontology (GO) into feature selection in this work. GO represents a controlled biological vocabulary and a repository of computable biological knowledge. (Details will be introduced in the subsequent sections.)

The main contributions of this thesis are the following: (1) a statistical assessment of the ability of GO-based similarity (semantic similarity) to capture redundancy, and a new similarity measure that takes into account both expression similarity and semantic similarity, and (2) a method to incorporate GO annotation into the discriminative power of genes, which evaluates genes based not only on their individual discriminative powers but also on the powers of the GO terms annotating them.
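The abstract does not give the formula for the combined similarity measure, so the following is only an illustrative sketch of the general idea, not the measure defined in the thesis. It assumes that expression similarity is the absolute Pearson correlation between two expression profiles, that a GO-based semantic similarity in [0, 1] has already been computed elsewhere, and that the two are blended with a hypothetical weight `alpha`. The function names `pearson` and `combined_similarity` are likewise made up for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return cov / (sd_x * sd_y)


def combined_similarity(expr_a, expr_b, semantic_sim, alpha=0.5):
    """Blend expression similarity with a precomputed GO semantic similarity.

    ASSUMPTION: a simple weighted average; the thesis may define the
    combination differently. `alpha` is a hypothetical weight in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= semantic_sim <= 1.0:
        raise ValueError("semantic_sim must lie in [0, 1]")
    expr_sim = abs(pearson(expr_a, expr_b))
    return alpha * expr_sim + (1.0 - alpha) * semantic_sim


# Two perfectly correlated toy profiles with a made-up semantic similarity.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
score = combined_similarity(gene_a, gene_b, semantic_sim=0.2, alpha=0.5)
```

Under these assumptions, two genes count as redundant when their blended score is high, so a noisy expression measurement can be partly offset by what GO says about the genes' functional relatedness.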
|Item Type:||Thesis (Masters)|
|Additional Information:||Includes bibliographical references (leaves 83-91).|
|Department(s):||Science, Faculty of > Computer Science|
|Library of Congress Subject Heading:||Data mining; DNA microarrays--Statistical methods; Gene expression--Statistical methods; Genetics, Experimental--Statistical methods.|