Combinatorics and its applications in DNA analysis

Go, Seung-byong Light (2009) Combinatorics and its applications in DNA analysis. Masters thesis, Memorial University of Newfoundland.

PDF - Accepted Version (English)
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



This thesis explores four areas of DNA analysis that draw on combinatorics. The first is Levenshtein distance, introduced by Levenshtein: for two strings, it is the minimum number of operations (insertions, deletions and substitutions) required to transform one string into the other. One application of Levenshtein distance is the creation of large sets of synthetic tissue identification that provide error detection and correction. The second area is the application of graph theory. Two sequencing methods that use graph theory are studied: the fragmentation (overlap) method and sequencing by hybridization. The third area is sequence comparison. Dynamic programming is used to align two sequences effectively, and FASTA, a heuristic method for searching sequence alignments, is discussed. The final area is the efficient selection of unique oligonucleotides (oligos) from a database containing large DNA or protein sequences. Because such databases are large, an effective approach to finding unique oligos is required. The thesis studies the brute-force method and filtration methods for selecting unique oligos, together with parallelization of these methods to reduce search time. The brute-force and filtration methods give accurate results but may take a long time. We attempt a new approach, which gives less accurate results in exchange for a much improved search time.
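
As an illustration of the first area, the Levenshtein distance described above can be computed with a standard dynamic-programming recurrence. The sketch below is not taken from the thesis. The function name `levenshtein` and the example strings are chosen here for illustration, and every operation is assumed to have unit cost.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of insertions, deletions and
    substitutions needed to transform string a into string b.
    Assumption: each operation costs 1."""
    # prev[j] is the distance between the processed prefix of a
    # and the first j characters of b; initially the prefix is empty.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical DNA fragments differing by a single deleted base.
    print(levenshtein("ACGT", "AGT"))
```

Keeping only two rows of the table uses memory proportional to the length of the second string, which matters when the strings are long sequences. Recovering the actual edit operations would require the full table.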

Item Type: Thesis (Masters)
Item ID: 9703
Additional Information: Includes bibliographical references (leaves 118-126)
Department(s): Science, Faculty of > Mathematics and Statistics
Date: 2009
Date Type: Submission
Library of Congress Subject Heading: Amino acid sequence--Databases; Combinatorial analysis; DNA fingerprinting--Mathematical models
