Development and evaluation of a robust and self-driven unsupervised data clustering algorithm

Hasan, Md Monjur Ul (2024) Development and evaluation of a robust and self-driven unsupervised data clustering algorithm. Doctoral (PhD) thesis, Memorial University of Newfoundland.

Full text not available from this repository.

Abstract

Data clustering is an important tool for analyzing and understanding data, particularly when the data is large or contains many attributes. Clustering is straightforward if a small set of rules can be devised to determine the clusters; in practice, no such small set of rules exists for complex datasets. Various clustering algorithms in the literature address different clustering challenges, including partitioning, hierarchical, and machine learning methods. Most of these approaches require some prior knowledge about the clusters, such as the total number of clusters. Furthermore, some existing algorithms are not robust enough to process higher-dimensional data, or they require a large amount of memory for computation. In this thesis, we explore a number of clustering techniques, along with their advantages and shortcomings, and devise a new clustering technique that combines the benefits of existing algorithms while remaining robust and requiring no prior knowledge about the clusters. The proposed algorithm, Piecemeal Clustering, clusters datasets in three steps that combine the concepts of density distribution, agglomerative hierarchical clustering, and the Self-Organizing Map (SOM). Piecemeal Clustering can successfully cluster data without prior knowledge of the number of clusters. It uses the similarity and density of the data in n-dimensional hyperspace to identify the number of clusters in the dataset, and it works with both low- and high-dimensional data. The capability of the proposed algorithm is demonstrated with two test datasets: it clusters Iris flower data and identifies letters of the cursive (handwritten) English alphabet. The algorithm shows positive results in both cases.
According to the results obtained, the Piecemeal algorithm outperforms seven other state-of-the-art algorithms on both datasets: k-means, SOM, Hierarchical, DBSCAN, RNN-DBSCAN, HDBSCAN*, and Blocked DBSCAN. The algorithm was also applied to two real-world problems, both related to Oil and Gas Engineering applications. In the first use case, Piecemeal Clustering was applied to well log data to identify the number and location of unique lithofacies in a potential oil field. In the second use case, the algorithm was used to identify drill bit blades from top-view images of damaged drill bits; in this scenario, it was used in conjunction with other algorithms. In both real-world case studies, the algorithm performed positively.
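
The abstract does not specify the three steps of Piecemeal Clustering, so the following is not the thesis's algorithm. It is only a minimal sketch of one idea the abstract mentions, agglomerative hierarchical clustering that does not need the number of clusters in advance. It assumes single linkage, Euclidean distance, and a hypothetical stopping parameter `max_gap`; the function names and the toy points are illustrative and do not come from the thesis.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Smallest pairwise Euclidean distance between two clusters of points."""
    return min(dist(p, q) for p in a for q in b)


def threshold_agglomerative(points, max_gap):
    """Merge the two closest clusters until all clusters are more than max_gap apart.

    max_gap is a hypothetical parameter for this sketch: the number of
    clusters is not given up front, it falls out of where merging stops.
    """
    clusters = [[tuple(p)] for p in points]  # start with one cluster per point
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if single_linkage(clusters[i], clusters[j]) > max_gap:
            break  # every remaining pair is farther apart than max_gap
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
clusters = threshold_agglomerative(points, max_gap=1.0)
print(len(clusters), sorted(len(c) for c in clusters))
```

This brute-force version recomputes all pairwise cluster distances on every merge, so it is only suitable for small examples. It also replaces the choice of a cluster count with the choice of a distance threshold, which is a limitation of the sketch and not a statement about how the thesis handles the problem.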

Item Type:	Thesis (Doctoral (PhD))
URI:	http://research.library.mun.ca/id/eprint/16834
Item ID:	16834
Additional Information:	Includes bibliographical references -- Restricted until November 30, 2025
Keywords:	data clustering, density-based clustering, unsupervised machine learning, lithofacies identification, piecemeal clustering
Department(s):	Engineering and Applied Science, Faculty of
Date:	November 2024
Date Type:	Submission
Digital Object Identifier (DOI):	https://doi.org/10.48336/404m-8e71
Library of Congress Subject Heading:	Cluster analysis; Cluster analysis--Computer programs; Data mining; Machine learning
