Dunham, Michael W. (2022) Semisupervised machine learning algorithms and their application to geoscience classification problems. Doctoral (PhD) thesis, Memorial University of Newfoundland.
[English]
PDF (166MB) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
In recent years, many disciplines have been challenged with trying to extract valuable information from large datasets efficiently. Technological advances have improved data storage capabilities and how data can be obtained (e.g., real-time data). Manually interpreting data that are exponentially growing in volume has obvious management and analysis challenges. Machine learning algorithms recognize patterns in data and assign repetitive patterns to similar categories. This process automates pattern recognition in data and allows meaningful information to be extracted in an efficient manner. For many machine learning problems, there are sufficient labeled data to train a wide range of algorithms. Some applications, such as image classification and speech recognition, have large labeled datasets readily available. However, in several geoscience-related problems, labeled data are generally obtained by sampling the Earth in some manner (e.g., drilling wells, field sampling, etc.), which is not trivial due to cost and logistical factors. As such, many earth science-related machine learning problems have limited labeled data. Supervised machine learning algorithms are prone to overfitting when labeled data are scarce, but semisupervised approaches are designed for these problems because unlabeled data are also used to inform the learning process. Three geoscience applications inherently challenged with limited training data are well-log classification, seismic classification, and bedrock-lithology mapping. I apply various semisupervised algorithms to these three geoscience problems and determine if semisupervised algorithms can perform better than supervised methods and under what conditions. The semisupervised methods that I consider are self-training, label propagation, and semisupervised Gaussian mixture models. I consider several supervised methods in my work, but the most prevalent are the gradient boosting decision tree methods. 
The results demonstrate that semisupervised methods can outperform their supervised counterparts for each of the geoscience applications, but not in all situations. Nonetheless, semisupervised methods are rarely considered for many geoscience disciplines, which is demonstrated by the lack of published examples in the literature. The outcomes of this work are raising the awareness of semisupervised methods by showing their applicability to different geoscience problems and making recommendations on how and when to use these tools.
Item Type: | Thesis (Doctoral (PhD)) |
---|---|
URI: | http://research.library.mun.ca/id/eprint/15612 |
Item ID: | 15612 |
Additional Information: | Includes bibliographical references (pages 190-212) |
Keywords: | machine learning, semisupervised, geoscience, label propagation, hyper-parameter tuning |
Department(s): | Science, Faculty of > Earth Sciences |
Date: | October 2022 |
Date Type: | Submission |
Digital Object Identifier (DOI): | https://doi.org/10.48336/7QEN-D488 |
Library of Congress Subject Heading: | Machine learning; Geology; Supervised learning (Machine learning); Algorithms |