Kang, Qiao (2024) Causal inference and interpretable machine learning for multiscale environmental data analysis. Doctoral (PhD) thesis, Memorial University of Newfoundland.
Full text not available from this repository.
Abstract
Environmental data analysis encompasses methods including domain-specific environmental modelling, statistics, and data-driven methods (e.g., artificial intelligence) to interpret observational and experimental datasets for tackling environmental issues. The field of environmental data analysis has experienced significant advancement over the last decade, fueled by the exponential increase in data quantity and complexity and the progression of data-driven paradigms alongside artificial intelligence. This field faces several key challenges, including 1) the lack of means for analysis from causal perspectives, especially in complicated multivariable problems, 2) the relatively high computational cost associated with partial-differential-equation-based models that incorporate physics priors, compounded by intrinsic uncertainties in parameter tuning processes, and 3) frequent situations in which limited data are available or valid for analysis. This dissertation research aims to bridge the gaps by developing a set of integrated methods that meld the strengths of interpretable machine learning and causal inference with classic tools for environmental data analysis and modelling. It entails the following major tasks: 1) to introduce an interpretable data analysis framework that leverages machine learning and causal inference. This framework can not only promote a deeper understanding of the causal relationships within environmental data but also serve as a testament to the value and potential of applying interpretative analytics in environmental fields. It is exhibited by a case study on the relationships between environmental factors and pandemic severity. 2) to develop a causal-prior embedded neural network, utilizing experimental data and parameters fitted from physics-based models, offering a systematic integration of lab experiments, physics-based simulation, causal inference techniques, and neural network modelling.
The method is demonstrated through an integrated experimental and modelling study on the fate and transport of metformin, an emerging contaminant, in a porous medium. 3) to propose and test a transfer learning-based method to estimate the occurrences of environmental pollutants released by or closely associated with human activities under data-scarce scenarios, supported by a novel neural network architecture and a comprehensive model fine-tuning strategy. The method is exemplified through a global risk assessment of metformin, with special attention to Canadian ecozones and the Arctic and sub-Arctic regions, to showcase the method’s effectiveness in enhancing environmental risk evaluation in data-limited contexts. The dissertation research advances the field of environmental data analysis by developing a set of new methodologies based on causal inference and interpretable machine learning. Those methods deliver benefits including enhanced model interpretability, reduced computational costs, and improved efficiency in dataset utilization, enabling robust analysis of environmental data across diverse scales. The research can not only offer robust and effective methodologies for actionable environmental data analysis and modelling but also enhance our capability to harness vast and complex environmental data for informed decision-making and policy development.
Item Type: | Thesis (Doctoral (PhD)) |
---|---|
URI: | http://research.library.mun.ca/id/eprint/16524 |
Item ID: | 16524 |
Additional Information: | Includes bibliographical references (pages 162-197) -- Restricted until June 20, 2025 |
Keywords: | causal inference, interpretable machine learning, emerging pollutant, fate and transport, COVID-19 |
Department(s): | Engineering and Applied Science, Faculty of |
Date: | June 2024 |
Date Type: | Submission |
Library of Congress Subject Heading: | Environmental sciences--Data processing; Environmental monitoring--Data processing; Machine learning |