Ji, Yunqi (2011) Analysis of longitudinal categorical and count data subject to measurement error. Doctoral (PhD) thesis, Memorial University of Newfoundland.
- Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In biomedical, social, behavioral, and environmental studies, data are frequently collected from surveys, registration systems, clinical trials, and other observational or experimental studies, and are often contaminated with measurement errors. These errors may arise from imperfect instruments and procedures, or from the limited experience and knowledge of examiners and examinees. Ignoring measurement errors in responses results in biased estimates of model parameters. Explicit models are required to describe misclassification in categorical responses and count errors in aggregated responses. To obtain more reliable inference, one needs to take the measurement errors into consideration when developing statistical methods to analyze mis-measured data.

In this thesis, we define a generalized thinning operation, based on which we propose a transition model for categorical longitudinal data. This new transition model can flexibly accommodate a variety of linear and nonlinear transition models. We also discuss a thinning-operation-based transition model and an ordinary linear transition model for dynamic count data.

Most importantly, we present some new measurement error models for categorical data and count data, which link the true responses with the observed, possibly mis-measured responses by explicit expressions. A meaningful application of the explicit misclassification model is to describe unbalanced misclassification in categorical data, which provides an alternative way to jointly model data suffering from both misclassification and missing values due to unsure answers. Moreover, the count error models, which accommodate both overcounted and undercounted data, can be used to describe count data of disease cases under different scenarios for the dynamic population size of an area. We apply these explicit measurement error models and transition models to analyze longitudinal discrete data subject to measurement errors.

Methods based on generalized estimating equations (GEE), generalized quasi-likelihood (GQL), second-order GQL (GQL2), and maximum likelihood (ML) are developed to obtain unbiased, hence consistent, estimates of the unknown parameters in longitudinal models for categorical and count responses. The explicit measurement error models lead to simple development of the GEE, GQL, and GQL2 approaches. Intensive simulations are conducted to examine the performance of these approaches. These methods tend to provide satisfactory estimates of model parameters, estimated standard errors, and confidence intervals. Surprisingly, the GQL approach performs almost as well as the likelihood approach when the latter is applicable in some first-order transition models. In the linear transition model for dynamic count data, even the GQL approach provides estimates almost as good as those of the ML approach. These findings offer an efficient alternative for analyzing longitudinal data when a complicated dependence structure is taken into account in the modeling. The proposed methods are illustrated with an example of childhood asthma data from the Harvard Six Cities Study.
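The abstract's point that ignoring misclassification biases estimates can be illustrated with a minimal sketch. It is not the thesis's model: it assumes a single binary response with known sensitivity and specificity, and the function names and numeric values are hypothetical. It uses the standard relation between the true and observed proportion of positives and inverts it.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion of observed positives when a binary response
    is misclassified with the given sensitivity and specificity."""
    return true_prev * sensitivity + (1.0 - true_prev) * (1.0 - specificity)


def corrected_prevalence(obs_prev, sensitivity, specificity):
    """Invert the misclassification relation to recover the true proportion."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    return (obs_prev + specificity - 1.0) / denom


# Hypothetical values chosen only for illustration.
true_prev = 0.20
sens, spec = 0.85, 0.90

naive = observed_prevalence(true_prev, sens, spec)  # what a naive analysis estimates
fixed = corrected_prevalence(naive, sens, spec)     # estimate after the correction

print(f"naive: {naive:.3f}, corrected: {fixed:.3f}")
```

With these values the naive proportion is about 0.25 against a true value of 0.20, and the correction recovers 0.20. In practice the misclassification rates are usually unknown and must be estimated or modeled, which is the setting the thesis addresses.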
|Item Type:||Thesis (Doctoral (PhD))|
|Additional Information:||Bibliography: leaves 200-211.|
|Department(s):||Science, Faculty of > Mathematics and Statistics|
|Library of Congress Subject Heading:||Linear models (Statistics); Longitudinal method; Error analysis (Mathematics)|