Brobbey, Anita (2015) Variable selection in multivariate multiple regression. Masters thesis, Memorial University of Newfoundland.
- Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Multivariate analysis is a common statistical tool for assessing covariate effects when only one response or multiple response variables of the same type are collected in experimental studies. However, with mixed continuous and discrete outcomes, traditional modeling approaches are no longer appropriate. The common approach to inference is to model each outcome separately, ignoring the potential correlation among the responses. However, a statistical analysis that incorporates this association may yield improved precision. Coffey and Gennings (2007a) proposed an extension of the generalized estimating equations (GEE) methodology to simultaneously analyze binary, count, and continuous outcomes with nonlinear functions. Variable selection plays a pivotal role in modeling correlated responses because of the large number of covariates involved. Thus a parsimonious model is desirable to enhance model predictability and interpretation. To perform parameter estimation and variable selection simultaneously in the presence of mixed discrete and continuous outcomes, we propose a penalized version of the extended generalized estimating equations. This approach requires specifying only the first two marginal moments and a working correlation structure. An advantageous feature of the penalized GEE is that the estimator remains consistent even if the working correlation is misspecified. However, it is important to use an appropriate working correlation structure in small samples, since it improves the statistical efficiency of the regression parameter estimates. We develop a computational algorithm for estimating the parameters using the local quadratic approximation (LQA) algorithm proposed by Fan and Li (2001). For tuning parameter selection, we explore the performance of the unweighted Bayesian information criterion (BIC) and generalized cross validation (GCV) for the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalties.
We discuss the asymptotic properties of the penalized GEE estimator as the number of subjects n goes to infinity. Our simulation studies reveal that when correlated mixed outcomes are available, estimates of regression parameters are unbiased regardless of the choice of correlation structure. However, estimates obtained from the unstructured working correlation (UWC) have reduced standard errors. SCAD with the BIC tuning criterion works well in selecting important variables. Our approach is applied to the concrete slump test data set.
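As a rough illustration of the LQA idea mentioned in the abstract, the sketch below applies it to the simplest special case: a single continuous response with an independence working correlation, where the penalized estimating equations reduce to SCAD-penalized least squares. It is not the thesis's algorithm for mixed outcomes. The function names (`scad_derivative`, `lqa_scad_least_squares`), the thresholds `eps` and `tol`, and the simulated data are assumptions made for this example; the SCAD derivative and the default a = 3.7 follow Fan and Li (2001).

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty, evaluated at |theta|."""
    theta = np.abs(np.asarray(theta, dtype=float))
    inner = (theta <= lam).astype(float)
    outer = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam) * (theta > lam)
    return lam * (inner + outer)

def lqa_scad_least_squares(X, y, lam, a=3.7, tol=1e-6, eps=1e-4, max_iter=200):
    """Hypothetical helper: SCAD-penalized least squares via LQA.

    Assumes one continuous response and independent observations, so each
    LQA step is a ridge-type solve with coefficient-specific weights.
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized starting value
    active = np.ones(p, dtype=bool)
    for _ in range(max_iter):
        # Coefficients that have shrunk below eps are set to zero and dropped.
        active &= np.abs(beta) > eps
        beta[~active] = 0.0
        if not active.any():
            break
        Xa, ba = X[:, active], beta[active]
        # LQA replaces the penalty by a quadratic with weights p'(|b|) / |b|.
        weights = scad_derivative(ba, lam, a) / np.abs(ba)
        new_ba = np.linalg.solve(Xa.T @ Xa + n * np.diag(weights), Xa.T @ y)
        delta = np.max(np.abs(new_ba - ba))
        beta[active] = new_ba
        if delta < tol:
            break
    beta[np.abs(beta) <= eps] = 0.0
    return beta

# Simulated example (assumed data, not from the thesis).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
true_beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])
y = X @ true_beta + rng.normal(size=400)
beta_hat = lqa_scad_least_squares(X, y, lam=0.3)
print(beta_hat)
```

In this simplified setting, large coefficients receive zero penalty weight and are estimated as in ordinary least squares on the selected variables, while small coefficients are shrunk toward zero and removed. In practice the tuning parameter `lam` would be chosen by a criterion such as BIC or GCV rather than fixed in advance.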
|Item Type:||Thesis (Masters)|
|Additional Information:||Includes bibliographical references (pages 60-64).|
|Keywords:||Variable Selection, Multiple Multivariate Regression, Penalized Likelihood, Regularization Methods, Generalized Estimating Equations, Information Criterion|
|Department(s):||Science, Faculty of > Mathematics and Statistics|
|Library of Congress Subject Heading:||Multivariate analysis; Generalized estimating equations; Approximation algorithms; Analysis of covariance|