Survival Analysis with Gene Expression Arrays
gene expressions, arrays, survival
This chapter discusses the identification and measurement of gene expression levels as prognostic indicators for survival times. As the technology and field of bioinformatics have expanded rapidly in recent years, so has the need for tools to analyze outcome data with covariates of extremely high dimension, such as arise from measuring the expression levels of large numbers of genes. The chapter outlines a variety of data-reduction and analysis techniques for correlating survival times with high-dimensional covariates. It compares four methods for correlating high-dimensional gene expression levels with survival outcome in the context of Cox's proportional hazards model. Each method consists of an initial data-reduction step and a secondary model-fitting step. In the case of a single influential gene, the stepwise model consistently captured the gene in the optimal model, but it also included a number of false-positive genes. Increasing the number of designated genes while dampening their correlation with survival led to a decrease in the true-positive rate and little to no improvement in predictive performance, as measured by the average cross-validated log likelihood.
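As a rough illustration of the quantity behind a cross-validated log likelihood under Cox's proportional hazards model, the sketch below evaluates the Cox log partial likelihood on held-out data for a given coefficient vector. It is not taken from the chapter: the function name `cox_partial_loglik`, the toy data, and the coefficient values `beta_hat` are hypothetical, the coefficients are assumed to have been fitted elsewhere on a training fold, and event times are assumed to be untied. The chapter's exact cross-validation criterion may differ.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox log partial likelihood, assuming no tied event times.

    beta  : (p,) coefficient vector (assumed fitted elsewhere)
    X     : (n, p) covariate matrix, e.g. reduced gene expression features
    time  : (n,) observed follow-up times
    event : (n,) 1 if the event was observed, 0 if censored
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    eta = X @ beta                      # linear predictor for each subject
    order = np.argsort(-time)           # sort subjects by decreasing time
    eta_sorted = eta[order]
    # Running log-sum-exp gives log of the risk-set sum for each subject,
    # since the risk set is everyone with time >= the subject's time.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    # Only observed events contribute terms to the partial likelihood.
    return float(np.sum((eta_sorted - log_risk)[event[order]]))

# Toy held-out fold (made-up values, for illustration only).
rng = np.random.default_rng(0)
X_test = rng.normal(size=(5, 2))
time_test = np.array([2.0, 5.0, 1.0, 4.0, 3.0])
event_test = np.array([1, 0, 1, 1, 1])
beta_hat = np.array([0.5, 0.0])         # hypothetical training-fold estimate

print(cox_partial_loglik(beta_hat, X_test, time_test, event_test))
```

Averaging this value over folds, with each fold's coefficients estimated from the remaining data, gives one simple version of a cross-validated log likelihood for comparing data-reduction methods.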
© 2003 Elsevier B.V.
Donna K. Pauler, Johanna Hardin, James R. Faulkner, Michael LeBlanc, John J. Crowley, Survival Analysis with Gene Expression Arrays, In: N. Balakrishnan and C.R. Rao, Editor(s), Handbook of Statistics, Elsevier, 2003, Volume 23, Pages 675-688, ISSN 0169-7161, ISBN 9780444500793, http://dx.doi.org/10.1016/S0169-7161(03)23037-6. (http://www.sciencedirect.com/science/article/pii/S0169716103230376)