Survival Analysis with Gene Expression Arrays

Document Type

Book Chapter


Department

Mathematics (Pomona)


Keywords

gene expressions, arrays, survival


Abstract

This chapter discusses the identification and measurement of gene expression levels as prognostic indicators for survival times. As bioinformatics technology and the field itself have grown explosively in recent years, so has the need for tools to analyze outcome data with covariates of extremely high dimension, such as those arising from measuring the expression levels of large numbers of genes. The chapter outlines a variety of data-reduction and analysis techniques for correlating survival times with high-dimensional covariates. It compares four methods for relating high-dimensional gene expression levels to survival outcome in the context of Cox's proportional hazards model. Each method consists of an initial data-reduction step followed by a model-fitting step. In the case of a single influential gene, the stepwise model consistently captured that gene in the optimal model, but it also included a number of false-positive genes. Increasing the number of designated genes while dampening their correlation with survival led to a lower true-positive rate and little to no improvement in predictive performance, as measured by the average cross-validated log likelihood.
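The predictive criterion named in the abstract is built on the Cox partial log-likelihood. The sketch below shows how that quantity can be evaluated for a given linear predictor. It assumes no tied event times and takes the linear predictor (the gene-expression covariates multiplied by some fitted coefficients) as already computed; the function name `cox_partial_log_likelihood` is hypothetical, and this is not the chapter's own code.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(times: Sequence[float],
                               events: Sequence[int],
                               eta: Sequence[float]) -> float:
    """Cox partial log-likelihood, assuming no tied event times.

    times  -- observed follow-up time for each subject
    events -- 1 if the event (death) was observed, 0 if censored
    eta    -- linear predictor x_i' beta for each subject (hypothetical input)
    """
    n = len(times)
    loglik = 0.0
    for i in range(n):
        if events[i] != 1:
            # Censored subjects add no term of their own,
            # but they still appear in earlier risk sets below.
            continue
        # Risk set: every subject still under observation at time t_i.
        risk = [eta[j] for j in range(n) if times[j] >= times[i]]
        # Log-sum-exp over the risk set, shifted by the max for stability.
        m = max(risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in risk))
        loglik += eta[i] - log_denom
    return loglik


# With a null model (all eta = 0) and three uncensored, untied deaths,
# the result is -(log 3 + log 2 + log 1) = -log 6.
print(cox_partial_log_likelihood([1.0, 2.0, 3.0], [1, 1, 1], [0.0, 0.0, 0.0]))
```

In a cross-validation scheme, one plausible use (an assumption here, since the abstract does not define the exact criterion) is to estimate coefficients on the training folds, form `eta` for the held-out subjects, evaluate this function on them, and average the results over folds. Handling tied event times would require a correction such as Breslow's or Efron's, which this sketch omits.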

Rights Information

© 2003 Elsevier B.V.

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.