Document Type
Article
Publication Date
2019
Abstract
Stepwise regression is a popular data-mining tool that uses statistical significance to select the explanatory variables to be used in a multiple-regression model. A fundamental problem with stepwise regression is that some real explanatory variables that have causal effects on the dependent variable may happen not to be statistically significant, while nuisance variables may be coincidentally significant. As a result, the model may fit the data well in-sample but do poorly out-of-sample. Many Big-Data researchers believe that the larger the number of possible explanatory variables, the more useful stepwise regression is for selecting among them. The reality is that stepwise regression becomes less effective as the number of potential explanatory variables grows. Stepwise regression does not solve the Big-Data problem of too many explanatory variables; Big Data exacerbates the failings of stepwise regression.
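The argument in the abstract can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure used in the article: it assumes a simple forward-selection rule (at each step, add the candidate with the largest absolute t-statistic, provided it exceeds 1.96, a large-sample approximation to the 5 percent significance level), independent standard-normal explanatory variables, and a cap of 10 selected variables. The names `t_stats`, `forward_stepwise`, and `simulate` are hypothetical and chosen only for illustration.

```python
import numpy as np


def t_stats(design, y):
    """Return the OLS t-statistic of every column in `design`."""
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))    # coefficient standard errors
    return beta / se


def forward_stepwise(X, y, t_crit=1.96, max_vars=10):
    """Forward selection by t-statistic (an assumed, simplified rule).

    Returns the list of selected column indices in the order chosen.
    """
    n, p = X.shape
    selected = []
    while len(selected) < max_vars:
        best_j, best_t = None, t_crit
        for j in range(p):
            if j in selected:
                continue
            # Intercept, already-selected columns, then the candidate last.
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            t = abs(t_stats(design, y)[-1])
            if t > best_t:
                best_j, best_t = j, t
        if best_j is None:      # no remaining candidate is "significant"
            break
        selected.append(best_j)
    return selected


def simulate(n_obs, n_true, n_nuisance, seed=0):
    """Count how many true and nuisance variables forward selection picks.

    The first `n_true` columns affect y (coefficient 1 each, an assumption);
    the remaining `n_nuisance` columns are pure noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_true + n_nuisance))
    y = X[:, :n_true].sum(axis=1) + rng.normal(scale=2.0, size=n_obs)
    selected = forward_stepwise(X, y)
    true_hits = sum(j < n_true for j in selected)
    return true_hits, len(selected) - true_hits


if __name__ == "__main__":
    for n_nuisance in (10, 100, 500):
        true_hits, nuisance_hits = simulate(100, 5, n_nuisance)
        print(f"{n_nuisance} nuisance candidates: "
              f"{true_hits} true and {nuisance_hits} nuisance selected")
```

Running the script for several values of `n_nuisance` and several seeds lets a reader check how the mix of true and nuisance variables in the selected model changes as the candidate pool grows. The coefficient sizes, noise level, and selection threshold are all illustrative choices, and different settings will give different counts.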
Terms of Use & License Information
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Smith, Gary N., "Step Away From Stepwise" (2019). Pomona Economics. 13.
https://scholarship.claremont.edu/pomona_fac_econ/13
Comments
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-018-0143-6
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.