Researcher ORCID Identifier
0009-0002-3579-5862
Graduation Year
2026
Date of Submission
4-2026
Document Type
Campus Only Senior Thesis
Degree Name
Bachelor of Arts
Department
Mathematics
Reader 1
Mark Huber
Rights Information
© 2026 Ryan S. Shakiba
Abstract
This thesis asks whether public Call Report data can predict which U.S. banks will fail within the next four quarters, which models predict most accurately, and whether a predictive model can be both useful and easy to interpret. I built a bank-quarter panel from FFIEC 041 and 051 filings pulled through Wharton Research Data Services (WRDS), covering 632,764 observations across 10,740 institutions from 2001 through 2024, and used the FDIC Failed Bank List to label failures. I trained three econometric models (logit, ridge, and lasso) and two machine-learning benchmarks (random forest and XGBoost) on 2001–2010 data and tested them out of sample over two fixed-origin windows: 2011–2019 with a 25-variable specification, and 2011–2023 with a 23-variable specification that drops two risk-weighted capital ratios whose reporting was disrupted by the Community Bank Leverage Ratio after 2020. In both specifications, models built only from public regulatory variables achieve ROC AUCs above 0.97 and place more than half of all pre-failure bank-quarters within the riskiest 0.25% of observations, a screening lift of more than 200 times the baseline failure rate. Logit is the best interpretable model, random forest is the best overall, XGBoost ranks well but is harder to interpret, and lasso performs poorly. The lasso result is itself informative: it indicates that failure risk is driven not by one or two ratios but by weakness appearing across capital, credit concentration, profitability, and funding at the same time. Across the logit estimates and the random-forest importance rankings, failure risk is most consistently associated with weaker capital and profitability, thinner liquid and securities buffers, greater CRE and C&I concentration, higher allowance ratios, and more brokered or costly funding.
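To make the screening metrics above concrete, here is a minimal Python sketch of how a four-quarter-ahead failure label, a rank-based ROC AUC, and a top-fraction capture rate with lift could be computed. It assumes a bank-quarter is labeled 1 when the bank fails in one of the next four quarters, and that lift means the failure rate inside the flagged group divided by the overall failure rate. The function names, the rounding rule for the cutoff, and the toy inputs are hypothetical illustrations, not the thesis's own code.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def label_next_four_quarters(quarter_idx: Sequence[int],
                             fail_idx: Optional[int],
                             horizon: int = 4) -> list[int]:
    """Hypothetical rule: quarter t is labeled 1 if the bank fails in t+1..t+horizon."""
    if fail_idx is None:  # bank never appears on the failed-bank list
        return [0] * len(quarter_idx)
    return [1 if 0 < fail_idx - t <= horizon else 0 for t in quarter_idx]


def roc_auc(y: Sequence[int], score: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank statistic, giving tied scores their average rank."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    order = sorted(range(len(score)), key=lambda i: score[i])
    ranks = [0.0] * len(score)
    i = 0
    while i < len(order):
        # Extend j over the run of observations tied with order[i].
        j = i
        while j + 1 < len(order) and score[order[j + 1]] == score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, label in zip(ranks, y) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_fraction_capture(y: Sequence[int], score: Sequence[float],
                         frac: float = 0.0025) -> tuple[float, float]:
    """Share of all positives inside the top `frac` of scores, and the lift over the base rate."""
    n = len(y)
    n_pos = sum(y)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    k = max(1, round(frac * n))  # assumed cutoff rule: round to nearest count
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    hits = sum(y[i] for i in top)
    capture = hits / n_pos
    lift = (hits / k) / (n_pos / n)  # flagged failure rate / overall failure rate
    return capture, lift
```

When `frac * n` is a whole number, lift equals capture divided by `frac`, so placing more than half of pre-failure bank-quarters in the top 0.25% implies a lift above 0.5 / 0.0025 = 200, which matches the two figures reported in the abstract.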
The policy takeaways are that simple parametric models built from public data can still serve as effective early-warning signals, and that the small gains from machine-learning algorithms do not make readable logistic regression models irrelevant.
Recommended Citation
Shakiba, Ryan, "Explaining and Predicting U.S. Bank Failures, 2001–2024: An Interpretable Econometric and Machine-Learning Approach" (2026). CMC Senior Theses. 4080.
https://scholarship.claremont.edu/cmc_theses/4080
This thesis is restricted to the Claremont Colleges current faculty, students, and staff.