CMC Senior Theses

Explaining and Predicting U.S. Bank Failures, 2001–2024: An Interpretable Econometric and Machine-Learning Approach

Ryan Shakiba

Researcher ORCID Identifier

0009-0002-3579-5862

Graduation Year

2026

Date of Submission

4-2026

Document Type

Campus Only Senior Thesis

Degree Name

Bachelor of Arts

Department

Mathematics

Reader 1

Mark Huber

Abstract

This thesis asks whether public Call Report data can predict which U.S. banks will fail within the next four quarters, which models predict most accurately, and whether a predictive model can be both useful and easy to interpret. I built a bank-quarter panel from FFIEC 041 and 051 filings obtained through Wharton Research Data Services (WRDS), covering 632,764 observations across 10,740 institutions from 2001 through 2024, and used the FDIC Failed Bank List to label failures. I trained three econometric models (logit, ridge, and lasso) and two machine-learning benchmarks (random forest and XGBoost) on 2001–2010 data. I then tested them out of sample on two fixed-origin windows: 2011–2019 with a 25-variable specification, and 2011–2023 with a 23-variable specification that drops two risk-weighted capital ratios whose reporting was disrupted by the Community Bank Leverage Ratio beginning in 2020. In both specifications, models built only from public regulatory variables achieve ROC AUCs above 0.97 and place more than half of all pre-failure bank-quarters among the riskiest 0.25% of observations. The failure rate among these flagged observations is therefore more than 200 times the baseline rate, a screening lift above 200. Logit is the best interpretable model, random forest is the best overall, XGBoost ranks well but is harder to interpret, and lasso performs poorly. The lasso result is itself informative. Because the lasso penalty favors sparse models that keep only a few predictors, its weak performance indicates that failure risk is not driven by one or two ratios. Instead, it reflects weakness appearing simultaneously across capital, credit concentration, profitability, and funding. Across the logit coefficients and random-forest importance rankings, failure risk is most consistently associated with the following factors: weaker capital and profitability, thinner liquid-asset and securities buffers, greater commercial real estate (CRE) and commercial and industrial (C&I) loan concentration, higher allowance ratios, and heavier reliance on brokered or costly funding.
The policy takeaways are twofold. First, simple parametric models built from public data can still serve as effective early-warning tools. Second, the small gains from machine-learning algorithms do not make "readable" logistic regression irrelevant.
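The screening statistics in the abstract (ROC AUC, the share of pre-failure bank-quarters captured in the riskiest 0.25%, and lift) can be made concrete with a short sketch. The Python below is illustrative and is not the thesis's code. The helper names `quarter_index`, `label_failures`, `roc_auc`, and `capture_and_lift` are hypothetical. The labeling convention is also an assumption: an observation is positive if the bank fails 1 to 4 quarters after it. AUC is computed as the probability that a randomly chosen failing observation outranks a randomly chosen non-failing one.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: function names and the labeling convention are
# assumptions, not the thesis's actual implementation.


def quarter_index(year: int, quarter: int) -> int:
    """Map (year, quarter) to a running integer so quarter gaps are differences."""
    return year * 4 + (quarter - 1)


def label_failures(
    obs: Sequence[Tuple[str, int, int]],
    failure_quarter: Dict[str, Tuple[int, int]],
    horizon: int = 4,
) -> List[int]:
    """Return 1 for a bank-quarter whose bank fails 1..horizon quarters later, else 0.

    obs holds (bank_id, year, quarter); failure_quarter maps bank_id to the
    (year, quarter) of failure. Excluding the failure quarter itself is assumed.
    """
    labels = []
    for bank_id, year, quarter in obs:
        fq = failure_quarter.get(bank_id)
        if fq is None:
            labels.append(0)
            continue
        gap = quarter_index(*fq) - quarter_index(year, quarter)
        labels.append(1 if 1 <= gap <= horizon else 0)
    return labels


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2.

    Pairwise O(n_pos * n_neg) for clarity; a rank-based formula or a library
    routine is needed for a panel of hundreds of thousands of observations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def capture_and_lift(
    scores: Sequence[float], labels: Sequence[int], top_frac: float = 0.0025
) -> Tuple[float, float]:
    """Share of all positives found in the top `top_frac` of scores, and the lift."""
    n = len(scores)
    k = max(1, round(top_frac * n))  # number of flagged (highest-risk) observations
    flagged = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(labels[i] for i in flagged)
    total_pos = sum(labels)
    capture = hits / total_pos
    # Lift: failure rate among flagged observations divided by the baseline rate.
    lift = (hits / k) / (total_pos / n)
    return capture, lift


if __name__ == "__main__":
    # Toy data: 800 observations, 4 positives, two of which have the top scores.
    scores = [i / 800 for i in range(800)]
    labels = [1 if i in (10, 20, 798, 799) else 0 for i in range(800)]
    print(capture_and_lift(scores, labels))
    print(roc_auc(scores, labels))
```

In the toy demo, the top 0.25% of 800 observations is two observations, and both are failures. Capture is therefore 0.5 and lift is 200. When the cutoff is exact, lift equals capture divided by the flagged fraction (0.5 / 0.0025 = 200). That identity is why "more than half in the riskiest 0.25%" and "lift above 200" describe the same result.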

Recommended Citation

Shakiba, Ryan, "Explaining and Predicting U.S. Bank Failures, 2001–2024: An Interpretable Econometric and Machine-Learning Approach" (2026). CMC Senior Theses. 4080.
https://scholarship.claremont.edu/cmc_theses/4080

Access to this thesis is restricted to current faculty, students, and staff of the Claremont Colleges.
