Date of Award
Fall 2024
Degree Type
Restricted to Claremont Colleges Dissertation
Degree Name
Economics, PhD
School of Social Science, Politics, and Evaluation
Advisor/Supervisor/Committee Chair
Thomas Willett
Dissertation or Thesis Committee Member
Graham Bird
Dissertation or Thesis Committee Member
Levan Efremidze
Rights Information
© 2024 Guan Wang
Abstract
This dissertation investigates the predictive factors influencing loan default in the context of peer-to-peer (P2P) lending, with a particular focus on the integration of voluntarily provided text data alongside traditional financial, demographic, and loan information. Using a dataset of over 296,000 borrowers from the Lending Club platform, this research employs logistic regression and ensemble machine learning algorithms, including forward and backward stepwise selection and random forests, to rank the importance of various factors in predicting loan default.
Including text information improves the accuracy of predicting whether a funded borrower will repay the loan by 5%, compared with 60% accuracy when text information is excluded.
The analysis reveals that traditional financial variables such as interest rate, loan term, and debt-to-income ratio are the most significant predictors of default risk. However, text variables, especially those reflecting sentiment and psychological states—such as discrepancy, positive emotion, and affective processes—also play a critical role. Borrowers who express optimism or reference moral or emotional factors tend to have lower default rates, while those exhibiting financial discrepancies or negative emotions are more likely to default.
This research contributes to the literature by integrating natural language processing (NLP) techniques, specifically the Linguistic Inquiry and Word Count (LIWC2015) tool, to quantify and analyze borrowers’ textual descriptions. The findings suggest that lenders can improve their risk assessment models by combining financial and non-financial data, particularly voluntary text information. The study also highlights the growing potential of machine learning and NLP in enhancing predictive models for credit default. Practical implications include more informed lending decisions and better resource allocation to minimize default risk.
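As a rough illustration of how a dictionary-based tool such as LIWC can turn a borrower's free-text description into numeric features, the sketch below scores a text by the share of its words that fall into each category. The category names, word lists, and sample description are hypothetical stand-ins chosen for illustration; they are not the LIWC2015 dictionary, and this is not the dissertation's actual pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the LIWC2015 dictionary.
CATEGORIES = {
    "posemo": {"happy", "good", "great", "thank", "hope"},
    "negemo": {"worried", "bad", "hurt", "stress"},
    "discrep": {"should", "would", "could", "need"},
}


def category_shares(text):
    """Return each category's share of total words, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    # Empty text yields 0.0 for every category rather than dividing by zero.
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in CATEGORIES
    }


# Invented loan description, used only to exercise the function.
description = "I hope to pay off my cards. I should be done in a year, thank you."
scores = category_shares(description)
print(scores)
```

Scores of this kind could then sit alongside financial variables such as interest rate, loan term, and debt-to-income ratio as inputs to a default model.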
Recommended Citation
Wang, Guan. (2024). What Text Information Helps to Reduce Default Risk. CGU Theses & Dissertations, 891.