Date of Award

Fall 2024

Degree Type

Restricted to Claremont Colleges Dissertation

Degree Name

Economics, PhD

Program

School of Social Science, Politics, and Evaluation

Advisor/Supervisor/Committee Chair

Thomas Willett

Dissertation or Thesis Committee Member

Graham Bird

Dissertation or Thesis Committee Member

Levan Efremidze

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2024 Guan Wang

Subject Categories

Economics

Abstract

This dissertation investigates the factors that predict loan default in peer-to-peer (P2P) lending, with a particular focus on integrating voluntarily provided text data with traditional financial, demographic, and loan information. Using a dataset of over 296,000 borrowers from the Lending Club platform, this research employs logistic regression with forward and backward stepwise selection, together with ensemble machine learning algorithms such as random forests, to rank the importance of various factors in predicting loan default.
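The stepwise-selection step can be illustrated with a minimal, self-contained sketch. It assumes AIC as the selection criterion and fits each candidate logistic regression by plain batch gradient descent. The function names, the criterion, the learning rate, and any data passed in are illustrative assumptions, not the dissertation's actual procedure or Lending Club variables. Predictors should be on comparable scales for the fixed learning rate to behave well.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, y, iters=3000, lr=0.5):
    """Fit an intercept plus slopes by batch gradient descent on the log-loss."""
    n, p = len(rows), len(rows[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(rows, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def log_likelihood(w, rows, y):
    ll = 0.0
    for xi, yi in zip(rows, y):
        pr = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
        pr = min(max(pr, 1e-12), 1 - 1e-12)  # guard against log(0)
        ll += yi * math.log(pr) + (1 - yi) * math.log(1 - pr)
    return ll


def aic(cols, X, y):
    # AIC = 2k - 2 log L, where k counts the intercept and the slopes.
    rows = [[x[j] for j in cols] for x in X]
    w = fit_logistic(rows, y)
    return 2 * len(w) - 2 * log_likelihood(w, rows, y)


def forward_stepwise(X, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    selected, remaining = [], list(range(len(X[0])))
    best = aic(selected, X, y)  # intercept-only model
    while remaining:
        score, j = min((aic(selected + [j], X, y), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected
```

Calling `forward_stepwise(X, y)` on a list of feature rows `X` and 0/1 default labels `y` returns column indices in the order they entered the model. That order is one simple way to rank predictors. Backward selection runs the same loop in reverse, starting from the full model and dropping the column whose removal lowers AIC most.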

Including text information improves the accuracy of predicting a borrower's ability to repay, and hence of the decision to grant a loan, by 5% relative to the 60% accuracy achieved without text information.

The analysis reveals that traditional financial variables such as interest rate, loan term, and debt-to-income ratio are the most significant predictors of default risk. However, text variables, especially those reflecting sentiment and psychological states—such as discrepancy, positive emotion, and affective processes—also play a critical role. Borrowers who express optimism or reference moral or emotional factors tend to have lower default rates, while those exhibiting financial discrepancies or negative emotions are more likely to default.

This research contributes to the literature by integrating natural language processing (NLP) techniques, specifically the Linguistic Inquiry and Word Count (LIWC2015) tool, to quantify and analyze borrowers’ textual descriptions. The findings suggest that lenders can improve their risk assessment models by combining financial and non-financial data, particularly voluntary text information. The study also highlights the growing potential of machine learning and NLP in enhancing predictive models for credit default. Practical implications include more informed lending decisions and better resource allocation to minimize default risk.
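LIWC scores most of its categories as the percentage of a text's words that match the category's dictionary entries, with a trailing asterisk acting as a prefix wildcard. The sketch below mimics that scheme with a tiny hypothetical dictionary. The real LIWC2015 dictionary is proprietary and far larger, so the word lists here are placeholders rather than LIWC's entries; only the category labels (posemo, negemo, discrep) follow LIWC2015's naming.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (NOT the LIWC2015 word lists). A pattern ending
# in "*" matches any word beginning with that prefix, mimicking LIWC wildcards.
TOY_DICTIONARY = {
    "posemo": ["happy", "good", "hope*", "thank*"],
    "negemo": ["worr*", "bad", "stress*"],
    "discrep": ["should", "would", "need*"],
}


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def matches(word, pattern):
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_scores(text, dictionary=TOY_DICTIONARY):
    """Return each category's share of the text's total words, in percent."""
    words = tokenize(text)
    if not words:
        return {cat: 0.0 for cat in dictionary}
    counts = Counter()
    for word in words:
        for cat, patterns in dictionary.items():
            if any(matches(word, p) for p in patterns):
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(words) for cat in dictionary}
```

For example, `category_scores("I hope to repay. I should not worry.")` finds 8 words, one in each toy category, so each score is 12.5. Scores of this kind can then enter the default model as ordinary numeric predictors alongside interest rate, loan term, and debt-to-income ratio.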

ISBN

9798346878209
