Date of Award

Fall 2024

Degree Type

Restricted to Claremont Colleges Dissertation

Degree Name

Economics, PhD

Program

School of Social Science, Politics, and Evaluation

Advisor/Supervisor/Committee Chair

Thomas Willett

Dissertation or Thesis Committee Member

Graham Bird

Dissertation or Thesis Committee Member

Levan Efremidze

Subject Categories

Economics

Abstract

This dissertation investigates the factors that predict loan default in peer-to-peer (P2P) lending, with a particular focus on integrating voluntarily provided text data with traditional financial, demographic, and loan information. Using a dataset of over 296,000 borrowers from the Lending Club platform, this research employs logistic regression with forward and backward stepwise selection, together with ensemble machine learning algorithms such as random forests, to rank the importance of various factors in predicting loan default.
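
As a rough illustration of the modeling idea described above, the following minimal sketch fits a logistic regression on one financial variable and one text-derived variable. It is not the dissertation's pipeline: the eight rows are synthetic toy values (not Lending Club data), the two feature names are hypothetical, and plain batch gradient descent stands in for whatever estimation routine the author used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability of default for one feature row."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic toy rows (assumption, for illustration only):
# [standardized interest rate, standardized positive-emotion word share]
X = [[1.2, -0.5], [0.8, -1.0], [1.5, 0.2], [0.3, -0.8],
     [-0.7, 0.9], [-1.1, 0.4], [-0.9, 1.3], [-0.4, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = default, 0 = repaid

w = fit_logistic(X, y)
print("intercept, interest-rate weight, positive-emotion weight:", w)
```

In this toy setup the sign of each fitted weight indicates the direction of association with default; ranking real predictors would additionally require comparable feature scaling and out-of-sample evaluation.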

Including text information improves by 5% the accuracy with which the models predict the relationship between a borrower's ability to repay a loan and the decision to grant it, compared with a baseline accuracy of 60% when text information is excluded.

The analysis reveals that traditional financial variables such as interest rate, loan term, and debt-to-income ratio are the most significant predictors of default risk. However, text variables, especially those reflecting sentiment and psychological states—such as discrepancy, positive emotion, and affective processes—also play a critical role. Borrowers who express optimism or reference moral or emotional factors tend to have lower default rates, while those exhibiting financial discrepancies or negative emotions are more likely to default.

This research contributes to the literature by integrating natural language processing (NLP) techniques, specifically the Linguistic Inquiry and Word Count (LIWC2015) tool, to quantify and analyze borrowers’ textual descriptions. The findings suggest that lenders can improve their risk assessment models by combining financial and non-financial data, particularly voluntary text information. The study also highlights the growing potential of machine learning and NLP in enhancing predictive models for credit default. Practical implications include more informed lending decisions and better resource allocation to minimize default risk.
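
To make the text-quantification step concrete, here is a minimal sketch of dictionary-based word counting in the style of LIWC, where each category score is the percentage of words in a text that belong to that category. The word lists below are invented for illustration and are not the LIWC2015 dictionary; the category labels simply mirror the sentiment categories named in the abstract, and the sample loan description is hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the LIWC2015 dictionary.
TOY_LEXICON = {
    "posemo": {"happy", "hope", "thank", "thanks", "great", "good"},
    "negemo": {"worried", "afraid", "stress", "bad", "hate"},
    "discrep": {"should", "would", "could", "need", "want"},
}

def category_shares(text, lexicon=TOY_LEXICON):
    """Return, per category, the percentage of words in `text` found in its word list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocab in lexicon.items():
            if word in vocab:
                counts[category] += 1
    total = len(words)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in lexicon}

# Hypothetical borrower description.
description = "I need this loan to pay off my cards. I hope to be debt free, thanks!"
shares = category_shares(description)
print(shares)
```

Scores of this kind can then be appended to the financial and demographic columns as additional predictors.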

ISBN

9798346878209

Recommended Citation

Wang, Guan. (2024). What Text Information Helps to Reduce Default Risk. CGU Theses & Dissertations, 891. https://scholarship.claremont.edu/cgu_etd/891.
