Researcher ORCID Identifier
https://orcid.org/0000-0002-9081-0968
Graduation Year
2021
Date of Submission
5-2021
Document Type
Open Access Senior Thesis
Degree Name
Bachelor of Arts
Department
Economics
Reader 1
Nishant Dass
Reader 2
Mike Izbicki
Terms of Use & License Information
Rights Information
© 2021 Seungho (Samuel) Lee
Abstract
This paper attempts to quantify the predictive power of social media sentiment and financial data in stock prediction by utilizing a comprehensive set of stock-related fundamental and technical variables together with social media sentiment variables. For sentiment analysis, this study employs a pretrained finBERT model that provides three sentiment classifications and their respective softmax scores. The significance of these variables is then evaluated with XGBoost regression and the SHapley Additive exPlanations (SHAP) framework. By investigating feature importance, this study finds that statistical properties of sentiment variables provide stronger predictive power than a weighted sentiment score and that it is possible to quantify the impact that features have on so-called “black box” models.
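To make the abstract's contrast between a single weighted sentiment score and statistical properties of sentiment variables more concrete, here is a minimal Python sketch. It is an illustration only and not the thesis's code: the sample probabilities, the function names, the class ordering, and the definition of the weighted score as the average of P(positive) minus P(negative) are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical softmax outputs for three posts about one stock on one day,
# ordered as (positive, negative, neutral). These are made-up values,
# not data from the thesis.
posts = [
    (0.70, 0.10, 0.20),
    (0.20, 0.60, 0.20),
    (0.10, 0.10, 0.80),
]

def weighted_score(rows):
    """Assumed definition: average of P(positive) - P(negative) over posts."""
    return mean(p_pos - p_neg for p_pos, p_neg, _ in rows)

def summary_features(rows):
    """Per-class mean and population standard deviation of the softmax scores."""
    features = {}
    for idx, label in enumerate(("positive", "negative", "neutral")):
        column = [row[idx] for row in rows]
        features[f"{label}_mean"] = mean(column)
        features[f"{label}_std"] = pstdev(column)
    return features

score = weighted_score(posts)        # one number per stock-day
features = summary_features(posts)   # six numbers per stock-day
print(score)
print(features)
```

The single score collapses each day's posts into one value, while the summary features keep the level and the dispersion of each sentiment class, which is the kind of information the abstract reports as more predictive.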
Recommended Citation
Lee, Seungho (Samuel), "Feature Investigation for Stock Returns Prediction Using XGBoost and Deep Learning Sentiment Classification" (2021). CMC Senior Theses. 2715.
https://scholarship.claremont.edu/cmc_theses/2715
Included in
Data Science Commons, Econometrics Commons, Finance Commons, Longitudinal Data Analysis and Time Series Commons