Graduation Year
2026
Date of Submission
12-2025
Document Type
Campus Only Senior Thesis
Degree Name
Bachelor of Arts
Department
Mathematical Sciences
Reader 1
Mark Huber
Abstract
This paper examines sentiment trends surrounding women’s college basketball athletes Caitlin Clark and Angel Reese in micro-blogging social media text from the 2023–2024 NCAA season. Two datasets of different sizes and sources were analyzed to contextualize and validate findings across platforms. The study employs both a lexicon-based approach and a machine learning predictive method. Term frequency, TF-IDF, the VADER and NRC lexicons, and n-gram analysis were used to measure word-level sentiment and emotional patterns. For modeling, a random forest classifier was implemented to extend the analysis beyond lexicon-based insights. Results revealed clear disparities in how each athlete is discussed online: Clark consistently receives more positive sentiment, while Reese faces more polarized and negative commentary. Emotion mining further highlights these differences: Clark is associated with joy and anticipation, whereas Reese is more frequently linked to anger and fear, driven by narrative framing and media storylines. Emotion mining and modeling corroborated the patterns found in the lexicon analysis, showing that sentiment analysis can effectively capture public perception in sports.
Recommended Citation
Chong, Renee, "Measuring Public Sentiment: A Lexicon and Machine Learning Analysis of Micro-Blogging Data Between Caitlin Clark and Angel Reese" (2026). CMC Senior Theses. 4276.
https://scholarship.claremont.edu/cmc_theses/4276
This thesis is restricted to the Claremont Colleges current faculty, students, and staff.