Researcher ORCID Identifier

https://orcid.org/0009-0005-9594-9992

Graduation Year

2023

Date of Submission

4-2023

Document Type

Campus Only Senior Thesis

Degree Name

Bachelor of Arts

Department

Mathematical Sciences

Reader 1

Mike Izbicki

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2023 Olivia J Renfro

Abstract

Sentiment analysis is widely used across industries, and lexicon-based labeling and machine learning approaches have been compared extensively. Machine learning models have shown higher accuracy, but they require manually labeled training data, which is time-consuming and costly to produce. This study compares the performance of eight hybrid sentiment analysis models on Twitter data using the SentiWordNet and VADER polarity lexicons. Different transformation techniques were used to create numerical feature maps, which were then fed into Linear SVM and Random Forest (RF) models. SVM with TF-IDF outperformed RF for most hybrid models, while RF performed better than SVM in almost all word-embedding variations. RF performed exceptionally well for both lexicons, even though it is cited less often in the sentiment analysis literature, and only SVM with TF-IDF transformations was competitive with RF. SVM with word-embedding transformations consistently performed the worst for both polarity lexicons.

This thesis is restricted to the Claremont Colleges current faculty, students, and staff.
