Graduation Year
Spring 2014
Document Type
Open Access Senior Thesis
Degree Name
Bachelor of Arts
Department
Mathematics
Reader 1
Deanna Needell
Rights Information
© 2014 Walker Evan Casey
Abstract
Recommender systems based on collaborative filtering use information about a user's preferences to make personalized predictions about content, such as topics, people, or products, that the user might find relevant. As the volume of accessible information and the number of active users on the Internet continue to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we introduce an algorithmic framework built on top of Apache Spark for solving the neighborhood-based collaborative filtering problem in parallel, which allows the algorithm to scale linearly with a growing number of users. We also investigate several variants of this technique, including user-based and item-based recommendation approaches, correlation-based and vector-based similarity calculations, and selective down-sampling of user interactions. Finally, we provide an experimental comparison of these techniques on the MovieLens dataset of 10 million movie ratings.
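For readers unfamiliar with the terminology, the following is a minimal single-machine sketch of item-based neighborhood collaborative filtering with both a vector-based (cosine) and a correlation-based (Pearson) similarity. It is not the thesis's Spark implementation: the toy `ratings` data, the function names, and the choice to compute similarities only over co-rating users and to ignore non-positive similarities are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not taken from MovieLens.
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0},
    "u3": {"A": 1.0, "B": 5.0},
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a, b):
    """Vector-based similarity, restricted to users who rated both items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = sqrt(sum(a[u] ** 2 for u in common))
    nb = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (na * nb) if na and nb else 0.0


def pearson(a, b):
    """Correlation-based similarity over users who rated both items."""
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    ma = sum(a[u] for u in common) / n
    mb = sum(b[u] for u in common) / n
    cov = sum((a[u] - ma) * (b[u] - mb) for u in common)
    va = sum((a[u] - ma) ** 2 for u in common)
    vb = sum((b[u] - mb) ** 2 for u in common)
    return cov / sqrt(va * vb) if va and vb else 0.0


def predict(user, item, ratings, sim=cosine):
    """Similarity-weighted average of the user's ratings on neighboring items."""
    items = item_vectors(ratings)
    if item not in items:
        return None
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = sim(items[item], items[other])
        if s > 0:  # assumption: only positively similar neighbors contribute
            num += s * r
            den += s
    return num / den if den else None


# Predict how user "u3" might rate item "C", which they have not rated.
print(predict("u3", "C", ratings))
```

Every item-to-item similarity in this sketch depends only on the ratings of the two items involved, so the pairwise computations are independent of one another. That independence is the property that makes the neighborhood-based approach a natural fit for a data-parallel framework such as Spark.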
Recommended Citation
Casey, Walker Evan, "Scalable Collaborative Filtering Recommendation Algorithms on Apache Spark" (2014). CMC Senior Theses. 873.
https://scholarship.claremont.edu/cmc_theses/873
Included in
Artificial Intelligence and Robotics Commons, Software Engineering Commons, Statistical Models Commons