Graduation Year

Spring 2014

Document Type

Open Access Senior Thesis

Degree Name

Bachelor of Arts



Reader 1

Deanna Needell

Rights Information

© 2014 Walker Evan Casey


Recommender systems based on collaborative filtering use information about a user's preferences to make personalized predictions about content, such as topics, people, or products, that the user might find relevant. As the volume of accessible information and the number of active users on the Internet continue to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we introduce an algorithmic framework built on top of Apache Spark for parallel computation of the neighborhood-based collaborative filtering problem, which allows the algorithm to scale linearly with a growing number of users. We also investigate several variants of this technique, including user-based and item-based recommendation approaches, correlation-based and vector-based similarity calculations, and selective down-sampling of user interactions. Finally, we provide an experimental comparison of these techniques on the MovieLens dataset of 10 million movie ratings.
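
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of user-based neighborhood collaborative filtering with both a vector-based (cosine) and a correlation-based (Pearson) similarity. It is not the thesis's Spark implementation: the toy ratings, the function names, and the neighborhood size `k` are all assumptions made for illustration, and details such as computing means over co-rated items only are one common choice among several.

```python
from math import sqrt

# Toy ratings invented for illustration: user -> {item: rating}.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine(u, v):
    """Vector-based similarity: cosine of the angle over co-rated items."""
    common = sorted(set(u) & set(v))
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    """Correlation-based similarity: Pearson correlation over co-rated items."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    dev_u = [u[i] - mean_u for i in common]
    dev_v = [v[i] - mean_v for i in common]
    dot = sum(a * b for a, b in zip(dev_u, dev_v))
    norm_u = sqrt(sum(a * a for a in dev_u))
    norm_v = sqrt(sum(b * b for b in dev_v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item, sim=pearson, k=2):
    """Predict a rating as the similarity-weighted average over the k most
    similar users who rated the item; neighbors with non-positive
    similarity are ignored. Returns None if no usable neighbor exists."""
    candidates = [
        (sim(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    top = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    top = [(s, r) for s, r in top if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return None
    return sum(s * r for s, r in top) / total
```

Calling `predict(RATINGS, "alice", "D", sim=cosine)` or passing `sim=pearson` switches between the two similarity families. An item-based variant would apply the same functions to the transposed mapping (item -> {user: rating}), and a parallel version would distribute the pairwise similarity computation, which is the part that dominates the cost as the number of users grows.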