Graduation Year

Fall 2013

Document Type

Open Access Senior Thesis

Degree Name

Bachelor of Arts

Reader 1

Sara Sood

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2013 Sumaiya F. Hashmi


I will investigate applications of machine learning algorithms to medical data, adaptations to differences in data collection, and the use of ensemble techniques.

Focusing on the binary classification problem of Parkinson’s Disease (PD) diagnosis, I will apply machine learning algorithms to a primary dataset consisting of voice recordings from healthy and PD subjects. Specifically, I will use Artificial Neural Networks, Support Vector Machines, and an Ensemble Learning algorithm to reproduce results from [MS12] and [GM09].

Next, I will adapt a secondary regression dataset of PD recordings and combine it with the primary binary classification dataset. I will test various techniques for consolidating the data, including treating the regression data as unlabeled data in a semi-supervised learning approach, and I will determine the performance of the above algorithms on the consolidated dataset.
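One way to picture the semi-supervised idea is self-training: fit a classifier on the labeled data, assign pseudo-labels to the unlabeled samples it is confident about, and refit. The following is a minimal sketch of that general technique, not the method used in the thesis. The nearest-centroid classifier, the `margin` confidence rule, and every function name are assumptions chosen to keep the example short and dependency-free.

```python
import math


def centroids(X, y):
    """Return the mean feature vector of each class (hypothetical helper)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def self_train(X_lab, y_lab, X_unlab, margin=1.0, max_rounds=10):
    """Self-training with a nearest-centroid classifier (illustrative only).

    An unlabeled sample is pseudo-labeled when its nearest centroid is closer
    than the runner-up by at least `margin`; this rule is an assumption.
    Returns the final centroids and the samples that were never labeled.
    """
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        remaining = []
        for x in pool:
            ranked = sorted((math.dist(x, c), label) for label, c in cents.items())
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                X.append(x)
                y.append(ranked[0][1])  # accept the confident pseudo-label
            else:
                remaining.append(x)  # too ambiguous; leave unlabeled
        if len(remaining) == len(pool):
            break  # nothing new was labeled, so further rounds would not help
        pool = remaining
    return centroids(X, y), pool
```

In this sketch the labeled samples would play the role of the primary classification dataset and the unlabeled pool the role of the regression recordings with their scores set aside.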

Performance of the algorithms will be evaluated using 10-fold cross-validation, and results will be summarized in a confusion matrix, from which accuracy, precision, recall, and F-score will be calculated.
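As a rough illustration of this evaluation procedure, the sketch below splits sample indices into k folds and derives the four metrics from binary confusion-matrix counts. The function names, the choice of PD as the positive class (label 1), and the interleaved fold assignment are assumptions for illustration and are not taken from the thesis.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k folds over n samples.

    Folds are built by interleaving indices, so sizes differ by at most one.
    In practice the data would be shuffled before splitting.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F-score (F1) from the counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```

A typical use would be to accumulate the confusion counts over the test portion of every fold and call `summarize` once on the totals.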

This work expands on past related work, which has used either a regression dataset alone to predict a Unified Parkinson’s Disease Rating Scale score for PD patients, or a classification dataset to determine a healthy or PD diagnosis. In past work, the datasets have not been combined, and the regression set has not been used to contribute to the evaluation of healthy subjects.


This thesis has been submitted as part of the senior exercise for the degree of Bachelor of Arts in Computer Science at Pomona College as an off-campus major.