Graduation Year

2025

Document Type

Campus Only Senior Thesis

Degree Name

Bachelor of Science

Department

Computer Science

Reader 1

Mark Huber

Reader 2

Winston Ou

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Abstract

Tennis analytics has traditionally focused on descriptive statistics or match-level outcomes; point-level prediction is a harder task because of uncertainty, psychological variability, and limited sensor data. Using about two million Grand Slam points from 2011–2022, I developed and evaluated machine learning models that predict point-level outcomes based on pre-point information. A baseline gradient-boosting model using all available features, including outcome-adjacent variables, achieved very high performance (94.12% accuracy, AUC = 0.99). Because these outcome-adjacent features encode match dominance, I trained a second model using only pre-point features representing score context, game state, and engineered momentum metrics. This model achieved strong performance (87.40% accuracy, AUC = 0.96) using only information available before the point begins. This result indicates substantial predictability within structurally constrained match states.

To extend prior work, this thesis introduces a pressure-weighted momentum index, which combines short-term performance with break-point success to model performance volatility under stress. This feature improved predictive performance and ranked in the top 58% of predictors. The results indicate that tennis is not random at the point level: outcomes are shaped by scoring leverage, serve status, and short-term competitive momentum. They also support the feasibility of real-time outcome forecasting in elite tennis and highlight the value of psychology-informed context features in sports prediction models.
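
The abstract does not give the formula for the pressure-weighted momentum index, so the following is only an illustrative sketch of one way such a feature could be computed, not the thesis's actual definition. It assumes a rolling win rate over a player's most recent points in which break points count more heavily. The function name `pressure_weighted_momentum`, the `window` size, and the `pressure_weight` value are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def pressure_weighted_momentum(
    won: Sequence[int],
    is_break_point: Sequence[bool],
    window: int = 5,
    pressure_weight: float = 2.0,
) -> List[float]:
    """Hypothetical pre-point momentum feature (not the thesis's formula).

    won[i] is 1 if the player won point i, else 0.
    is_break_point[i] marks points played at break point.
    The value for point i uses only points strictly before i,
    so it is available before point i begins.
    """
    if len(won) != len(is_break_point):
        raise ValueError("won and is_break_point must have the same length")

    momentum: List[float] = []
    for i in range(len(won)):
        start = max(0, i - window)
        history = range(start, i)
        # Assumption: break points count pressure_weight times as much.
        weights = [pressure_weight if is_break_point[j] else 1.0 for j in history]
        total = sum(weights)
        if total == 0:
            # No history yet: fall back to a neutral value.
            momentum.append(0.5)
            continue
        gained = sum(w * won[j] for w, j in zip(weights, history))
        momentum.append(gained / total)
    return momentum


if __name__ == "__main__":
    won = [1, 0, 1, 1]
    break_points = [False, True, False, False]
    print(pressure_weighted_momentum(won, break_points, window=3))
```

Because each value depends only on earlier points, a feature built this way would respect the pre-point constraint described above; the weighting scheme itself would need to be replaced with the definition given in the full thesis.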

This thesis is restricted to the Claremont Colleges current faculty, students, and staff.
