Researcher ORCID Identifier

https://orcid.org/0009-0004-5367-1745

Graduation Year

2026

Date of Submission

12-2025

Document Type

Campus Only Senior Thesis

Degree Name

Bachelor of Arts

Department

Economics

Reader 1

Michael Gelman

Abstract

Machine learning models are increasingly used in high-stakes employment decisions, including salary prediction and compensation benchmarking. When these models learn from historically biased labor market data, they risk perpetuating discrimination across demographic groups. Despite growing concern about algorithmic fairness, limited empirical work examines how the choice of encoding method and machine learning model jointly affects both predictive accuracy and fairness in salary prediction.

I compare ten model-encoding combinations using salary data from Kaggle spanning multiple countries (N = 6,699). Three encoding strategies—One-Hot, Target Mean, and CatBoost encoding—are paired with four model types: Linear Regression, Random Forest, XGBoost, and CatBoost Regressor. Predictive accuracy is measured using root mean squared error (RMSE) across five-fold cross-validation. Fairness is assessed through normalized prediction residuals on a held-out test set, examining whether models systematically over- or underpredict salaries for women and racial minorities. Two parallel experiments test whether removing protected attributes ("fairness through unawareness") reduces bias.

I find that One-Hot encoding paired with XGBoost achieves both the highest predictive accuracy and the least bias across gender and racial groups, challenging the assumption that accuracy and fairness are necessarily in tension. All models systematically overpredicted women's salaries, consistent with omitted variable bias. Removing gender and race from the models did not reduce this bias: for One-Hot + XGBoost, female overprediction nearly tripled, and every model showed a statistically significant gender gap once protected attributes were removed, compared with only two models when they were included. Despite balanced racial representation, 4 of 17 country-race groups, all of them minority groups, showed significant residual disparities. Removing protected attributes preserved the direction of racial bias in 88% of groups while introducing new biases in others. Target Mean encoding paired with XGBoost produced the largest fairness gaps, while One-Hot encoding consistently minimized bias.

These findings demonstrate that encoding choice matters more than model complexity for fairness outcomes, and that attempts to achieve fairness by excluding protected attributes are not only ineffective but counterproductive.

This thesis is restricted to the Claremont Colleges current faculty, students, and staff.
