Researcher ORCID Identifier
0009-0005-9195-3839
Graduation Year
2026
Date of Submission
4-2026
Document Type
Open Access Senior Thesis
Degree Name
Bachelor of Arts
Department
Mathematical Sciences
Second Department
Biology
Reader 1
Shibu Yooseph
Terms of Use & License Information
Rights Information
© 2026 Matthew Q Jabro
Abstract
Soil harbors the most diverse microbial communities on Earth, yet whether predictable community types exist across biomes and whether taxonomic composition encodes habitat of origin remain open questions at global scale. This thesis addresses both questions by applying unsupervised clustering and supervised classification to transformed 16S ribosomal RNA (rRNA) amplicon profiles from two independent datasets: the global topsoil survey of Bahram et al. (193 samples) and the Earth Microbiome Project (EMP) soil subset of Thompson et al. (2,209 samples). Application of a sample clustering method based on a mixture of Gaussian Graphical Models (MixGGM) identified 19 clusters in the topsoil dataset and 10 in the EMP dataset; cluster assignments were significantly associated with biome type, geographic location, and physicochemical variables in both cases. Random forest classifiers trained on the same sample-taxa matrices achieved 61.8% test-set accuracy under grouped biome labels for the topsoil data and 98.2% for the EMP data. The performance gap likely reflects the underlying biology: major habitat boundaries (marine vs. terrestrial vs. freshwater) produce strong compositional contrasts, while within-terrestrial biome differences are subtler and driven by overlapping environmental gradients rather than discrete community turnover. The most predictive taxa, including Ellin516, Candidatus Udaeobacter, Anaeromyxobacter, Bacillus, and Paenibacillus, have known ecological associations consistent with the biome categories they discriminated. Taken together, these results demonstrate that genus-level 16S rRNA profiles carry sufficient information to recover recurring community types and predict habitat of origin, with predictive power scaling with the breadth of environmental contrast in the dataset.
Recommended Citation
Jabro, Matthew, "Microbial Community Structure in Global Soils" (2026). CMC Senior Theses. 4211.
https://scholarship.claremont.edu/cmc_theses/4211
Data Repository Link
https://github.com/Mattjabro/Thesis
Included in
Data Science Commons, Environmental Microbiology and Microbial Ecology Commons, Genomics Commons