Date of Award
Summer 2024
Degree Type
Open Access Dissertation
Degree Name
Computational Science Joint PhD with San Diego State University, PhD
Program
Institute of Mathematical Sciences
Advisor/Supervisor/Committee Chair
Antoni Luque
Dissertation or Thesis Committee Member
Anca Segall
Dissertation or Thesis Committee Member
Manal Swairjo
Dissertation or Thesis Committee Member
Allon Percus & Marina Chugunova
Terms of Use & License Information
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.
Rights Information
© 2024 Diana Y Lee
Keywords
bacteriophages, computational biology, machine learning, structural biology, viral architecture, viral biomathematics
Subject Categories
Biology | Computer Sciences
Abstract
Bacteriophages are the most ubiquitous biological entities on the planet, but most viruses found in nature cannot be cultured in the laboratory and encode genes whose sequences lack similarity with current nucleotide and protein databases. New predictive methods are thus necessary to determine the phenotype of viruses. In this work, we leverage the physical and geometrical constraints of viruses to quantify the correlation between the geometric and genomic characteristics of tailed phages, and to predict physical features such as architecture and genome length of uncultured viruses using allometric models and machine learning algorithms. We present one model that predicts the T-number, a measure of the capsid architecture of tailed phages, from the genome length of the virus, and another that predicts viral genome length from structural genes such as the Major Capsid Protein (MCP). Our research indicates that genome length can predict capsid architecture with 90% accuracy, and that MCP features can predict capsid architecture with an overall accuracy of 84%. Using the same MCP features, a multi-step predictive model predicts the genome length of a virus with an overall average mean relative error of 7.6%. Since this model is based on a single gene, it may be improved by adding other structural genes, such as the portal or scaffolding genes, or by refining our methods of isolating the MCP. This approach can help predict the phenotype of uncultured viruses and fill the gap in our understanding of the virosphere.
ISBN
9798302179050
Recommended Citation
Lee, Diana Yvette. (2024). Reaching Across the Divide: Tools for Bridging Structural and Viral Genomics Using a Combination of Biophysical Principles and Machine Learning. CGU Theses & Dissertations, 906. https://scholarship.claremont.edu/cgu_etd/906.