Date of Award

Summer 2024

Degree Type

Open Access Dissertation

Degree Name

Computational Science Joint PhD with San Diego State University, PhD

Program

Institute of Mathematical Sciences

Advisor/Supervisor/Committee Chair

Antoni Luque

Dissertation or Thesis Committee Member

Anca Segall

Dissertation or Thesis Committee Member

Manal Swairjo

Dissertation or Thesis Committee Member

Allon Percus & Marina Chugunova

Terms of Use & License Information

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.

Rights Information

© 2024 Diana Y Lee

Keywords

bacteriophages, computational biology, machine learning, structural biology, viral architecture, viral biomathematics

Subject Categories

Biology | Computer Sciences

Abstract

Bacteriophages are the most ubiquitous biological entities on the planet, but most viruses found in nature cannot be cultured in the laboratory, and they encode genes whose sequences lack similarity to entries in current nucleotide and protein databases. New predictive methods are thus necessary to determine the phenotype of viruses. In this work, we leverage the physical and geometrical constraints of viruses to quantify the correlation between the geometric and genomic characteristics of tailed phages, and we predict physical features such as the architecture and genome length of uncultured viruses using allometric models and machine learning algorithms. Here, we present a model to predict the T-number, as a measure of the capsid architecture of tailed phages, from the genome length of the virus, and another to predict viral genome length from structural genes such as the Major Capsid Protein (MCP). Our research indicates that genome length can predict capsid architecture with 90% accuracy, and that MCP features can predict capsid architecture with an overall 84% accuracy. The same MCP features, used in a multi-step predictive model, predict the genome length of a virus with an overall average mean relative error of 7.6%. Since this model is based on a single gene, improvement may be achieved by adding other structural genes, such as the portal or scaffolding genes, or by refining our methods of isolating the MCP. This approach can help predict the phenotype of uncultured viruses and fill the gap in our understanding of the virosphere.
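
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how they are commonly defined. It is not taken from the dissertation. It assumes the standard Caspar-Klug definition of the T-number, T = h² + hk + k², and the usual definition of mean relative error as the mean of |predicted − observed| / observed. The function names `valid_t_numbers`, `nearest_t_number`, and `mean_relative_error` are hypothetical, and no fitted model parameters from the work are reproduced here.

```python
from typing import List, Sequence


def valid_t_numbers(max_index: int) -> List[int]:
    """Caspar-Klug T-numbers T = h^2 + h*k + k^2 for 1 <= h <= max_index, 0 <= k <= h.

    Note: this bounds the lattice indices, not T itself, so the list is not
    guaranteed to contain every T-number below its largest entry.
    """
    values = {
        h * h + h * k + k * k
        for h in range(1, max_index + 1)
        for k in range(0, h + 1)
    }
    return sorted(values)


def nearest_t_number(estimate: float, candidates: Sequence[int]) -> int:
    """Snap a continuous estimate (e.g. from a regression) to the closest allowed T-number."""
    return min(candidates, key=lambda t: abs(t - estimate))


def mean_relative_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean of |predicted - observed| / observed (assumed definition; observed values must be nonzero)."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) / o for p, o in zip(predicted, observed))
    return total / len(observed)


if __name__ == "__main__":
    t_values = valid_t_numbers(3)
    print(t_values)
    print(nearest_t_number(8.2, t_values))
    # Illustrative numbers only, not data from the dissertation.
    print(mean_relative_error([110.0, 90.0], [100.0, 100.0]))
```

Snapping to the nearest allowed value reflects the fact that only certain integers are valid T-numbers, so a continuous prediction has to be discretized before it can be compared with an observed architecture.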

ISBN

9798302179050
