Date of Award
Fall 2020
Degree Type
Open Access Dissertation
Degree Name
Computational Science Joint PhD with San Diego State University, PhD
Program
Institute of Mathematical Sciences
Advisor/Supervisor/Committee Chair
Robert Edwards
Dissertation or Thesis Committee Member
Claudia Rangel
Dissertation or Thesis Committee Member
Anca Segall
Dissertation or Thesis Committee Member
Allon Percus
Rights Information
© Vito Adrian Cantu Alessio Robles, 2020. All rights reserved.
Subject Categories
Artificial Intelligence and Robotics | Bioinformatics | Genetics
Abstract
As of October 2020, there are 18.6 × 10^15 DNA base pairs publicly available in the Sequence Read Archive, and this number is growing at an exponential rate. As DNA sequencing prices continue to drop, many research groups around the world have incorporated high-throughput sequencing in their research, giving us access to sequences from many distinct ecosystems. This has revolutionized the field of metagenomics, which aims to fully characterize all organisms and their interactions in a particular system. Nevertheless, the plethora of available data has made its analysis difficult, as traditional techniques such as genome assembly or sequence alignment are bound to fail due to the high noise of metagenomes, or take an impractically long time due to their size. In this thesis, we explore those challenges and develop techniques to meet them. Chapter 1 serves as an introduction to the fields of metagenomics and machine learning and the applications where the two meet. Chapter 2 examines the different kinds of noise in sequencing datasets and presents PRINSEQ++, a multi-threaded C++ software tool for quality control of sequencing datasets. Chapter 3 describes the analysis of 63 metagenomic samples from children with "nodding syndrome" using Random Forest to give insights into the etiology of the disease. Chapter 4 explores the use of artificial neural networks to classify phage structural proteins derived from metagenomes.
ISBN
9798557058360
Recommended Citation
Cantu Alessio Robles, Vito Adrian. (2020). Machine Learning Methods for the Analysis of Metagenomes. CGU Theses & Dissertations, 276. https://scholarship.claremont.edu/cgu_etd/276.