Date of Award

Fall 2024

Degree Type

Open Access Dissertation

Degree Name

Mathematics, PhD

Program

Institute of Mathematical Sciences

Advisor/Supervisor/Committee Chair

Marina Chugunova

Dissertation or Thesis Committee Member

Ali Nadim

Dissertation or Thesis Committee Member

Qidi Peng

Terms of Use & License Information

This work is licensed under a Creative Commons Attribution 4.0 License.

Rights Information

© 2024 An Ly

Keywords

Max kCut Optimization, Text Classification

Subject Categories

Mathematics

Abstract

I introduce a novel recursive modification of the classical Goemans-Williamson MaxCut algorithm (GWA), offering improved performance in clustering vectorized data. Focusing on the clustering of medical publications, I propose employing recursive iterations in conjunction with a dimension relaxation method to increase the density of the resulting clusters. Furthermore, I propose a new vectorization technique for articles that leverages conditional probabilities for more effective clustering. I believe that these methods will provide advantages in both computational efficiency and clustering accuracy. I will analyze the effectiveness of recursive iterations and higher-dimensional generalizations of the GWA, with the aim of achieving more accurate dissimilarity-based clustering. I think that these methods, combined with dimensionality reduction, have the potential to further improve clustering results. In addition, the vectorization method based on conditional probabilities will provide an additional tool for unsupervised document classification. While the GWA shows promise in accurately clustering articles, some challenges remain: the approach will need to be researched and refined on other collected or computer-generated datasets before being applied. Future development of techniques to handle outliers and to fine-tune the parameters will contribute to a more precise and robust method.

ISBN

9798346863373

Included in

Mathematics Commons
