Date of Award

Summer 2024

Degree Type

Open Access Dissertation

Degree Name

Mathematics, PhD

Program

Institute of Mathematical Sciences

Advisor/Supervisor/Committee Chair

Mike Izbicki

Dissertation or Thesis Committee Member

John Angus

Dissertation or Thesis Committee Member

Qidi Peng

Dissertation or Thesis Committee Member

Yu Bai

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2024 Yujie Wang

Subject Categories

Mathematics

Abstract

The loss function plays a critical role in machine learning. It is fundamental to training, evaluating, and optimizing machine learning models, directly impacting their effectiveness and efficiency on specific tasks. We explore three new loss functions and their applications.

The softmax cross-entropy loss is a prevalent choice for neural network classification tasks, but it treats all misclassifications uniformly. However, multi-class classification problems often have many semantically similar classes, and we should expect these semantically similar classes to have similar parameter vectors. We introduce a weighted loss function, the tree loss, as a drop-in replacement for the cross-entropy loss. The tree loss re-parameterizes the parameter matrix to guarantee that semantically similar classes have similar parameter vectors. Using simple properties of stochastic gradient descent, we show that the tree loss's generalization error is asymptotically better than the cross-entropy loss's. We then validate these theoretical results on synthetic data, image data (CIFAR100, ImageNet), and text data (Twitter).

We also investigate the application of contrastive loss to large document embeddings. Existing model pretraining methods primarily focus on local information. For instance, in the widely used token masking strategy, words closer to the masked token are given more importance for prediction than words farther away. While this approach yields pretrained models that generate high-quality sentence embeddings, it leads to low-quality embeddings for larger documents. We propose a new pretraining method called DocSplit, which compels models to consider the entire global context of a large document. Our method employs a contrastive loss where the positive examples are randomly sampled sections of the input document and the negative examples are randomly sampled sections from unrelated documents. Like previous pretraining methods, DocSplit is fully unsupervised, straightforward to implement, and can be used to pretrain any model architecture. Our experiments demonstrate that DocSplit outperforms other pretraining methods on tasks such as document classification, few-shot learning, and document retrieval. By forcing the model to incorporate the global context, DocSplit significantly enhances the quality of document embeddings and overall performance across various applications.

Finally, we introduce a weighted contrastive loss that extends the traditional contrastive loss by incorporating inter-label relationships, enabling the model to better capture the intricate distinctions between closely related classes. By leveraging information about label relationships, the weighted contrastive loss enhances the discriminative power of the learned representations, thereby improving the model's ability to handle fine-grained classification tasks effectively. We analyze the contrastive fine-tuning of pre-trained language models on two fine-grained text classification tasks: emotion classification and sentiment analysis. We find that the weighted contrastive loss outperforms previous contrastive methods, particularly when dealing with a larger number of classes or more confusable classes.
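As a rough illustration of the tree-loss re-parameterization described above, the sketch below gives each class a parameter vector equal to the sum of learnable node vectors along its root-to-leaf path in a class hierarchy, so sibling classes are guaranteed to share most of their parameters. This is a minimal, assumed PyTorch rendering for intuition only; the hierarchy, class names, dimensions, and the `TreeReparameterizedClassifier` name are hypothetical and do not come from the dissertation.

```python
# Hypothetical sketch: cross-entropy on logits built from a tree-structured
# re-parameterization, so semantically similar classes get similar parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TreeReparameterizedClassifier(nn.Module):
    def __init__(self, dim, paths, num_nodes):
        """
        dim       : feature dimension of the encoder output
        paths     : one list per class of node indices from root to leaf
        num_nodes : total number of nodes in the class hierarchy
        """
        super().__init__()
        # One learnable vector per tree node instead of one per class.
        self.node_params = nn.Parameter(torch.randn(num_nodes, dim) * 0.01)
        self.paths = paths

    def class_matrix(self):
        # Class c's parameter vector = sum of node vectors on its path,
        # so classes sharing ancestors share most of their parameters.
        return torch.stack([self.node_params[p].sum(dim=0) for p in self.paths])

    def forward(self, features):
        # Logits of shape (batch, num_classes).
        return features @ self.class_matrix().T


# Toy hierarchy: root(0) -> animals(1) -> {cat(3), dog(4)}, vehicles(2) -> {car(5)}
paths = [[0, 1, 3], [0, 1, 4], [0, 2, 5]]          # classes: cat, dog, car
model = TreeReparameterizedClassifier(dim=16, paths=paths, num_nodes=6)

features = torch.randn(8, 16)                      # stand-in for encoder outputs
labels = torch.randint(0, 3, (8,))
loss = F.cross_entropy(model(features), labels)    # used as a drop-in replacement
loss.backward()
```

Used this way, the loss is still ordinary cross-entropy; the structural constraint lives entirely in how the class parameter matrix is constructed.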

ISBN

9798383703984

Included in

Mathematics Commons
