Date of Award
Summer 2024
Degree Type
Open Access Dissertation
Degree Name
Mathematics, PhD
Program
Institute of Mathematical Sciences
Advisor/Supervisor/Committee Chair
Mike Izbicki
Dissertation or Thesis Committee Member
John Angus
Dissertation or Thesis Committee Member
Qidi Peng
Dissertation or Thesis Committee Member
Yu Bai
Rights Information
© 2024 Yujie Wang
Subject Categories
Mathematics
Abstract
The loss function plays a critical role in machine learning. It is fundamental to training, evaluating, and optimizing machine learning models, directly impacting their effectiveness and efficiency on specific tasks. We explore three new loss functions and their applications.

The softmax cross-entropy loss is a prevalent choice in neural network classification tasks, but it treats all misclassifications uniformly. Multi-class classification problems, however, often contain many semantically similar classes, and we should expect these classes to have similar parameter vectors. We introduce a weighted loss function, the tree loss, as a drop-in replacement for the cross-entropy loss. The tree loss re-parameterizes the parameter matrix in order to guarantee that semantically similar classes have similar parameter vectors. Using simple properties of stochastic gradient descent, we show that the tree loss's generalization error is asymptotically better than the cross-entropy loss's. We then validate these theoretical results on synthetic data, image data (CIFAR100, ImageNet), and text data (Twitter).

We also investigate the application of contrastive loss to large document embeddings. Existing model pretraining methods primarily focus on local information. For instance, in the widely used token masking strategy, words closer to the masked token are given more importance for prediction than words farther away. While this approach yields pretrained models that generate high-quality sentence embeddings, it leads to low-quality embeddings for larger documents. We propose a new pretraining method called DocSplit, which compels models to consider the entire global context of a large document. Our method employs a contrastive loss where the positive examples are randomly sampled sections of the input document and the negative examples are randomly sampled sections of unrelated documents. Like previous pretraining methods, DocSplit is fully unsupervised, straightforward to implement, and can be used to pretrain any model architecture. Our experiments demonstrate that DocSplit outperforms other pretraining methods on document classification, few-shot learning, and document retrieval tasks. By forcing the model to incorporate the global context, DocSplit significantly enhances the quality of document embeddings and overall performance across various applications.

Finally, we introduce a weighted contrastive loss that extends the traditional contrastive loss by incorporating inter-label relationships, enabling the model to better capture the intricate distinctions between closely related classes. By leveraging information about label relationships, the weighted contrastive loss enhances the discriminative power of the learned representations, thereby improving the model's ability to handle fine-grained classification tasks effectively. We analyze the contrastive fine-tuning of pretrained language models on two fine-grained text classification tasks: emotion classification and sentiment analysis. We find that the weighted contrastive loss outperforms previous contrastive methods, particularly when dealing with a larger number of classes or more confusable classes.
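The abstract only outlines how the tree loss re-parameterizes the classifier's parameter matrix; the full construction is given in the dissertation. For orientation only, the following minimal PyTorch sketch shows one way such a tree-based re-parameterization could look. The class name TreeParameterizedClassifier, the leaf_paths argument, and the toy label tree are hypothetical names invented for this illustration, not the dissertation's actual code.

# Illustrative sketch only (not the dissertation's implementation): classes that share
# ancestors in a hypothetical label tree share parameter vectors, and the resulting
# logits feed into the ordinary cross-entropy loss as a drop-in replacement.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TreeParameterizedClassifier(nn.Module):
    def __init__(self, dim, num_nodes, leaf_paths):
        super().__init__()
        # one learnable vector per tree node (internal nodes and leaves)
        self.node_params = nn.Parameter(0.01 * torch.randn(num_nodes, dim))
        # binary matrix: row c marks the nodes on the root-to-leaf path of class c
        mask = torch.zeros(len(leaf_paths), num_nodes)
        for c, path in enumerate(leaf_paths):
            mask[c, path] = 1.0
        self.register_buffer("path_mask", mask)

    def forward(self, features):
        # each class's parameter vector is the sum of its path's node vectors, so
        # semantically similar classes (shared ancestors) get similar parameter vectors
        class_params = self.path_mask @ self.node_params   # (num_classes, dim)
        return features @ class_params.t()                  # logits

# toy usage: 4 classes under 2 hypothetical superclasses (nodes 0-1); leaves are nodes 2-5
leaf_paths = [[0, 2], [0, 3], [1, 4], [1, 5]]
model = TreeParameterizedClassifier(dim=16, num_nodes=6, leaf_paths=leaf_paths)
features = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
loss = F.cross_entropy(model(features), labels)  # trained exactly like the usual cross-entropy
loss.backward()

Because the re-parameterized logits plug directly into the standard cross-entropy, a sketch like this keeps the "drop-in replacement" property the abstract claims: only the classifier head changes, not the training loop.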
ISBN
9798383703984
Recommended Citation
Wang, Yujie. (2024). Exploring Loss Functions in Machine Learning. CGU Theses & Dissertations, 831. https://scholarship.claremont.edu/cgu_etd/831.