Date of Award

2024

Degree Type

Open Access Dissertation

Degree Name

Computational Science, Joint PhD with San Diego State University

Program

Institute of Mathematical Sciences

Advisor/Supervisor/Committee Chair

Henry Yeh

Dissertation or Thesis Committee Member

Marina Chugunova

Dissertation or Thesis Committee Member

Yu Yang

Dissertation or Thesis Committee Member

Qidi Peng

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2024 Ibrahim M Ali

Keywords

Activity Recognition, CNN, K-NN, Digital Signal Processing, Imbalanced Data, Machine Learning, SMOTE, ADASYN

Subject Categories

Engineering | Mathematics

Abstract

One metric used to measure classification performance in machine learning is the F-beta score. The objective of this thesis is to improve the average F-beta score computed in classifying shark data into shark behaviors, namely: Resting, Swimming, Feeding, and Non-Directed Motion (NDM). The Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) are utilized to balance the data, from which pre-processed Fast Fourier Transform (FFT), Walsh-Hadamard Transform (WHT), and Autocorrelation (AC) features are extracted and then classified using a Convolutional Neural Network (CNN) and K-Nearest Neighbors (K-NN). All combinations of the two balancing techniques, the three feature types, and the two machine learning algorithms are applied and then compared to examine the improvement in the average F-beta score. Other signal processing techniques are also applied to reduce the noise level of the recorded raw shark data and enhance its Signal-to-Noise Ratio (SNR).
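As a rough illustration of this pipeline (not the implementation used in the thesis), the sketch below balances a small synthetic, imbalanced data set with SMOTE, extracts magnitude-FFT features, classifies them with K-NN, and reports the average F-beta score. The toy data, frame length, number of neighbors, and beta value are all placeholder assumptions.

```python
# Illustrative sketch only: synthetic frames stand in for the shark recordings;
# frame length, k, and beta are placeholder choices, not the thesis settings.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(0)

# Imbalanced toy set: 4 behavior classes, raw frames of 64 samples each.
counts = {0: 400, 1: 120, 2: 60, 3: 20}   # e.g. Resting, Swimming, Feeding, NDM
frames = np.vstack([rng.normal(c, 1.0, (n, 64)) for c, n in counts.items()])
labels = np.concatenate([np.full(n, c) for c, n in counts.items()])

# Balance the classes with SMOTE before extracting features.
frames_bal, labels_bal = SMOTE(random_state=0).fit_resample(frames, labels)

# FFT features: magnitude of the one-sided spectrum of each frame.
features = np.abs(np.fft.rfft(frames_bal, axis=1))

X_train, X_test, y_train, y_test = train_test_split(
    features, labels_bal, test_size=0.3, stratify=labels_bal, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Average F-beta score across the four behavior classes (beta = 1 here).
print(fbeta_score(y_test, y_pred, beta=1.0, average="macro"))
```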

The average F-beta scores showed that K-NN performed at its best when using FFT-only features, while CNN performed at its best when using WHT-FFT features. In the K-NN case, FFT performed better when used alone than when combined with any other feature type, whereas WHT performed better when combined with another feature type than when used alone. In the CNN case, WHT and FFT performed better together than they did separately. In other words, combining FFT and WHT features in CNN resulted in a considerably improved average F-beta score, while combining them in K-NN merely averaged their individual scores. Also, whether alone or combined with other feature types, AC did not work well in CNN, as it resulted in poor average F-beta scores. In K-NN, combining AC with other feature types did not improve the average F-beta score over using AC alone.
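To make the "WHT-FFT" feature type more concrete, the sketch below builds a combined feature vector for a single hypothetical frame. The frame content and length are made up, scipy's Hadamard matrix is used as one common way to compute a Walsh-Hadamard Transform, and concatenating the two transforms is an assumption about how the combined feature is formed rather than the thesis's exact recipe.

```python
# Hypothetical single frame; in practice this would be a detrended,
# filtered window of the shark signal.
import numpy as np
from scipy.linalg import hadamard

frame = np.random.default_rng(1).normal(size=64)      # length must be a power of 2 for WHT

fft_feat = np.abs(np.fft.rfft(frame))                 # magnitude FFT features
wht_feat = hadamard(frame.size) @ frame / frame.size  # normalized Walsh-Hadamard coefficients

# "WHT-FFT" feature vector: the two transforms concatenated per frame.
wht_fft_feat = np.concatenate([wht_feat, fft_feat])
print(wht_fft_feat.shape)                              # (64 + 33,) for a 64-sample frame
```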

The average F-beta scores also showed that reducing the data imbalance during the pre-processing phase is more effective than mitigating its misleading effect on classification during the machine learning phase. The former balancing was performed using SMOTE and ADASYN, while the latter mitigation was performed using weight-sensitive learning. SMOTE, and even more so ADASYN, reduced the difference between precision and recall scores and produced higher F-beta scores.
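The two strategies contrasted here can be sketched as follows: oversampling reshapes the training set itself (SMOTE/ADASYN), while weight-sensitive learning leaves the data untouched and re-weights the classes in the loss. The label distribution below is a made-up stand-in for the four behaviors, not the actual shark data.

```python
# Sketch of the two imbalance strategies; labels and features are synthetic.
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 33))
y = np.repeat([0, 1, 2, 3], [400, 120, 60, 20])

# Strategy 1: rebalance the data itself before training.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
X_ad, y_ad = ADASYN(random_state=0).fit_resample(X, y)
print(np.bincount(y_sm), np.bincount(y_ad))    # roughly equal class counts

# Strategy 2: keep the data as-is and weight the classes instead
# (e.g. class weights passed to a CNN's loss function).
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y), weights)))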

Besides the two balancing techniques, three feature types, and two machine learning algorithms mentioned above, other pre-processing techniques applied to the raw data contributed to the improvement of the average F-beta score. These included framing, detrending, normalization, Ensemble Average (EA) based low-pass filtering, filter delay compensation, overlap windowing, and k-fold cross validation. For example, the average F-beta scores showed that applying EA-based low-pass filters (LPF) to the data, prior to machine learning and classification, improves the Signal-to-Noise Ratio (SNR) and, in turn, significantly improves the average F-beta score.
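A minimal sketch of such a pre-processing chain is shown below. The moving-average filter is used here only as a simple stand-in for the EA-based low-pass filter, and the signal, filter length, frame length, and overlap are all placeholder assumptions; k-fold cross validation (e.g. via sklearn.model_selection.KFold) would follow feature extraction and is omitted.

```python
# Sketch of a pre-processing chain: detrend, normalize, low-pass filter with
# delay compensation, then overlapping frames. All lengths are placeholders.
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(3)
raw = np.cumsum(rng.normal(size=4096))         # toy raw signal with drift

x = detrend(raw)                               # remove linear trend
x = (x - x.mean()) / x.std()                   # normalize

# Moving-average low-pass filter of length N; the 'full' convolution delays
# the output by (N - 1) / 2 samples, so trim it to realign with the input.
N = 9
filtered = np.convolve(x, np.ones(N) / N, mode="full")
delay = (N - 1) // 2
filtered = filtered[delay:delay + x.size]      # delay compensation

# Overlapping frames (50% overlap) ready for feature extraction.
frame_len, hop = 64, 32
frames = np.stack([filtered[i:i + frame_len]
                   for i in range(0, filtered.size - frame_len + 1, hop)])
print(frames.shape)
```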

As an end result, for the shark data used in this thesis, CNN was found to be a better choice than K-NN, and it performed best when using WHT-FFT features and ADASYN as the balancing technique.

ISBN

9798382748450
