Date of Award
2024
Degree Type
Open Access Dissertation
Degree Name
Computational Science Joint PhD with San Diego State University, PhD
Program
Institute of Mathematical Sciences
Advisor/Supervisor/Committee Chair
Henry Yeh
Dissertation or Thesis Committee Member
Marina Chugunova
Dissertation or Thesis Committee Member
Yu Yang
Dissertation or Thesis Committee Member
Qidi Peng
Terms of Use & License Information
Rights Information
© 2024 Ibrahim M Ali
Keywords
Activity Recognition, CNN K-NN, Digital Signal Processing, Imbalanced data, Machine Learning, SMOTE ADASYN
Subject Categories
Engineering | Mathematics
Abstract
One metric used to measure classification performance in machine learning is the F-beta score. The objective of this thesis is to improve the average F-beta score computed in classifying shark data into shark behaviors, namely Resting, Swimming, Feeding, and Non-Directed Motion (NDM). Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) are used to balance the data, from which pre-processed Fast Fourier Transform (FFT), Walsh-Hadamard Transform (WHT), and Autocorrelation (AC) features are extracted and then classified using a Convolutional Neural Network (CNN) and K-Nearest Neighbors (K-NN). All combinations of the two balancing techniques, the three feature types, and the two machine learning algorithms are applied and compared to examine the improvement in the average F-beta score. Other signal processing techniques are also applied to reduce the noise level of the recorded raw shark data and enhance its Signal-to-Noise Ratio (SNR).
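For reference, the F-beta score is a weighted harmonic mean of precision and recall, in which beta sets how much more recall counts than precision. The following is a minimal sketch, not the thesis code: the abstract states neither the value of beta nor how the per-class scores are averaged, so the default beta of 1 and the unweighted (macro) mean are assumptions, and the function names are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); returns 0.0 when undefined."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def average_f_beta(y_true, y_pred, labels, beta=1.0):
    """Unweighted (macro) mean of the per-class F-beta scores (an assumed averaging)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)
```

With beta greater than 1 the score leans toward recall, and with beta less than 1 it leans toward precision.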
The average F-beta scores showed that K-NN performed best with FFT-only features, while CNN performed best with WHT-FFT features. In the K-NN case, FFT performed better alone than when combined with any other feature type, whereas WHT performed better when combined with any other feature type than when used alone. In the CNN case, WHT and FFT performed better together than they did separately. In other words, combining FFT and WHT features in CNN resulted in a considerably improved average F-beta score, while combining them in K-NN averaged their scores. Whether alone or combined with other feature types, AC did not work well in CNN, as it resulted in poor average F-beta scores. In K-NN, combining AC with other feature types did not improve the average F-beta score over using AC alone.
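The three feature types can be illustrated on a single frame of samples. This is a sketch under stated assumptions rather than the thesis pipeline: the abstract does not say how WHT and FFT features are combined, so simple concatenation is assumed here, the function names are hypothetical, and a direct DFT stands in for the FFT (both yield the same magnitudes).

```python
import cmath


def dft_magnitude(frame):
    """Magnitude spectrum via a direct O(n^2) DFT; an FFT returns the same values faster."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n)
    ]


def walsh_hadamard(frame):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(frame)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    out = list(frame)
    h = 1
    while h < n:
        # Butterfly step: combine elements h apart into their sum and difference.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def autocorrelation(frame):
    """Biased autocorrelation estimate for lags 0..n-1 (no mean removal)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / n for lag in range(n)]


def wht_fft_features(frame):
    """Assumed combination: WHT coefficients followed by DFT magnitudes."""
    return walsh_hadamard(frame) + dft_magnitude(frame)
```

Concatenation doubles the feature length per frame, which is one plausible reason a combined feature set can behave differently in a distance-based classifier such as K-NN than in a CNN.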
The average F-beta scores also showed that reducing the data imbalance during the pre-processing phase is more effective than mitigating misleading classifications during the machine learning phase. Prior balancing was performed using SMOTE and ADASYN, while later mitigation was performed using weight-sensitive learning. SMOTE, and ADASYN even more so, reduced the difference between precision and recall scores and produced higher F-beta scores.
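The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified illustration of that idea only; it is not the thesis code or any library's implementation, and the function name, the neighbor count `k`, and the seed are hypothetical. ADASYN follows the same interpolation scheme but generates more samples around minority points that have many majority-class neighbors, which is not shown here.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling fills in the minority region instead of duplicating existing points.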
Besides the two balancing techniques, the three feature types, and the two machine learning algorithms, other pre-processing techniques applied to the raw data contributed to the improvement of the average F-beta score. These techniques included framing, detrending, normalization, Ensemble Average (EA) based low-pass filtering, filter delay compensation, overlap windowing, and k-fold cross validation. For example, the average F-beta scores showed that applying EA-based low-pass filters (LPF) to the data, prior to machine learning and classification, improves the SNR and consequently improves the average F-beta scores significantly.
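Three of the listed steps (overlapped framing, detrending, and normalization) can be sketched as follows. The abstract gives no frame length, overlap, detrending order, or normalization rule, so the hop size, the linear least-squares detrend, and the peak normalization used here are all assumptions, and the function names are hypothetical. The EA-based low-pass filter is not sketched because the abstract does not describe its design.

```python
def frame_signal(signal, frame_len, hop):
    """Split a signal into frames; hop < frame_len makes consecutive frames overlap."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    # A trailing partial frame is dropped.
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def detrend_linear(frame):
    """Subtract the least-squares straight line from a frame (assumed detrending)."""
    n = len(frame)
    if n < 2:
        return [0.0] * n
    t_mean = (n - 1) / 2
    x_mean = sum(frame) / n
    cov = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(frame))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return [x - (x_mean + slope * (t - t_mean)) for t, x in enumerate(frame)]


def normalize(frame):
    """Scale a frame to unit peak amplitude; an all-zero frame is returned unchanged."""
    peak = max((abs(x) for x in frame), default=0.0)
    return list(frame) if peak == 0 else [x / peak for x in frame]
```

For example, `frame_signal(samples, 4, 2)` produces frames of four samples that overlap by half, and each frame can then be passed through `detrend_linear` and `normalize` before feature extraction.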
Overall, for the shark data used in this thesis, CNN was found to be a better choice than K-NN, and it performed best with WHT-FFT features and ADASYN as the balancing technique.
ISBN
9798382748450
Recommended Citation
Ali, Ibrahim M. (2024). Improving F-beta Score in Classifying Shark Data into Shark Behaviors. CGU Theses & Dissertations, 814. https://scholarship.claremont.edu/cgu_etd/814.