International Society of Science and Applied Technologies
Analyzing Generated Labels for Cognition Datasets
Author | Robert K.L. Kennedy
Co-Author(s) | Taghi M. Khoshgoftaar
Abstract | We use our novel unsupervised class-label generation methodology in a new way: to generate class labels for cognition datasets. We evaluate the quality and efficacy of these labels for detecting cognitive issues across several levels of class imbalance, ranging from highly imbalanced to balanced. The cognition datasets are derived from publicly available survey data collected as part of the Health and Retirement Study (HRS). To measure label quality, we train a supervised classifier on the newly generated labels and compare its classification performance with that of a widely used unsupervised anomaly detection method. Our empirical results and analysis show that the new labels are of sufficient quality to produce a classifier that outperforms the baseline in all tested scenarios, as measured by the area under the precision-recall curve (AUPRC). Additionally, we show that AUPRC performance increases as the dataset becomes more balanced.
Keywords | generated labels, unsupervised learning, cognition, machine learning
Article #: RQD2024-164
Proceedings of the 29th ISSAT International Conference on Reliability & Quality in Design
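The abstract reports results as AUPRC. For readers unfamiliar with the metric, the following is a minimal sketch of one common way to estimate it (step-wise average precision over a ranked list). It is an illustration only, not the authors' evaluation code; the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not given special handling.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true: iterable of 0/1 labels (1 = positive class).
    scores: iterable of classifier scores; higher means "more positive".
    Assumption: ties in `scores` are broken by sort order, not grouped.
    """
    y_true = list(y_true)
    scores = list(scores)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / total_pos.
            ap += true_pos / rank
    return ap / total_pos


# Toy data (hypothetical): two positives among five examples.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

An uninformative scorer has an expected AUPRC close to the fraction of positive examples, so AUPRC values are most meaningfully compared between methods at the same level of class imbalance.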