Analyzing Generated Labels for Cognition Datasets

Author	Robert K.L. Kennedy
Co-Author(s)	Taghi M. Khoshgoftaar
Abstract	We apply our novel unsupervised class-label generation methodology in a new way to generate class labels for cognition datasets. We explore the quality and efficacy of the class labels used for detecting cognitive issues across various levels of class imbalance, ranging from highly imbalanced to balanced. The cognition datasets are derived from publicly available survey data collected as part of the Health and Retirement Study (HRS). To measure label quality, we train a supervised classifier on the newly generated labels and compare its classification performance with that of a widely used unsupervised anomaly detection method. Our empirical results and analysis show that the new labels are of high enough quality to produce a classifier that outperforms the baseline across all tested scenarios, as measured by the area under the precision-recall curve (AUPRC). Additionally, we show that AUPRC increases as the dataset becomes more balanced.
Keywords	generated labels, unsupervised learning, cognition, machine learning

		Article #: RQD2024-164

Proceedings of the 29th ISSAT International Conference on Reliability & Quality in Design
August 8-10, 2024

	International Society of Science and Applied Technologies