Analyzing Generated Labels for Cognition Datasets  
Author Robert K.L. Kennedy

Co-Author(s) Taghi M. Khoshgoftaar

Abstract We use our novel unsupervised class label generation methodology in a new way: to generate class labels for cognition datasets. We evaluate the quality and efficacy of these labels for detecting cognitive issues across several levels of class imbalance, ranging from highly imbalanced to balanced. The cognition datasets are derived from publicly available survey data collected as part of the Health and Retirement Study (HRS). To measure label quality, we train a supervised classifier on the newly generated labels and compare its classification performance with that of a widely used unsupervised anomaly detection method. Our empirical results and analysis show that the new labels are of high enough quality to produce a classifier that outperforms the baseline across all tested scenarios, as measured by the area under the precision-recall curve (AUPRC). Additionally, we show that AUPRC increases as the dataset becomes more balanced.
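
As an illustration of the evaluation metric named above, the following minimal sketch computes a step-wise AUPRC (often called average precision) for two sets of scores against the same ground-truth labels. The function name `average_precision`, the toy labels, and both score lists are hypothetical and are not taken from the paper or from HRS data; the sketch also assumes there are no tied scores.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    y_true: sequence of 0/1 labels (1 = positive class).
    scores: sequence of scores, higher = more likely positive.
    Assumes no tied scores (ties would need threshold grouping).
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(label for _, label in ranked)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each new recall level
    return total / n_pos


# Toy, made-up data: 2 positives among 6 examples.
y_true = [1, 0, 1, 0, 0, 0]
classifier_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3]  # hypothetical supervised model
baseline_scores = [0.4, 0.9, 0.8, 0.1, 0.3, 0.2]    # hypothetical anomaly detector

ap_classifier = average_precision(y_true, classifier_scores)
ap_baseline = average_precision(y_true, baseline_scores)
print(ap_classifier, ap_baseline)
```

In this toy case the first score list ranks both positives ahead of every negative, so its AUPRC is higher than that of the second list. An actual comparison would use held-out data and the labels and models described in the paper.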

Keywords generated labels, unsupervised learning, cognition, machine learning
   
Article #: RQD2024-164

Proceedings of the 29th ISSAT International Conference on Reliability & Quality in Design
August 8-10, 2024