A Framework For Selecting a Subset of Metrics Considering Cost  
Author Vidhyashree Nagaraju

 

Co-Author(s) Donghui Yan; Lance Fiondella

 

Abstract Data collection procedures can record many features of a system or application. However, the data may contain redundant or irrelevant features. Moreover, the cost of collecting individual features may vary, with some incurring substantial cost. Besides increasing computational complexity, using all available data may lead to unnecessary costs, especially when some features are expensive to collect. Examples include measurements in major airline maintenance projects, medical screening and diagnosis, and data features purchased from third-party vendors. Redundant features may also negatively impact the accuracy of the model. Previous studies attempt to address these challenges through a feature subset selection formulation, but they are either not cost-sensitive or assume equal cost for each feature. Clearly, cost is crucial in many applications and should be an important dimension when selecting features. This paper proposes a framework to select a subset of features where each variable can possess a different cost. The L1 regularized model is employed because it is more popular than alternative methods for variable subset selection and because its regularization paths can be used to impose cost constraints. The proposed approach is demonstrated through a data set collected to measure system performance degradation and guide maintenance decisions. The results indicate that the proposed L1 regularized model achieves ‘optimal’ accuracy while simultaneously enforcing a cost constraint. Thus, the proposed approach can inform practical decision making by considering the tradeoff between accuracy and the cost of collecting system measurements.
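
The abstract does not spell out how the regularization path imposes the cost constraint, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It fits an L1 regularized logistic regression by proximal gradient descent at several penalty strengths, sums the cost of the features with nonzero weights at each strength, and keeps the most accurate fit whose total cost stays within a budget. The synthetic data, the per-feature costs, the budget, the penalty grid, and all function names are assumptions made for illustration. Accuracy is measured on the training data for brevity; a held-out set would normally be used.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def score(w, b, x):
    """Linear predictor for one observation."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_l1_logistic(X, y, lam, step=0.5, iters=200):
    """Proximal gradient descent for L1 regularized logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def accuracy(X, y, w, b):
    """Fraction of observations classified correctly at threshold 0.5."""
    hits = sum((score(w, b, xi) > 0) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

def select_under_budget(X, y, costs, budget, lambdas):
    """Walk the penalty grid; keep the most accurate fit within budget."""
    best = None
    for lam in lambdas:
        w, b = fit_l1_logistic(X, y, lam)
        active = [j for j, wj in enumerate(w) if wj != 0.0]
        total_cost = sum(costs[j] for j in active)
        if total_cost > budget:
            continue  # this point on the path violates the cost constraint
        acc = accuracy(X, y, w, b)
        if best is None or acc > best["accuracy"]:
            best = {"lambda": lam, "features": active,
                    "cost": total_cost, "accuracy": acc}
    return best

# Synthetic data (assumed): feature 0 is informative, feature 1 is a noisy
# and expensive near-copy of feature 0, feature 2 is irrelevant.
random.seed(0)
X, y = [], []
for _ in range(150):
    x0 = random.uniform(-1, 1)
    x1 = max(-1.0, min(1.0, x0 + random.uniform(-0.3, 0.3)))
    x2 = random.uniform(-1, 1)
    X.append([x0, x1, x2])
    y.append(1 if x0 + random.uniform(-0.2, 0.2) > 0 else 0)

costs = [1.0, 5.0, 10.0]               # hypothetical collection costs
budget = 6.0                           # hypothetical cost limit
lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]  # penalty grid, strongest first

result = select_under_budget(X, y, costs, budget, lambdas)
print(result)
```

Because every feature in this sketch lies in [-1, 1], the strongest penalty (1.0) drives all weights to zero, so at least one point on the grid always satisfies the budget. Weaker penalties admit more features, and the budget check discards any fit whose active features cost too much.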

 

Keywords Logistic regression, regularization, feature selection, classification, cost-sensitive
   
Article #: 24250
 
Proceedings ISSAT International Conference on Reliability and Quality in Design 2018
August 2-4, 2018 - Toronto, Ontario, Canada