Performance of Filter-based Feature Subset Selection for Software Quality Data Classification  
Author Taghi M. Khoshgoftaar

 

Co-Author(s) Huanjing Wang; Naeem Seliya

 

Abstract Selecting the software metrics (features or attributes) that matter for software defect prediction is a critical part of the software quality modeling process. While some studies use feature ranking techniques, feature subset evaluation can remove redundant features and yield smaller, more useful feature subsets than feature ranking does. In this study, we compare two filter-based feature subset selection techniques, correlation-based feature selection (CFS) and consistency, each paired with two search techniques, Best First (BF) and Greedy Stepwise (GS), on four datasets from a real-world software project. Six learners are used to build models with the selected software metrics. Each model is assessed using the area under the Receiver Operating Characteristic curve (AUC). We find that CFS-BF performs best and consistency-GS performs worst. In addition, models built with the Logistic Regression (LR) learner perform best in terms of AUC. This leads us to recommend CFS-BF for selecting software metric subsets and the LR learner for building software quality classification models.
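
As an illustration of the CFS idea paired with a greedy stepwise (forward) search, the following is a minimal sketch, not the implementation used in the study. It assumes absolute Pearson correlation as the correlation measure (the abstract does not say which measure was used, and CFS implementations often use other measures such as symmetrical uncertainty). The function names pearson, cfs_merit, and greedy_stepwise are hypothetical. The merit of a subset of k features is k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation, so redundant features lower the score.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / math.sqrt(vx * vy))


def cfs_merit(columns, labels, subset):
    """CFS merit of a subset of column indices (assumed Pearson-based)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation over the subset.
    r_cf = sum(pearson(columns[j], labels) for j in subset) / k
    # Mean feature-feature correlation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_stepwise(columns, labels, tol=1e-12):
    """Forward search: add the best feature until merit stops improving."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [j]), j)
                  for j in sorted(remaining)]
        merit, j = max(scored, key=lambda t: t[0])
        if merit <= best + tol:
            break  # no candidate improves the subset merit
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Toy data (made up for illustration): column 1 duplicates column 0,
    # and column 2 is constant, so only one informative column should remain.
    cols = [[1, 2, 1, 5, 6, 5, 2, 6],
            [1, 2, 1, 5, 6, 5, 2, 6],
            [3, 3, 3, 3, 3, 3, 3, 3]]
    y = [0, 0, 0, 1, 1, 1, 0, 1]
    print(greedy_stepwise(cols, y))
```

In this toy run the duplicate column adds no merit and the constant column lowers it, which is the redundancy-removal behavior that distinguishes subset evaluation from per-feature ranking. A Best First search would differ by keeping a queue of candidate subsets and allowing backtracking rather than stopping at the first non-improving step.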

 

Keywords feature subset selection, software measurements, filters, software quality classification
   
    Article #:  20218
 
Proceedings of the 20th ISSAT International Conference on Reliability and Quality in Design
August 7-9, 2014 - Seattle, Washington, U.S.A.