Similarity of Feature Subset Selection Methods on Software Metrics Data  
Author Huanjing Wang

Co-Author(s) Taghi M. Khoshgoftaar; Naeem Seliya

Abstract During the software development cycle, various software metrics are collected for different purposes. Intelligently selecting a subset of software metrics before building defect predictors may improve model performance. Software practitioners are therefore interested in how similar the feature subsets chosen by different metric (feature) selection algorithms are. To study this similarity, we test two filter-based rankers, two filter-based subset evaluators, and two wrappers, and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the similarity between techniques. Three software metric datasets from a real-world software project are used in this study. Results demonstrate that Chi-square (CS) and Signal-to-Noise (S2N) exhibit the most similarity regardless of perturbation level; in addition, filter-based feature selection methods show lower similarity to wrappers. This demonstrates that the choice of feature selection method has a major influence on the features chosen, and that practitioners must make this choice carefully to ensure their techniques give the best possible results.
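The abstract names APTI but does not give its formula, so the following is only a minimal sketch under stated assumptions. It assumes the Tanimoto index of two feature subsets A and B is |A ∩ B| / (|A| + |B| - |A ∩ B|), and that APTI for two selection methods averages this index over runs, where run r pairs the subsets each method chose on the same perturbed dataset r. The paper's exact definition may differ (for example, it could average over all cross-pairs of runs instead). The function names `tanimoto` and `apti` and the metric names in the example are hypothetical.

```python
from typing import Sequence, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto index of two feature subsets: |A & B| / (|A| + |B| - |A & B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty subsets count as identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def apti(subsets_m1: Sequence[Set[str]], subsets_m2: Sequence[Set[str]]) -> float:
    """Hypothetical APTI sketch: mean Tanimoto index over runs, where run r pairs
    the subsets chosen by method 1 and method 2 on the same perturbed dataset r."""
    if len(subsets_m1) != len(subsets_m2) or not subsets_m1:
        raise ValueError("both methods need the same, non-zero number of runs")
    scores = [tanimoto(a, b) for a, b in zip(subsets_m1, subsets_m2)]
    return sum(scores) / len(scores)


# Hypothetical subsets selected by two rankers on two perturbed datasets.
cs_runs = [{"loc", "cyclo", "fanout"}, {"loc", "cyclo", "nodes"}]
s2n_runs = [{"loc", "cyclo", "fanout"}, {"loc", "fanout", "nodes"}]
print(apti(cs_runs, s2n_runs))  # run 1 scores 1.0, run 2 scores 2/4 = 0.5 -> 0.75
```

An APTI of 1.0 would mean both methods picked identical subsets on every run, while 0.0 would mean they never shared a single metric.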

Keywords software metrics, feature selection, similarity, defect prediction
   
Article #: 22186
 
Proceedings of the 22nd ISSAT International Conference on Reliability and Quality in Design
August 4-6, 2016 - Los Angeles, California, U.S.A.