A Comparative Study of Sampled Feature Ranker Ensembles for Software Quality Classification  
Author: Taghi M. Khoshgoftaar

Co-Author(s): Kehan Gao; Lofton A. Bullard

Abstract: This paper presents a repetitive feature selection (FS) method that addresses the high dimensionality and class imbalance problems often found in software measurement data. The repetitive method is an iterative process in which each iteration applies data sampling followed by feature ranking; the rankings produced across all iterations are then aggregated into a final result. In this work, we study how two components of the repetitive FS process, the ranking technique and the sampling method, affect classification performance. We investigate seven filters (ranking techniques) plus an ensemble filter, together with three data sampling methods, each applied with two different post-sampling ratios between the two classes. In total, we therefore examine 48 repetitive FS techniques (8 filters × 3 sampling methods × 2 class ratios). The empirical study is carried out on two groups of highly imbalanced software data sets. The results show that some filters yield better and more stable classification performance than others across the various sampling approaches used in the repetitive FS technique.
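
The sample-then-rank-then-aggregate loop described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: random undersampling stands in for the sampling methods (which the abstract does not name), a hypothetical mean-difference score stands in for the seven filters, the minority percentages 35 and 50 are illustrative post-sampling class ratios, and aggregation is assumed to be the mean rank of each feature across iterations. All function names are hypothetical.

```python
import random
from statistics import mean

def random_undersample(X, y, minority_pct, rng):
    """Assumed sampler: randomly drop majority-class (0) instances until the
    minority class (1) makes up about minority_pct percent of the sample."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n_neg = round(len(pos) * (100 - minority_pct) / minority_pct)
    keep = pos + rng.sample(neg, min(n_neg, len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]

def mean_difference_score(X, y, j):
    """Hypothetical filter: absolute difference of class means for feature j."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return abs(mean(pos) - mean(neg))

def rank_features(X, y, score):
    """Return the rank of every feature under the filter (1 = best)."""
    n_features = len(X[0])
    scores = [score(X, y, j) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def repetitive_fs(X, y, k, iterations=10, minority_pct=35,
                  score=mean_difference_score, seed=0):
    """Repeat (sample -> rank), then keep the k features with the best mean rank."""
    rng = random.Random(seed)
    all_ranks = []
    for _ in range(iterations):
        Xs, ys = random_undersample(X, y, minority_pct, rng)
        all_ranks.append(rank_features(Xs, ys, score))
    n_features = len(X[0])
    avg_rank = [mean(r[j] for r in all_ranks) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: avg_rank[j])[:k]

# Toy imbalanced data: 10 fault-prone (1) vs 90 not fault-prone (0) modules.
# Feature 0 separates the classes, feature 1 is weakly related, feature 2 is constant.
y = [1] * 10 + [0] * 90
X = [[(5.0 if label == 1 else 0.0) + i % 2, float(i % 4), 1.0]
     for i, label in enumerate(y)]
print(repetitive_fs(X, y, k=2))  # prints [0, 1]
```

Swapping in a different scoring function for `score` or a different sampler, and varying `minority_pct`, mirrors how one grid of ranker, sampler, and class-ratio combinations could be enumerated in such a study.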

Article #: 1841
Proceedings of the 18th ISSAT International Conference on Reliability and Quality in Design
July 26-28, 2012 - Boston, Massachusetts, U.S.A.