A Comparative Study of Sampled Feature Ranker Ensembles for Software Quality Classification
Author | Taghi M. Khoshgoftaar
Co-Author(s) | Kehan Gao; Lofton A. Bullard
Abstract | This paper presents a repetitive feature selection (FS) method that addresses the high dimensionality and class imbalance that often appear in software measurement data. The method repeats data sampling followed by feature ranking and then aggregates the rankings produced across the iterations. We study how two components of the repetitive FS process, the ranking technique and the sampling method, affect classification performance. We investigate eight ranking techniques (seven filters and an ensemble filter) and three data sampling methods, each applied with two different post-sampling class ratios. This yields 8 × 3 × 2 = 48 repetitive FS techniques. The empirical study is carried out on two groups of highly imbalanced software data sets. The results demonstrate that some filters deliver better and more stable classification performance than others across the sampling approaches used in the repetitive FS technique.
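The sample-rank-aggregate loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the sampling methods, the filters, the class ratios, or the aggregation rule, so random undersampling, a signal-to-noise filter, a 50:50 ratio, and mean-rank aggregation are stand-ins, and every function and placeholder name below is hypothetical.

```python
import random
from itertools import product
from statistics import mean, pstdev


def undersample(X, y, minority_share, rng):
    """Randomly drop majority rows (label 0) until the minority (label 1) share is reached."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Majority rows to keep so that minority / total is about minority_share.
    keep = min(round(len(minority) * (1 - minority_share) / minority_share), len(majority))
    chosen = minority + rng.sample(majority, keep)
    return [X[i] for i in chosen], [y[i] for i in chosen]


def s2n_scores(X, y):
    """Stand-in filter: |mean_1 - mean_0| / (std_1 + std_0) per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg)
        scores.append(abs(mean(pos) - mean(neg)) / spread if spread else 0.0)
    return scores


def to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def repetitive_fs(X, y, iterations=10, minority_share=0.5, top_k=1, seed=0):
    """Repeat (sample, rank), then aggregate by mean rank (assumed aggregation rule)."""
    rng = random.Random(seed)
    all_ranks = []
    for _ in range(iterations):
        Xs, ys = undersample(X, y, minority_share, rng)
        all_ranks.append(to_ranks(s2n_scores(Xs, ys)))
    mean_rank = [mean(r[j] for r in all_ranks) for j in range(len(X[0]))]
    return sorted(range(len(mean_rank)), key=lambda j: mean_rank[j])[:top_k]


# Toy imbalanced data: 6 minority rows out of 48.
# Feature 0 separates the classes; feature 1 does not.
y = [1 if i % 8 == 0 else 0 for i in range(48)]
X = [[10 * y[i] + i % 3, i % 5] for i in range(48)]
print(repetitive_fs(X, y, top_k=1))  # [0]

# Counting the study's configurations (placeholder names, not the paper's).
rankers = [f"filter_{k}" for k in range(1, 8)] + ["ensemble"]
samplers = ["sampler_a", "sampler_b", "sampler_c"]
ratios = ["ratio_1", "ratio_2"]
configs = list(product(rankers, samplers, ratios))
print(len(configs))  # 48
```

Swapping in a different sampler or filter only changes `undersample` or `s2n_scores`; the outer loop and the aggregation step stay the same, which is what makes the 48 combinations directly comparable.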
Article #: 1841
July 26-28, 2012 - Boston, Massachusetts, U.S.A.