International Society of Science and Applied Technologies
Performance of Feature Subset Evaluators for Software Engineering Datasets
Author | Huanjing Wang
Co-Author(s) | Taghi M. Khoshgoftaar; Kehan Gao
Abstract | The objective of feature selection is to identify irrelevant or redundant features, which can then be discarded from the analysis. Reducing the number of metrics (features) in a software dataset can speed up defect prediction model training and improve classifier performance. In the context of software defect prediction, we investigated two filter-based and five wrapper-based feature (software metric) subset evaluators and built classification models using five different classifiers. The models were evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC). All experiments were conducted on nine imbalanced datasets from a real-world software project. The experimental results demonstrated that the choice of subset evaluator can significantly influence the conclusions drawn from classification evaluation. In this study, Correlation-Based Feature Selection performed best, followed by the k-nearest neighbors wrapper evaluator. Among the classifiers, the model built with a support vector machine performed best.
Keywords | feature subset selection, software measurements, filters, wrappers, software quality classification
Article #: 23-131
August 3-5, 2017 - Chicago, Illinois, U.S.A.
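
For readers unfamiliar with filter-based subset evaluation, the following is a minimal sketch of how a Correlation-Based Feature Selection style evaluator might score a subset of software metrics. It uses the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. Several parts are assumptions made for illustration and are not taken from the paper: Pearson correlation stands in for whatever correlation measure the authors used, the search is a simple greedy forward search, and the function names and toy metric values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(columns, labels, subset):
    """CFS merit of the feature indices in `subset`.

    Assumption: absolute Pearson correlation is used for both the
    feature-class and the feature-feature terms.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(columns, labels):
    """Greedy forward search: add the feature that most improves the merit,
    and stop as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        score, idx = max(
            (cfs_merit(columns, labels, selected + [i]), i) for i in remaining
        )
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best


# Hypothetical toy data: 8 modules, label 1 = fault-prone.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 3, 4, 6, 7, 8, 9],        # metric 0: relevant to the label
    [2, 4, 6, 8, 12, 14, 16, 18],    # metric 1: scaled copy of metric 0 (redundant)
    [2, 4, 2, 4, 2, 4, 2, 4],        # metric 2: uncorrelated with the label
]

selected, merit = greedy_forward_cfs(columns, labels)
print("selected metrics:", selected, "merit:", round(merit, 4))
```

On this toy data the search keeps only one of the two perfectly correlated metrics and skips the irrelevant one, which is the behavior the merit is designed to reward: subsets whose features correlate with the class but not with each other. A wrapper evaluator would differ by replacing `cfs_merit` with the cross-validated performance of a classifier trained on the candidate subset.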