Performance of Filter-based Feature Subset Selection for Software Quality Data Classification

Author	Taghi M. Khoshgoftaar
Co-Author(s)	Huanjing Wang; Naeem Seliya
Abstract	Selecting the software metrics (features or attributes) that matter for software defect prediction is a critical part of the software quality modeling process. While some studies use feature ranking techniques, feature subset evaluation can remove redundant features and yield smaller, more useful feature subsets than feature ranking does. In this study, we compare two filter-based feature subset selection techniques, correlation-based feature selection (CFS) and consistency, each combined with two search techniques, Best First (BF) and Greedy Stepwise (GS), on four datasets from a real-world software project. Six learners are used to build models with the selected software metrics. Each model is assessed using the area under the Receiver Operating Characteristic curve (AUC). We find that CFS-BF performs best and consistency-GS performs worst. In addition, the models built with the Logistic Regression (LR) learner perform best in terms of AUC. This leads us to recommend CFS-BF for selecting software metric subsets and the LR learner for building software quality classification models.
Keywords	feature subset selection, software measurements, filters, software quality classification
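
The subset evaluation idea summarized in the abstract can be illustrated with a minimal sketch of a CFS-style merit score driven by a greedy forward search. This is not the authors' implementation. It assumes the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), and substitutes absolute Pearson correlation for the correlation measure, which is a simplifying assumption. The function names (pearson, cfs_merit, greedy_stepwise), the min_gain tolerance, and the toy metric names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, labels):
    """CFS-style merit: rewards correlation with the class, penalizes redundancy.

    Assumption: absolute Pearson correlation stands in for the correlation measure.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    # Mean feature-feature correlation over all unordered pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_stepwise(features, labels, min_gain=1e-9):
    """Forward search: add the feature that raises the merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        scored = [(cfs_merit(selected + [f], features, labels), f) for f in remaining]
        merit, choice = max(scored, key=lambda t: t[0])
        if merit <= best + min_gain:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = merit
    return selected, best


# Toy data (hypothetical): one informative metric, an exact copy of it, and noise.
features = {
    "loc": [1, 2, 3, 4, 5, 6],
    "loc_copy": [1, 2, 3, 4, 5, 6],
    "noise": [3, 1, 2, 2, 1, 3],
}
labels = [0, 0, 0, 1, 1, 1]  # 1 = fault-prone module

selected, merit = greedy_stepwise(features, labels)
print(selected, round(merit, 4))
```

On this toy data the redundant copy adds no merit and the noise metric lowers it, so the search stops after a single feature. A Best First search would differ by keeping a queue of candidate subsets and allowing backtracking rather than committing to each greedy step.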

		Article #: 20218

Proceedings of the 20th ISSAT International Conference on Reliability and Quality in Design
August 7-9, 2014 - Seattle, Washington, U.S.A.

	International Society of Science and Applied Technologies