Selecting Features for Detecting Credit Card Fraud Using SHAP Values  
Author Huanjing Wang

 

Co-Author(s) Taghi M. Khoshgoftaar; Qianxin Liang

 

Abstract Credit card fraud detection is essential not only for protecting customers and financial institutions but also for maintaining the trust and reliability of the entire financial system. Machine learning techniques play a central role in credit card fraud detection, offering powerful tools to identify fraudulent transactions accurately and efficiently. This study employs a feature selection technique based on SHAP (SHapley Additive exPlanations) values, in which the top features are selected according to their SHAP values. Several classification models are investigated: Decision Tree, Random Forest, XGBoost, and Logistic Regression. Models are evaluated using the Area Under the Precision-Recall Curve (AUPRC) metric. All experiments are conducted on the Kaggle Credit Card Fraud Detection Dataset. In our investigation, Decision Tree is employed as the learner in the SHAP-value-based feature selection process. The fraud detection models built with Random Forest and XGBoost outperform the model built with Decision Tree, which in turn outperforms Logistic Regression. Our findings indicate that the classifier used in the model construction phase does not necessarily have to match the learner used in the feature selection stage.

 

Keywords SHAP, Feature Selection, Credit Card Fraud Detection, Machine Learning
   
    Article #:  RQD2024-144
 

Proceedings of 29th ISSAT International Conference on Reliability & Quality in Design
August 8-10, 2024