Machine Learning for Compositional Data Analysis in Support of the Decision Making Process  
Author Thi Thuy Van Nguyen


Co-Author(s) Cédric Heuchenne; Kim Phuc Tran


Abstract Machine learning (ML) plays an important role in data analysis, yet its application to compositional data (CoDa) has received limited attention. In this work, we summarize the most popular ML techniques for CoDa, including principal component analysis (PCA), clustering, classification, and regression. We also introduce an efficient transformation method, based on Dirichlet density estimation, that maps CoDa to real-valued data. The proposed method not only removes the constraints (nonnegativity and constant sum) on each CoDa vector, but also reduces its dimension and improves the quality of the data. We then apply the transformed data to anomaly detection using Support Vector Data Description (SVDD), a one-class classification algorithm that detects abnormal observations by modeling the normal ones. To illustrate the promise of this method for building classification and anomaly detection models on CoDa, a simulation example is provided at the end of the work.


Keywords Compositional Data, Machine Learning, Anomaly Detection, SVDD, Dirichlet Density
Article #: DSBFI23-19
Proceedings of 2nd ISSAT International Conference on Data Science in Business, Finance and Industry
January 8-10, 2023 - Da Nang, Vietnam