Automated Process for Data Acquisition, Analysis, and Preprocessing  
Author Rosemarie Day


Co-Author(s) Alexander Shoop; Adarsh Jaiswal; Jack Zhang; Xiao Du; Manasee Godsay; Fatemeh Emdad; Chun-Kit Ngan; Elke Rundensteiner


Abstract The objective of this research was to create libraries that enhance the automation of data acquisition, analysis, and preprocessing for integration with the Findability Platform®. Four libraries were created using Java, Spark, Cassandra, and the Maven build tool to collect data from various sources, analyze the incoming data, and transform it based on the information obtained. The data acquisition phase imports structured data into the Cassandra data store and supports Excel files and text files such as CSV. Once the data are imported, the analysis phase collects relevant statistics about the imported table, such as outliers, missing values, and unique values. Using these statistics and the original dataset, the preprocessing phase performs feature generation to create a final modified dataset, which can then be sent on for modeling and predictions.
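The analysis phase described above can be illustrated with a minimal per-column profiler. This is a sketch under stated assumptions, not the authors' implementation. The class and method names (ColumnProfiler, profile) are hypothetical. Empty or blank cells are assumed to count as missing. The outlier rule (Tukey's 1.5 × IQR fences) is an assumption, because the abstract does not say which outlier method is used. Plain Java lists stand in for the Spark and Cassandra components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical column profiler sketching the analysis phase: counts missing
 * values and distinct values, and flags numeric outliers with Tukey's
 * 1.5 * IQR fences (an assumed rule). Uses plain Java instead of Spark/Cassandra.
 */
public class ColumnProfiler {

    /** Summary statistics for one column (illustrative, not the paper's schema). */
    public static final class ColumnProfile {
        public final int missing;
        public final int unique;
        public final List<Double> outliers;

        ColumnProfile(int missing, int unique, List<Double> outliers) {
            this.missing = missing;
            this.unique = unique;
            this.outliers = outliers;
        }
    }

    /** Linear-interpolation percentile on a sorted, non-empty list; p in [0, 1]. */
    static double percentile(List<Double> sorted, double p) {
        double pos = p * (sorted.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted.get(lo) + (pos - lo) * (sorted.get(hi) - sorted.get(lo));
    }

    public static ColumnProfile profile(List<String> cells) {
        int missing = 0;
        Set<String> distinct = new HashSet<>();
        List<Double> numbers = new ArrayList<>();
        for (String cell : cells) {
            if (cell == null || cell.trim().isEmpty()) {
                missing++; // assumption: a blank CSV/Excel cell is a missing value
                continue;
            }
            String value = cell.trim();
            distinct.add(value); // distinct non-missing values, compared as strings
            try {
                numbers.add(Double.parseDouble(value));
            } catch (NumberFormatException e) {
                // non-numeric values still count as unique values but are not outlier-tested
            }
        }
        List<Double> outliers = new ArrayList<>();
        if (numbers.size() >= 4) { // too few points gives meaningless quartiles
            List<Double> sorted = new ArrayList<>(numbers);
            Collections.sort(sorted);
            double q1 = percentile(sorted, 0.25);
            double q3 = percentile(sorted, 0.75);
            double iqr = q3 - q1;
            double low = q1 - 1.5 * iqr;
            double high = q3 + 1.5 * iqr;
            for (double x : numbers) {
                if (x < low || x > high) {
                    outliers.add(x);
                }
            }
        }
        return new ColumnProfile(missing, distinct.size(), outliers);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("4", "5", "", "5", "6", "7", "250", " ");
        ColumnProfile p = profile(column);
        // prints: missing=2 unique=5 outliers=[250.0]
        System.out.println("missing=" + p.missing + " unique=" + p.unique + " outliers=" + p.outliers);
    }
}
```

In this sample column, the two blank cells are reported as missing, the five distinct non-blank values are counted as unique, and 250 falls above the upper fence. In a Spark-based pipeline, the same statistics would be computed per column over a distributed DataFrame rather than an in-memory list.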


Keywords Data Acquisition, Data Analysis, Data Preprocessing, Feature Generation, Artificial Intelligence, Big Data
Article #: DSIS19-1
Proceedings of ISSAT International Conference on Data Science & Intelligent Systems
August 1-3, 2019 - Las Vegas, NV, U.S.A.