A Framework for Implementing Prediction Algorithm over Cloud Data as a Procedure for Cloud Data Mining

Authors

  • Safwan A. S. Al-Shaibani, School of Computational Sciences, S.R.T.M. University, Nanded, MS, 431606, India
  • Parag Bhalchandra, School of Computational Sciences, S.R.T.M. University, Nanded, MS, 431606, India

DOI:

https://doi.org/10.54060/JIEEE/002.02.021

Keywords:

Heroku cloud, CatBoost algorithm, prediction model, binary classification

Abstract

The cloud has become an important part of data storage for many reasons. Cloud services and applications are widespread in many industries, including healthcare, because they are easy to access. The vast quantity of data available in the cloud has attracted the interest of many researchers in recent years and has motivated the use of machine learning to analyze the data, gain insights, and build models. In this paper, we build a service on Heroku Cloud, a cloud platform as a service (PaaS), over a dataset of 15 thousand records with 25 features. The data come from healthcare and relate to post-surgery complications. The CatBoost prediction algorithm was applied for analysis, and the implementation was done in Python. The results helped us determine and tune some of the hyperparameters and identify features correlated with complications; the reported training and testing accuracies were 91% and 88%, respectively.
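
To make the workflow in the abstract concrete (fit a boosted binary classifier, then report training and testing accuracy), here is a minimal sketch in plain Python. It is not the authors' code and it is not CatBoost: it uses a simplified gradient-boosting loop over one-split decision stumps as a stand-in, and it runs on synthetic data rather than the post-surgery dataset. All function names, the data generator, and the settings (600 rows, 5 features, 40 rounds, learning rate 0.5, 80/20 split) are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic(n, n_features, seed):
    """Hypothetical stand-in data: the label depends on two features plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        score = 1.5 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split that best fits the residuals by squared error."""
    best = None
    n = len(X)
    for j in range(len(X[0])):
        values = sorted(row[j] for row in X)
        for q in range(1, 10):  # try the deciles of feature j as thresholds
            t = values[q * n // 10]
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_boosting(X, y, n_rounds=40, learning_rate=0.5):
    """Gradient boosting on log-loss: each stump fits the residuals y - p."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # initial log-odds of the positive class
    raw = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(zi) for yi, zi in zip(y, raw)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        raw = [zi + learning_rate * (lv if row[j] <= t else rv)
               for zi, row in zip(raw, X)]
    return base, learning_rate, stumps


def predict_raw(model, row):
    base, learning_rate, stumps = model
    z = base
    for j, t, lv, rv in stumps:
        z += learning_rate * (lv if row[j] <= t else rv)
    return z


def accuracy(model, X, y):
    """Fraction of rows whose predicted class (raw score > 0) matches the label."""
    correct = sum((predict_raw(model, row) > 0) == (yi == 1) for row, yi in zip(X, y))
    return correct / len(y)


# 80/20 train/test split on the synthetic data (rows are already in random order).
X, y = make_synthetic(n=600, n_features=5, seed=0)
split = int(0.8 * len(X))
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

model = fit_boosting(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Reporting both numbers side by side, as the abstract does, shows how far the model's fit to the training data carries over to held-out records. A real replication would swap the stand-in learner for the CatBoost library and the synthetic generator for the actual dataset.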

Published

2021-06-04

How to Cite

[1]
S. A. S. Al-Shaibani and P. Bhalchandra, “A Framework for Implementing Prediction Algorithm over Cloud Data as a Procedure for Cloud Data Mining”, J. Infor. Electr. Electron. Eng., vol. 2, no. 2, pp. 1–8, Jun. 2021.
