Journal of Intelligent Computing

Print ISSN: 0976-9005
Online ISSN: 0976-9013

Modified Balanced Random Forest (MBRF) Algorithm for Classifying Imbalanced Data

Zahra Putri Agusta, Adiwijaya
Surya University, Jln MH Thamrin Km 27, Tangerang 15143, Indonesia; Telkom University, Jl Telekomunikasi no 1, Bandung 40257, Indonesia

Abstract: Companies use customer churn prediction to anticipate revenue loss. Several data mining classification techniques can predict customer churn, but they tend to perform poorly on imbalanced data. Customer churn data is typically imbalanced, so a method that can handle the imbalance is required. Two approaches address this problem: the sampling approach, which modifies the distribution of the training data so that the two classes are balanced, and the algorithm approach, which modifies the learning algorithm itself to handle imbalanced data. This paper uses the algorithm approach because it leaves the distribution of the original data unchanged in the training data, which yields more valid data and predictions that better represent real conditions. We propose a Modified Balanced Random Forest (MBRF) algorithm as a classification technique for imbalanced data. MBRF modifies the Balanced Random Forest procedure by applying a clustering-based undersampling strategy to the bootstrap sample of each decision tree in the Random Forest algorithm. The proposed MBRF method outperformed the Balanced Random Forest (BRF) and Random Forest (RF) algorithms, with a sensitivity, or true positive rate (TPR), of 88%, a specificity, or true negative rate (TNR), of 94%, and the best AUC value, 91.65%. MBRF also reduced running time.
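The per-tree undersampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or how records are drawn from each cluster, so the use of k-means, the parameter `k`, the round-robin draw, and the helper names `kmeans` and `balanced_bootstrap` are all assumptions made for this example.

```python
import random
from collections import Counter


def kmeans(points, k, rng, iters=20):
    """Tiny Lloyd's k-means (assumed clustering step); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def balanced_bootstrap(X, y, k, rng, minority=1):
    """Draw one bootstrap sample, then undersample its majority class by cluster."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
    minority_idx = [i for i in idx if y[i] == minority]
    majority_idx = [i for i in idx if y[i] != minority]
    target = len(minority_idx)
    if target == 0 or target >= len(majority_idx):
        return idx  # nothing to balance in this draw

    # Cluster the majority-class records of this bootstrap sample.
    labels = kmeans([X[i] for i in majority_idx], min(k, len(majority_idx)), rng)
    clusters = {}
    for i, lab in zip(majority_idx, labels):
        clusters.setdefault(lab, []).append(i)

    # Assumed selection rule: take records round-robin from shuffled clusters
    # until the majority class is as large as the minority class.
    pools = list(clusters.values())
    for pool in pools:
        rng.shuffle(pool)
    kept = []
    while len(kept) < target:
        for pool in pools:
            if pool and len(kept) < target:
                kept.append(pool.pop())
    return minority_idx + kept


# Synthetic, hypothetical data: 180 majority records (class 0), 20 minority (class 1).
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(180)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]
y = [0] * 180 + [1] * 20

sample = balanced_bootstrap(X, y, k=3, rng=rng)
print(Counter(y[i] for i in sample))  # the two class counts should match
```

In a full forest, each decision tree would be trained on the indices returned by its own call to `balanced_bootstrap`, so the original data set itself is never resampled.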

Keywords: Imbalanced Data, Random Forest Algorithm, Balanced Random Forest, Customer Churn, Classification Technique, Machine Learning

DOI: https://doi.org/10.6025/jic/2020/11/2/41-51
