
Print ISSN: 0976-9005
Online ISSN: 0976-9013


Journal of Intelligent Computing
 

Modified Balanced Random Forest (MBRF) Algorithm for Classifying Imbalanced Data
Zahra Putri Agusta, Adiwijaya
Surya University, Jln MH Thamrin Km 27 Tangerang, 15143, Indonesia; Telkom University, Jl Telekomunikasi no 1 Bandung, 40257, Indonesia
Abstract: Customer churn prediction is a method that companies use to anticipate loss in revenue. Several data mining classification techniques can predict customer churn, but they tend to perform poorly on imbalanced data. Customer churn data is typically imbalanced, so a process that can handle imbalanced data is required. Two approaches address this problem: the sampling approach (the distribution of the training data is modified so that the two classes are balanced) and the algorithm approach (the learning algorithm itself is modified to handle imbalanced data). This paper uses the algorithm approach because it leaves the original data distribution unchanged in the training data, which yields more valid data and prediction results that better represent real conditions. Accordingly, we propose a Modified Balanced Random Forest (MBRF) algorithm as a classification technique for imbalanced data. MBRF modifies the Balanced Random Forest (BRF) procedure by applying a clustering-based undersampling strategy to the bootstrap sample of each decision tree in the Random Forest algorithm. The proposed MBRF method outperformed both the BRF and Random Forest (RF) algorithms, with a sensitivity, or true positive rate (TPR), of 88%, a specificity, or true negative rate (TNR), of 94%, and the highest AUC, 91.65%. MBRF also reduced running time.
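The per-tree sampling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering method or the selection rule, so the sketch assumes plain k-means and a round-robin pick across clusters, and every function name here (`kmeans`, `cluster_undersample`, `balanced_bootstrap`, `sensitivity_specificity`) is hypothetical. It also shows how TPR and TNR are computed from predictions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Assumed rule: keep n_keep majority rows, picking round-robin across clusters."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    clusters = [[p for p, lab in zip(majority, labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    picked = []
    while len(picked) < n_keep and any(clusters):
        for members in clusters:
            if members and len(picked) < n_keep:
                picked.append(members.pop())
    return picked

def balanced_bootstrap(minority, majority, k=3, seed=0):
    """Training set for one tree: bootstrapped minority rows (label 1) plus an
    equal number of cluster-undersampled majority rows (label 0)."""
    rng = random.Random(seed)
    boot_min = [rng.choice(minority) for _ in minority]
    boot_maj = cluster_undersample(majority, len(boot_min), k=k, seed=seed)
    return [(x, 1) for x in boot_min] + [(x, 0) for x in boot_maj]

def sensitivity_specificity(y_true, y_pred):
    """TPR = TP / (TP + FN), TNR = TN / (TN + FP); assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (made up for illustration): 5 churners, 30 non-churners.
minority = [(0.0, 0.1 * i) for i in range(5)]
majority = [(float(i % 6), float(i // 6)) for i in range(30)]
sample = balanced_bootstrap(minority, majority)
```

In a full forest, each tree would be fitted on its own `balanced_bootstrap` sample drawn with a different seed. Drawing majority rows from every cluster, rather than uniformly at random as in BRF, is one plausible way to keep the reduced majority set representative of the original distribution.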
Keywords: Imbalanced Data, Random Forest Algorithm, Balanced Random Forest, Customer Churn, Classification Technique, Machine Learning
DOI: https://doi.org/10.6025/jic/2020/11/2/41-51



 

Copyright © 2011 dline.info