International Journal of Computational Linguistics Research

Print ISSN: 0976-416X
Online ISSN: 0976-4178

Securing MapReduce Programming Paradigm in Hadoop, Cloud and Big Data Ecosystem

Anitha Patil
Department of Computer Engineering Pillai HOC College of Engineering and Technology Rasayani, India

Abstract: With the rise of technologies such as cloud computing, virtualization and big data, MapReduce has become the programming paradigm used for processing the voluminous data known as big data. MapReduce computations run on thousands of commodity computers in the cloud, and can therefore exploit the parallel processing capabilities of the Graphics Processing Units (GPUs) available there. Enterprises are shifting from traditional computing to cloud computing and from traditional data mining to big data analytics, driven by the exponential growth of data. Storing and processing such data requires a big data ecosystem associated with cloud computing. In this context, the MapReduce programming model is supported by distributed programming frameworks such as Hadoop. However, securing MapReduce computations against malicious attacks is very challenging. Many secure cloud storage mechanisms are found in the literature, but securing the MapReduce programming paradigm in Hadoop and the big data ecosystem remains to be explored. In this paper, we propose an algorithm based on differential privacy to protect big data from a malicious Mapper and Reducer. We built a prototype application as a proof of concept. The results show the utility of the proposed approach.
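The abstract does not specify the proposed algorithm, so the following is only a generic illustration of how differential privacy can be combined with a MapReduce-style count, not the paper's method. It is a minimal sketch in plain Python (no Hadoop), and it assumes the standard Laplace mechanism, a counting query with sensitivity 1 (each individual contributes at most one record), and hypothetical names (`mapper`, `reducer`, `private_counts`, `epsilon`) chosen for this example.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def mapper(record):
    """Hypothetical map step: emit (key, 1) for each input record."""
    yield record, 1


def reducer(key, values):
    """Hypothetical reduce step: sum the values emitted for one key."""
    return key, sum(values)


def private_counts(records, epsilon, rng):
    """Run map and reduce in-process, then add Laplace noise to each count.

    Assumption: each individual contributes at most one record, so a
    count has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)  # stands in for the shuffle phase
    scale = 1.0 / epsilon
    noisy = {}
    for key, values in grouped.items():
        _, total = reducer(key, values)
        noisy[key] = total + laplace_noise(scale, rng)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    data = ["a", "b", "a", "c", "a", "b"]
    print(private_counts(data, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy at the cost of noisier counts. In this sketch the noise is added only to the final output, so it limits what the released counts reveal; it does not by itself model a Mapper or Reducer that misbehaves during the computation.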

Keywords: Big Data, MapReduce Programming, Hadoop, HDFS

DOI: https://doi.org/10.6025/jcl/2020/11/3/87-96

International Journal of Computational Linguistics Research, Volume 11, Number 3, September 2020
