Progress in Computing Applications (PCA)

DLINE Journals portal

SQL Hadoop Processing Engines using MapReduce

Edson Ramiro Lucas Filho, Eduardo Cunha de Almeida, Stefanie Scherzinger
Universidade Federal do Paraná, Brazil; OTH Regensburg, Germany

Abstract: SQL-on-Hadoop processing engines have become state of the art, yet the skills required to tune these systems are rare in the job market. Automated tuning advisers can profile the low-level MapReduce jobs and propose appropriate tuning setups, but up-front tuning is time-consuming and costly. In this demo, we present DejaVu, which integrates with Hive and reduces tuning costs by caching tuning setups for partial query plans. When the SQL-on-Hadoop engine Hive compiles SQL queries into physical query plans, individual MapReduce jobs tend to be similar across query plans. By recycling the tuning setups of similar low-level MapReduce jobs, DejaVu cuts the time spent profiling the TPC-H query workload in half while achieving a similar impact on job performance. While we employ Starfish in this demo, DejaVu can leverage any third-party MapReduce tuning adviser.
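The recycling idea in the abstract can be illustrated as a cache of tuning setups keyed by a job signature, with the tuning adviser behind a pluggable interface. This is a minimal sketch under stated assumptions, not DejaVu's implementation: all names (`MapReduceJob`, `signature`, `TuningAdviser`, `TuningCache`, `FixedAdviser`) are hypothetical, the operator names are invented, and job similarity is approximated by an exact match on the operator sequence and input tables, which is likely simpler than the measure the authors use. The adviser interface mirrors the claim that any third-party adviser, such as Starfish, can be plugged in.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Protocol, Tuple

# Hypothetical signature type: (operator sequence, set of input tables).
Signature = Tuple[Tuple[str, ...], FrozenSet[str]]


@dataclass(frozen=True)
class MapReduceJob:
    """Hypothetical stand-in for one MapReduce job in a Hive physical plan."""
    job_id: str
    operators: Tuple[str, ...]
    input_tables: FrozenSet[str]


def signature(job: MapReduceJob) -> Signature:
    # ASSUMPTION: two jobs count as "similar" when they run the same operator
    # sequence over the same input tables; the job_id is ignored.
    return (job.operators, job.input_tables)


class TuningAdviser(Protocol):
    """Hypothetical pluggable adviser interface, e.g. a wrapper around Starfish."""
    def profile_and_tune(self, job: MapReduceJob) -> Dict[str, str]: ...


@dataclass
class TuningCache:
    adviser: TuningAdviser
    setups: Dict[Signature, Dict[str, str]] = field(default_factory=dict)
    profiling_runs: int = 0

    def setup_for(self, job: MapReduceJob) -> Dict[str, str]:
        key = signature(job)
        if key not in self.setups:
            # Cache miss: pay the expensive profiling cost once per signature.
            self.setups[key] = self.adviser.profile_and_tune(job)
            self.profiling_runs += 1
        # Hit (or just filled): recycle the stored setup without profiling.
        return self.setups[key]


class FixedAdviser:
    """Toy adviser returning a fixed, illustrative setup; a real one would profile the job."""
    def profile_and_tune(self, job: MapReduceJob) -> Dict[str, str]:
        return {"mapreduce.job.reduces": "8"}


# Invented four-job workload over TPC-H tables with two distinct signatures.
jobs = [
    MapReduceJob("q1-job1", ("TableScan", "Filter", "GroupBy"), frozenset({"lineitem"})),
    MapReduceJob("q3-job1", ("TableScan", "Filter", "GroupBy"), frozenset({"lineitem"})),
    MapReduceJob("q3-job2", ("TableScan", "Join"), frozenset({"orders", "customer"})),
    MapReduceJob("q5-job1", ("TableScan", "Join"), frozenset({"customer", "orders"})),
]

cache = TuningCache(FixedAdviser())
for job in jobs:
    cache.setup_for(job)
print(cache.profiling_runs)  # 2
```

In this toy workload only two distinct signatures occur among four jobs, so the adviser runs twice. The halving here is a property of the invented example, not a reproduction of the TPC-H result reported above.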

Keywords: MapReduce, SQL Queries, Hadoop Processing

DOI: https://doi.org/10.6025/pca/2020/9/1/1-5
