Survey of Apache Spark optimized job scheduling in big data

Document Type : Original Article

Authors

Computer Science and Engineering Dept., Faculty of Electronics Engineering, Menoufia University, Egypt

Abstract

Big data has attracted considerable attention in recent years. As big data makes its way into companies and businesses, big data analytics faces several challenges. The Apache Spark framework has become very popular for distributed data processing. Spark is an analytics engine for big data processing with modules for SQL, streaming, graph processing, and machine learning. Scheduling algorithms differ in their behavior, their design, and the goal they target, such as data locality, energy consumption, or execution time. The main goal of this research is to present a comprehensive survey of the job scheduling modes used in Spark, the different types of schedulers, and the existing algorithms together with their advantages and issues. In this paper, various adaptive approaches to scheduling jobs on Spark, and algorithms developed to improve Spark performance, are discussed, analyzed, and evaluated. A comparison of the different scheduling algorithms, with their strengths and weaknesses, is provided. This can help researchers understand which scheduling mechanisms are best suited to big data.

Keywords