研究源自于MapReduce系统的一类排序问题。给定两台恒速机和一组按列表到达的工件,每个工件包含两类任务:Map Task和Reduce Task。假设Map任务和Reduce任务都是不可中断的,Map任务可以并行处理,即可以任意分割成若干小的任务并在两台机器上同时处理,而Reduce任务只可以在单台机器上处理。一旦工件到达,必须为其指派机器和开工时间,目标是使得最后完工时间最小。对LSc算法的竞争比进行了分析,得到其一般情形下的竞争比当$s\geq(1+\sqrt{5})/2$时为$1+1/s$,否则为$1+s/(s+1)$。而当每个工件$J_j$都满足其Map任务总长大于等于Reduce任务总长时,其竞争比当$s\geq(1+\sqrt{3})/2$时不超过$1+1/(2s)$,否则为不超过$1+s/(2s+1)$。
In this paper, we investigate a class of online over-list scheduling problem in MapReduce system. We are given two uniform machines, and a list of jobs arrive one by one, each job consists of one Map task and one Reduce task. The map task can be arbitrarily split and processed on both machines simultaneously, while the reduce task has to be processed on a single machine. After a job arrived, we should assign a(some) machine(s) to process its map and reduce task, which aims to minimize the makespan. We show that the classical LSc algorithm has competitive ratio $1+1/s$ when $s\geq (1+\sqrt{5})/2$ and $1+s/(s+1)$, otherwise. If the length of Map task is greater than or equal to the length of Reduce task for each job, then the competitive ratio is at most $1+1/(2s)$ when $s\geq (1+\sqrt{3})/2$ and $1+s/(2s+1)$ otherwise.
[1] Benchini A, Marcelloni F, Segatori A. A MapReduce solution for associative classification of big data[J]. Information Sciences, 2015, 332:33-35.
[2] Kolb L, Thor A, Rahm E. Load balancing for MapReduce-based entity resolution[C]//Proceeding of the 28th International Conference onData Engineering(ICDE), 2012:618-629.
[3] Chen C, Xu Y, Zhu Y, et al. Online MapReduce scheduling problem of minimizing the makespan[J]. Journal of Combinatorial Optimization, 2017, 33:590-608.
[4] Huang J, Zheng F, Xu Y, et al. Online MapReduce processing on two identical parallel machines[J]. Journal of Combinatorial Optimization, 2018, 35(1):216-223.
[5] Zaharia M, Konwinski A, Joseph A, et al. Improving MapReduce performance in heterogeneous environments[C]//Proceeding of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08), 2008, 29-42.
[6] Moseley B, Dasgupta A, Kumar R, et al. On scheduling in MapReduce and flow-shops[C]//Proceedings of the twenty-third annual ACM symposium on parallelism in algorithms and architectures, 2011, 11:289-298.
[7] Luo T, Zhu Y, Wu W, et al. Onlinemakespan minimization in MapReduce-like systems withcomplex reduce tasks[J]. Optimization Letter,2017, 11:271-277.
[8] Zhu Y, Jiang Y, Wu W, et al. Minimizingmakespan and total completiontime in MapReducelike systems[C]//INFOCOM, IEEE, 2014, 2166-2174.
[9] Wang J, Li X. Task scheduling for MapReduce in heterogeneous networks[J]. Cluster Computing, 2016, 19(1):197-210.
[10] Jiang Y, Zhu Y, Wu W, et al. Makespan minimization for MapReduce systems with different severs[J]. Future Generation Computer Systems, 2017, 67:13-21.
[11] Epstein L, Noga J, Seiden S, et al. Woeginger G.Randomized on-line scheduling on two uniform machines[J]. Journal of Scheduling, 2001, 4(2):71-92.