Cloud computing has emerged as an important computing paradigm, and one critical issue is guaranteeing service quality for end users. To address this problem, we design smart scheduling algorithms for cloud computing environments to improve the performance of jobs submitted by users. At present, our focus is to achieve a low overall job response time for the system (also referred to as the job flowtime) while providing fairness between small and large jobs.
Revisiting SRPT for Job Scheduling in Computing Clusters
Since the Shortest Remaining Processing Time (SRPT) scheduling principle has been proven optimal in the single-machine setting, it is natural to extend SRPT to yield scheduling algorithms with theoretical performance guarantees in distributed computing clusters consisting of multiple machines. In this paper, we revisit the SRPT scheduling principle to derive new and tight competitive performance bounds with respect to the overall job flowtime. In particular, for the transient scheduling scenario where all jobs arrive at the cluster at time zero, we study two different cases and show that the SRPT-based scheduling algorithm achieves a constant competitive ratio of at most two, compared to the prior state-of-the-art ratio of 12 in the algorithm of Moseley et al. For online scheduling, we study a special case where each job consists of a single task and show that the online SRPT algorithm is (1+ϵ)-speed, (3+3/ϵ)-competitive with respect to the overall job flowtime for ϵ>0, improving the recent result of Fox and Moseley, which upper bounds SRPT to be (1+ϵ)-speed, 4/ϵ-competitive.
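To make the principle concrete, below is a minimal sketch of preemptive SRPT in the single-machine setting where its optimality holds. The job list, time units, and function name are illustrative assumptions, not artifacts of the paper.

```python
import heapq

def srpt_total_flowtime(jobs):
    """Simulate preemptive SRPT on a single machine.

    jobs: list of (arrival_time, size) pairs.
    Returns the total flowtime, i.e., the sum over jobs of (completion - arrival).
    """
    jobs = sorted(jobs)              # handle arrivals in time order
    ready = []                       # min-heap keyed by remaining processing time
    t = 0.0
    i = 0
    total_flowtime = 0.0
    n = len(jobs)
    while i < n or ready:
        if not ready:                # machine idle: jump to the next arrival
            t = max(t, jobs[i][0])
        while i < n and jobs[i][0] <= t:
            arrival, size = jobs[i]
            heapq.heappush(ready, (size, arrival))
            i += 1
        rem, arrival = heapq.heappop(ready)
        # Run the job with the shortest remaining time until it finishes
        # or the next arrival may preempt it.
        horizon = jobs[i][0] if i < n else float("inf")
        run = min(rem, horizon - t)
        t += run
        rem -= run
        if rem > 1e-12:              # preempted: reinsert with less work left
            heapq.heappush(ready, (rem, arrival))
        else:                        # completed: accumulate its flowtime
            total_flowtime += t - arrival
    return total_flowtime

print(srpt_total_flowtime([(0, 3), (1, 1), (2, 5)]))  # prints 12.0
```

At every instant the machine serves the job with the least remaining work, preempting only when an arrival brings a smaller remainder; the cluster algorithms analyzed in the paper lift this priority rule to multiple machines.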
Publications:
- Huanle Xu*, Huangting Wu*, Wing Cheong Lau, "Revisiting SRPT for Job Scheduling in Computing Clusters," the 14th International Conference on Queueing Theory and Network Applications (QTNA), August 2019.
Online Job Scheduling with Resource Packing on a Cluster of Heterogeneous Servers
Jobs in modern computing clusters have highly diverse processing durations and heterogeneous resource requirements: small but latency-sensitive jobs coexist with large batch-processing applications that may take hours to weeks to complete. To satisfy these job demands, a modern computing cluster often scales out to hundreds or even thousands of servers. Towards this end, various schedulers have been implemented and deployed by industry. However, these schedulers are mostly designed based on heuristic arguments in order to achieve high scalability; relatively little modeling or analytical effort has been devoted to characterizing and optimizing the performance trade-offs of the resultant designs in a systematic, theoretically rigorous manner.
Publications:
- Yang Liu*, Huanle Xu*, Wing Cheong Lau, "Online Job Scheduling with Resource Packing on a Cluster of Heterogeneous Servers," IEEE INFOCOM, April 2019.
Efficient Online Resource Allocation in Heterogeneous Clusters with Machine Variability
Approximation jobs, which allow partial execution of their many tasks while still producing valuable results, play an important role in today's large-scale data analytics. This fact can be exploited to maximize the system utility of a big-data computing cluster by choosing the proper tasks of each approximation job to schedule. A fundamental challenge, however, is that machine service capacity may fluctuate substantially over a job's lifetime, which makes it difficult to assign valuable tasks to well-performing machines. In addition, the cluster scheduler needs to make online scheduling decisions, based on current machine availabilities, without knowing future job arrivals. In this paper, we tackle this online resource allocation problem for approximation jobs in parallel computing clusters. In particular, we model a cluster of heterogeneous machines as a multi-armed bandit in which each machine is treated as an arm. By estimating machine service rates while balancing the exploration-exploitation trade-off, we design an efficient online resource allocation algorithm from a bandit perspective. The proposed algorithm extends existing online convex optimization techniques and yields a sublinear regret bound. Moreover, we examine the performance of the proposed algorithm via extensive trace-driven simulations and demonstrate that it substantially outperforms the baselines.
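As a toy illustration of the bandit view, the sketch below applies the classic UCB1 rule to learn machine service rates while balancing exploration and exploitation. The Gaussian reward model, noise level, and function names are assumptions made for illustration; the algorithm in the paper builds on online convex optimization techniques rather than plain UCB1.

```python
import math
import random

def ucb_assign_tasks(true_rates, horizon, seed=0):
    """UCB1 sketch: treat each machine as a bandit arm and estimate its
    service rate from the observed throughput of assigned tasks.

    true_rates: mean service rate of each machine, assumed to lie in (0, 1].
    Returns the empirical rate estimates after `horizon` task assignments.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n                 # tasks assigned to each machine so far
    means = [0.0] * n                # empirical service-rate estimates
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1              # initialization: try every machine once
        else:
            # Pick the machine with the best optimistic estimate:
            # empirical mean plus a confidence (exploration) bonus.
            arm = max(range(n),
                      key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))
        # Observed throughput of the assigned task; noise models machine variability.
        reward = min(1.0, max(0.0, rng.gauss(true_rates[arm], 0.1)))
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return means

print(ucb_assign_tasks([0.9, 0.5, 0.3], horizon=2000))
```

Over time the optimistic bonus shrinks for frequently used machines, so the scheduler keeps probing uncertain machines just enough to track their capacities while routing most tasks to the fastest ones.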
Publications:
- Huanle Xu*, Yang Liu*, Wing Cheong Lau, Jun Guo, Alex X. Liu, "Efficient Online Resource Allocation in Heterogeneous Clusters with Machine Variability," IEEE INFOCOM, April 2019.
Online Job Scheduling with Redundancy and Opportunistic Checkpointing
In a large-scale computing cluster, job completions can be substantially delayed by two sources of variability, namely, variability in job size and variability in machine service capacity. To tackle this issue, existing works have proposed various scheduling algorithms which exploit redundancy, wherein a job runs on multiple servers until the first copy completes. In this paper, we explore the impact of variability in machine service capacity and adopt a rigorous analytical approach to design scheduling algorithms using redundancy and checkpointing. We design several online algorithms which can dynamically vary the number of redundant copies for jobs. We also provide new theoretical performance bounds for these algorithms in terms of the overall job flowtime by introducing the notion of a speedup function, based on which a novel potential function can be defined to enable the corresponding competitive-ratio analysis. In particular, by adopting the online primal-dual fitting approach, we prove that our SRPT+R algorithm in a non-multitasking cluster is (1+ϵ)-speed, O(1/ϵ)-competitive. We also show that our proposed Fair+R and LAPS+R(β) algorithms for a multitasking cluster are (4+ϵ)-speed, O(1/ϵ)-competitive and (2+2β+2ϵ)-speed, O(1/(βϵ))-competitive, respectively. We demonstrate via extensive simulations that our proposed algorithms can significantly reduce job flowtime under both the non-multitasking and multitasking modes.
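The benefit of redundancy under service-capacity variability can be seen with a small Monte Carlo sketch. The capacity distribution, task size, and function name below are illustrative assumptions; launching r copies and keeping the first to complete turns the completion time into a minimum over independent draws, which trims the slow tail.

```python
import random

def expected_completion(task_size, capacities, r, trials=10000, seed=1):
    """Estimate the mean completion time when r redundant copies of a task
    are launched and the first copy to finish completes the job.

    capacities: pool of possible machine service capacities; each copy
    lands on a machine whose capacity is drawn uniformly from this pool.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        copies = [task_size / rng.choice(capacities) for _ in range(r)]
        total += min(copies)         # the fastest copy finishes the job
    return total / trials

capacities = [0.2, 1.0, 1.0, 1.0]    # one straggler-prone capacity in four
for r in (1, 2, 3):
    print(r, round(expected_completion(10.0, capacities, r), 1))
```

With one copy, a quarter of the runs hit the slow machine and take 50 time units instead of 10; with two copies the chance that both copies straggle drops to 1/16, which is the kind of effect the speedup function is introduced to capture.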
Publications:
- Huanle Xu*, Gustavo de Veciana, Wing Cheong Lau, and Kunxiao Zhou, "Online Job Scheduling with Redundancy and Opportunistic Checkpointing: A Speedup-Function-Based Analysis," in IEEE Transactions on Parallel and Distributed Systems, Vol.30, Issue 4, pp. 897-909, 2018.
- Huanle Xu*, Gustavo de Veciana, and Wing Cheong Lau, "Addressing Job Processing Variability through Redundant Execution and Opportunistic Checkpointing: A Competitive Analysis," IEEE INFOCOM, pp. 1-9, May 2017.
Mitigating Service Variability in MapReduce Clusters via Task Cloning: A Competitive Analysis
Measurement traces from real-world production environments show that the execution times of tasks within a MapReduce job vary widely due to variability in machine service capacity. This variability makes efficient job scheduling over large-scale MapReduce clusters extremely challenging. To tackle this problem, we adopt the task-cloning approach to mitigate the effect of machine variability and design corresponding scheduling algorithms to minimize the overall job flowtime in different scenarios. For offline scheduling, where all jobs arrive at the same time, we design an O(1)-competitive algorithm which gives priority to jobs with small effective workload. We then extend this offline algorithm to yield the so-called Smallest Remaining Effective Workload based β-fraction Sharing plus Cloning algorithm (SREW+C(β)) for the online case. We also show that SREW+C(β) is (1+2β+ε)-speed, O(1/(βε))-competitive with respect to the sum of job flowtimes within a cluster. We demonstrate via trace-driven simulations that SREW+C(β) can significantly reduce the overall job flowtime by cutting down the elapsed time of small jobs substantially. In particular, SREW+C(β) reduces the total job flowtime by 14, 10 and 11 percent compared to Mantri, Dolly and Grass, respectively.
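The flavor of the SREW+C(β) rule can be sketched as follows. The effective-workload estimate (remaining tasks times mean task size) and the slot-allocation loop below are simplified assumptions for illustration, not the paper's exact sharing and cloning rule.

```python
def srew_c_assign(jobs, slots, beta=0.2):
    """Rank jobs by (approximate) remaining effective workload and assign
    cluster slots in that order, spending at most a beta fraction of the
    slots on redundant task copies (clones) for the smallest jobs.

    Each job is a dict with 'name', 'remaining_tasks' and 'mean_task_size'.
    Returns a list of (job name, primary slots, clone slots).
    """
    order = sorted(jobs, key=lambda j: j["remaining_tasks"] * j["mean_task_size"])
    clone_budget = int(beta * slots)      # slots reserved for redundant copies
    plan = []
    for job in order:
        if slots == 0:
            break
        primaries = min(job["remaining_tasks"], slots)
        slots -= primaries
        clones = min(primaries, clone_budget, slots)  # at most one clone per task
        slots -= clones
        clone_budget -= clones
        plan.append((job["name"], primaries, clones))
    return plan

jobs = [
    {"name": "batch", "remaining_tasks": 400, "mean_task_size": 2.0},
    {"name": "interactive", "remaining_tasks": 8, "mean_task_size": 0.5},
]
print(srew_c_assign(jobs, slots=100))
# [('interactive', 8, 8), ('batch', 84, 0)]
```

Small jobs jump the queue and get cloned, so machine variability cannot stretch their elapsed time much, while the β-fraction budget caps the extra resources spent on redundancy.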
Publications:
- Huanle Xu*, Wing Cheong Lau, Zhibo Yang*, Gustavo de Veciana, Hanxu Hou, "Mitigating Service Variability in MapReduce Clusters via Task Cloning: A Competitive Analysis," in IEEE Transactions on Parallel and Distributed Systems, Vol.28, Issue 10, Oct 2017.
- Huanle Xu* and Wing Cheong Lau, "Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds," The 35th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 339-348, 2015.
Optimization for Speculative Execution in Big Data Processing Clusters
A big parallel-processing job can be delayed substantially as long as even one of its many tasks is assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks, such as MapReduce, have adopted strategies under which the system may speculatively launch additional copies of a task whose progress is abnormally slow when extra idle resources are available. In this paper, we focus on the design of speculative execution schemes for parallel processing clusters from an optimization perspective under different loading conditions. For the lightly loaded case, we analyze and propose a cloning scheme, namely the Smart Cloning Algorithm (SCA), which is based on maximizing the overall system utility. We also derive the workload threshold under which SCA should be used for speculative execution. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm, an extension of the Microsoft Mantri scheme, to mitigate stragglers. Our simulation results show that SCA reduces the total job flowtime, i.e., the job delay/response time, by nearly 6 percent compared to the speculative execution strategy of Microsoft Mantri. In addition, we show that the ESE algorithm outperforms the Mantri baseline scheme by 71 percent in terms of job flowtime while consuming the same amount of computation resources.
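A Mantri-style speculation test of the kind ESE extends can be sketched as follows. The linear progress projection and the factor-2 threshold are assumptions chosen for illustration, not Mantri's or ESE's exact parameters.

```python
def should_speculate(progress, elapsed, est_new_copy_time, idle_slots):
    """Decide whether to launch a speculative copy of a running task.

    progress: fraction of the task completed so far (0 < progress < 1).
    elapsed: time the current copy has been running.
    est_new_copy_time: estimated run time of a fresh copy on an idle machine.
    """
    if idle_slots == 0 or progress <= 0.0 or progress >= 1.0:
        return False
    # Project remaining time assuming the task keeps its current rate.
    remaining = elapsed * (1.0 - progress) / progress
    # Speculate only when relaunching is clearly cheaper than waiting.
    return remaining > 2.0 * est_new_copy_time

# A straggler: 10% done after 50s, while a fresh copy should take about 12s.
print(should_speculate(progress=0.1, elapsed=50.0,
                       est_new_copy_time=12.0, idle_slots=3))  # True
```

SCA differs in spirit: rather than reacting to slow progress, it proactively clones tasks at launch time when the cluster is lightly loaded, choosing clones so as to maximize the overall system utility.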
Publications:
- Huanle Xu*, Wing Cheong Lau, "Optimization for Speculative Execution in Big Data Processing Clusters," in IEEE Transactions on Parallel and Distributed Systems, Vol.28, Issue 2, Feb 2017.
- Huanle Xu* and Wing Cheong Lau, "Optimization for Speculative Execution in a MapReduce-like Cluster," IEEE INFOCOM, pp. 1071-1079, April 2015.
Solving Graph Problems in MapReduce-like Frameworks via Optimized Parameter Configuration
In this paper, we propose a scheme to solve large dense-graph problems under the MapReduce framework. The graph data is organized in blocks, and the blocks are assigned to different map workers for parallel processing. The intermediate results of the map workers are combined by one reduce worker for the next round of processing. This procedure is iterative, and the graph size can be reduced substantially after each round; in the last round, a small graph is processed on a single map worker to produce the final result. Specifically, we show that basic graph algorithms such as Minimum Spanning Tree, Connected Components and Single-Source Shortest Path can be implemented efficiently using this scheme. We also offer a mathematical formulation to determine the parameters of our scheme so as to achieve the optimal running-time performance. Note that the proposed scheme can also be applied to MapReduce-like platforms such as Spark. We use our own cluster and Amazon EC2 as testbeds to evaluate the performance of the proposed Minimum Spanning Tree algorithm under the MapReduce and Spark frameworks, respectively. The experimental results match our theoretical analysis well. Using this approach, many parallelizable problems can be solved efficiently in MapReduce-like frameworks.
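For Minimum Spanning Tree, the scheme can be sketched as below: in each round, map workers run Kruskal on their edge blocks and discard every locally rejected edge, which is safe because such an edge is the heaviest on a cycle and cannot belong to the global MST; the surviving edges form a smaller graph for the next round. The block count and switch-over threshold are illustrative parameters here, whereas the paper's formulation chooses them optimally.

```python
import random

def kruskal(edges, nodes):
    """Standard Kruskal MST via union-find; edges are (weight, u, v) tuples."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

def mapreduce_mst(edges, nodes, num_blocks=4, threshold=64):
    """Iteratively shrink the graph: each 'map worker' keeps only its local
    MST edges, the survivors are merged, and the procedure recurses until
    a single worker can finish the job."""
    if len(edges) <= threshold:
        return kruskal(edges, nodes)       # last round on one worker
    blocks = [edges[i::num_blocks] for i in range(num_blocks)]  # map phase
    survivors = []
    for block in blocks:                   # each block filtered independently
        survivors.extend(kruskal(block, nodes))
    if len(survivors) >= len(edges):       # no shrinkage: finish directly
        return kruskal(survivors, nodes)
    return mapreduce_mst(survivors, nodes, num_blocks, threshold)

random.seed(0)
nodes = list(range(50))
edges = [(random.random(), u, v) for u in nodes for v in nodes if u < v]
print(len(mapreduce_mst(edges, nodes)))    # 49 edges span the 50 connected nodes
```

The parameters our formulation optimizes are exactly of this kind: how many blocks to use per round and when the residual graph is small enough to hand to a single worker.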
Publications:
- Huanle Xu*, Ronghai Yang*, Zhibo Yang*, and Wing Cheong Lau, "Solving Large Graph Problems in MapReduce-Like Frameworks via Optimized Parameter Configuration," The 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), Zhangjiajie, China, pp. 525-539, Nov 2015.