Container clusters play an increasingly important
role in cloud computing for processing dynamic workloads.
The resource manager (i.e., orchestrator) of the cluster automates
the scheduling of dynamic requests and effectively manages
resource utilization across distributed infrastructure.
For many applications, requests to the cluster come
with strict deadlines. Scheduling a container cluster
is challenging, especially when the cluster is large and the
request load changes dynamically. Machine learning-based
approaches such as reinforcement learning (RL) have attracted
considerable research attention in recent years. However, these
approaches suffer from low robustness when the requests in the
operational environment change and differ from the
training data. This paper investigates this problem by quantifying
the robustness and proposing meta-gradient reinforcement
learning to improve the robustness of classical
RL-based approaches. The proposed approach leads
to better deadline guarantees and faster adaptation for time-critical
task scheduling in dynamic environments. We then
empirically test the benefits of our method using both real-world
and synthetic data sets. The evaluation results show that the
proposed method outperforms the baseline RL methods in
both scheduling performance and robustness.
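To make the core idea concrete, below is a minimal, hypothetical sketch of meta-gradient reinforcement learning in PyTorch, following the common formulation of Xu et al. (2018) in which the discount factor is treated as a differentiable meta-parameter. The network sizes, step sizes, and synthetic rollout are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch: adapt the discount factor gamma by backpropagating
# a meta-objective through one inner policy-gradient step.
import torch
from torch.func import functional_call

torch.manual_seed(0)
obs_dim, n_actions, inner_lr = 4, 2, 1e-2

policy = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, n_actions),
)
gamma = torch.tensor(0.95, requires_grad=True)   # meta-parameter
meta_opt = torch.optim.Adam([gamma], lr=1e-3)

def discounted_returns(rewards, g):
    # Differentiable discounted returns so gradients can flow into gamma.
    acc, out = torch.tensor(0.0), []
    for r in reversed(rewards):
        acc = r + g * acc
        out.append(acc)
    return torch.stack(out[::-1])

def pg_loss(logits, actions, rewards, g):
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    return -(logp * discounted_returns(rewards, g)).mean()

# Fake rollout standing in for scheduler/cluster interaction (assumption).
states = torch.randn(16, obs_dim)
actions = torch.randint(0, n_actions, (16,))
rewards = torch.randn(16)

# Inner step: one differentiable policy-gradient update under current gamma.
params = dict(policy.named_parameters())
inner = pg_loss(policy(states), actions, rewards, gamma)
grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
updated = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

# Outer step: evaluate the updated policy under a fixed reference discount
# and backpropagate through the inner update to adapt gamma itself.
meta = pg_loss(functional_call(policy, updated, (states,)),
               actions, rewards, torch.tensor(0.99))
meta_opt.zero_grad()
meta.backward()
meta_opt.step()
print(f"adapted gamma: {gamma.item():.4f}")

In a full training loop the inner update would also be committed to the policy and the rollout would come from the scheduling environment; only the meta-step that adapts gamma is shown here.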