Run Reinforcement Learning on Ray Cluster (Microsoft Azure)

(You can download the sample code for this post here.)

In reinforcement learning (RL), training workloads can be heavy, and running the training on multiple machines (a cluster) is then a practical way to speed it up.
Fortunately, you can use the Ray framework for distributed RL training across multiple machines. In practical RL, however, you also need to set up and configure various other components (such as autonomous simulators) to run the training.

This walkthrough shows an end-to-end configuration of a Ray cluster in Microsoft Azure, using artificial intelligence agents in Minecraft (Project Malmo).
If you’re not familiar with Project Malmo, please see my earlier post “Enjoy AI in Minecraft” for a tutorial.

All source code in this post resides in the GitHub repository, so you can run this example right away. Note that this example requires considerable computing resources, so use powerful machines. (Here I’ve used Standard D3 v2 (4 vCPUs, 14 GB memory) VMs on Microsoft Azure to get started, but it’s better to run on GPU.)

Note (other options for running scaled RL) : You can also use ReinforcementLearningEstimator (Preview) in Azure Machine Learning; however, it depends on Ray version 0.8 (an old version).
For autonomous training (simulations), you can also use Project Bonsai (Preview) in Microsoft Azure.

Run Application (Minecraft) in Headless Mode

In reinforcement learning, the agent often runs against a graphical interface, such as a simulator UI or a game console.
In this example, the training also runs on the familiar Minecraft UI. (See below.)

# Running Minecraft client for RL (Project Malmo)
./launchClient.sh -port 9000

When you run training on a cluster, the workload is distributed across multiple machines, and you cannot attach and configure a real monitor for each agent.
In this example, the Minecraft UI must therefore be redirected to a virtual monitor to run distributed training. To achieve this, you can use xvfb (X Virtual Framebuffer) to launch Minecraft headless.

# Running Minecraft client for RL using Virtual Monitor
xvfb-run -a -e /dev/stdout -s '-screen 0 640x480x16' ./launchClient.sh -port 9000

To run headless Minecraft on all workers, I have created a custom Gym environment (MalmoMazeEnv), in which headless Minecraft starts automatically when the environment (env) is initialized. (See here for the entire source code of this class.)
The agent can then run on a hidden virtual monitor on every machine.


import os
import pathlib
import subprocess
import gym

class MalmoMazeEnv(gym.Env):

  def __init__(self):
    # Set up gym.Env
    super(MalmoMazeEnv, self).__init__()

    # Launch headless Minecraft
    # The following command is written in {package folder}/shell_files/launchClient_headless.sh
    # xvfb-run -a -e /dev/stdout -s '-screen 0 640x480x16' ./launchClient.sh -port $1
    self.malmo_port = 9000
    launch_shell_file = str(pathlib.Path(__file__).parent.parent.absolute()) + "/shell_files/launchClient_headless.sh"
    # Discard the console output of the Minecraft client
    dev_null = open(os.devnull, "w")
    self.proc = subprocess.Popen(
      ["bash", launch_shell_file, str(self.malmo_port)],
      stdout=dev_null)  # assumed completion: the listing is cut off at this point
    # (The rest of the class is omitted in this excerpt.)

In practical RL, you will need several pre-training settings like this so that the agent can start automatically.
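
One such setting is waiting until the launched client is actually listening before the first reset. The following is a minimal sketch, assuming that the Minecraft client accepts TCP connections on the Malmo port once it is ready. The helper name wait_for_port and its default timeout are hypothetical and are not part of the original environment.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_sec: float = 300.0,
                  interval_sec: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            # A successful connect means that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_sec):
                return True
        except OSError:
            # Not listening yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_sec)
```

In an environment like the one above, this could be called right after subprocess.Popen(), for example wait_for_port("localhost", self.malmo_port), so that initialization blocks until Minecraft is reachable.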

Run Training on Ray Cluster (Manually Configured)

Now let’s manually configure a Ray cluster (multiple machines) and run the training script on it.
Configuring the cluster by hand is a good way to understand what happens under the hood in a Ray cluster.

A Ray cluster basically consists of one head node and multiple worker nodes. All Ray nodes (both head and workers) must be able to connect to each other (by default on port 6379) on the same network.
In Microsoft Azure, the easiest way is to create all VMs in the same resource group. By default, all VMs can then reach each other on the same virtual network (VNet) using internal addresses.

The program code (using the Ray Tune API) should be started on the head node, and the Ray framework then submits the training workloads to the other running workers.

On the head node, install Ray and the prerequisite libraries (see here for settings). After installation, run the following Ray CLI command on the head node.

ray start --head --port=6379

On every worker, install Ray and the prerequisite libraries as well. After installation, run the following command to connect to the head node started above.
In the following command, {head_node_address} is the address of the Ray head node. (Both --address and --redis-password are shown in the console on the head node.)

ray start --address='{head_node_address}' --redis-password='5241590000000000'

Note : You can stop the Ray runtime by running the “ray stop” command on each node.

To run Minecraft RL training, the required components (Ray, the Malmo platform with Minecraft, xvfb, and the custom Gym env) must all be installed on every Ray node (head and workers) in the cluster.

Now the Ray cluster is ready.
When you run the following code on the head node, the program connects to the running Ray cluster with ray.init(address='auto'), and the training begins with ray.tune.run().
To scale the training workloads, I have used IMPALA, an algorithm with a high-throughput architecture (see here).


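import os
import argparse
import ray
import ray.tune as tune

# Function for stopping a learner when successful training
def stop_check(trial_id, result):
  return result["episode_reward_mean"] >= 85

# Main
if __name__ == '__main__':
  # NOTE: Parts of this listing are missing in the post. The argument types and
  # defaults, and the exact arguments of tune.run(), are reconstructed from the
  # surrounding text and should be read as assumptions.
  parser = argparse.ArgumentParser()
  parser.add_argument("--num_workers", type=int, default=1,
    help="number of ray workers")
  parser.add_argument("--num_gpus", type=int, default=0,
    help="number of gpus")
  parser.add_argument("--num_cpus_per_worker", type=int, default=1,
    help="number of cores per worker")
  args = parser.parse_args()

  # Connect to the running Ray cluster
  ray.init(address='auto')

  # Run IMPALA training until stop_check() returns True
  tune.run(
    "IMPALA",
    config={
      "log_level": "WARN",
      "env": "custom_malmo_env:MalmoMazeEnv-v0",
      "num_workers": args.num_workers,
      "num_gpus": args.num_gpus,
      "num_cpus_per_worker": args.num_cpus_per_worker,
      "explore": True,
      "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,
        "final_epsilon": 0.02,
        "epsilon_timesteps": 500000
      }
    },
    stop=stop_check,
    checkpoint_at_end=True,
    local_dir='./logs'
  )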

Once the training has started, you can see the output (statistics for each training iteration) in the console, and the results (checkpoint files with trained parameters) are written to the ./logs folder. (See local_dir in the source code above.)
You can restore a checkpoint and run the trained agent anywhere.
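
To restore a checkpoint, you first need to locate it. The following is a minimal sketch, assuming that Ray writes each checkpoint into a directory named checkpoint_ followed by the iteration number somewhere under the log folder. The exact layout differs between Ray versions, and the helper find_latest_checkpoint is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(log_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest iteration number, if any."""
    best_iter, best_dir = -1, None
    # Assumed layout: <log_dir>/.../checkpoint_<iteration>/
    for path in Path(log_dir).rglob("checkpoint_*"):
        match = re.fullmatch(r"checkpoint_(\d+)", path.name)
        if path.is_dir() and match and int(match.group(1)) > best_iter:
            best_iter, best_dir = int(match.group(1)), path
    return best_dir
```

Calling find_latest_checkpoint("./logs") returns None when no checkpoint has been written yet. The checkpoint found this way can then be handed to the restore() method of an RLlib trainer built with the same config; which path inside the directory restore() expects depends on the Ray version.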

You can also monitor cluster resources with the Ray dashboard UI. (See below.)
In the following example, 3 workers (2 training workers and 1 IMPALA coordinator) are running on 1 head node and 1 worker node.
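
As a rough way to reason about these numbers, the following sketch estimates how many CPUs a run asks for. It assumes RLlib's usual accounting, in which the coordinating (driver) process takes num_cpus_for_driver CPUs (1 by default) and each rollout worker takes num_cpus_per_worker CPUs. The helper required_cpus is hypothetical.

```python
def required_cpus(num_workers: int, num_cpus_per_worker: int = 1,
                  num_cpus_for_driver: int = 1) -> int:
    """CPUs requested by one trial: the driver plus all rollout workers."""
    return num_cpus_for_driver + num_workers * num_cpus_per_worker


# Two Standard D3 v2 nodes with 4 vCPUs each
cluster_cpus = 2 * 4

# 2 rollout workers and 1 coordinator, as in the example above
print(required_cpus(num_workers=2), "of", cluster_cpus, "CPUs requested")
```

Staying at or below the cluster total is a necessary condition only, because a single worker cannot be split across two nodes.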

Note : The Ray dashboard is hosted on the head node. You can connect to it through an SSH tunnel (port forwarding).
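
The tunnel can be sketched as follows, assuming that the dashboard listens on its default port 8265 on the head node. The function name, the user name and the host address in the example are placeholders.

```python
from typing import List


def dashboard_tunnel_command(head_host: str, ssh_user: str,
                             local_port: int = 8265,
                             remote_port: int = 8265) -> List[str]:
    """Build an ssh command that forwards local_port to the dashboard on the head node."""
    return [
        "ssh", "-N",  # -N: forward ports only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{ssh_user}@{head_host}",
    ]


# Placeholder user and address; pass the list to subprocess.run() to open the tunnel
print(" ".join(dashboard_tunnel_command("203.0.113.10", "azureuser")))
```

While the tunnel is open, the dashboard is reachable in a local browser on the chosen local port.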

To run this example on Azure Machine Learning (AML), see here.
(With AML, the compute instances can automatically be scaled down to 0 when the training has completed.)

Use Ray Cloud Provider for Azure

By using the Ray autoscaler, you can remotely control (create / connect / tear down) the cluster. You don’t need to manually create and configure each worker.

Note : This cluster can automatically scale depending on the workload.

To use a cloud provider with Ray, you first write a YAML configuration.
The following configuration (azure_ray_config.yaml) uses the Azure provider. (See “type: azure” in the configuration.) It deploys the Ray cluster as Azure resources using the Azure SDK for Python.

In the following configuration, I use my custom docker image (tsmatz/malmo-maze:0.36.0) for the containers (both driver and workers) in the cluster. In this custom docker image, the required components (Python, Ray, Malmo with Minecraft, xvfb, and the custom Gym env) are all installed and configured. (See here for the Dockerfile.)
I have also specified the Data Science Virtual Machine (DSVM) in Azure as the underlying virtual machine image, in which the docker runtime is already configured.

See here for the YAML configuration reference.


# Note: section keys missing from the listing (docker, provider, auth,
# available_node_types, and so on) are restored following Ray's cluster
# configuration schema.
cluster_name: default

max_workers: 2

upscaling_speed: 1.0

docker:
  image: "tsmatz/malmo-maze:0.36.0"
  container_name: "ray_container"
  pull_before_run: True
  run_options:
    - --ulimit nofile=65536:65536

idle_timeout_minutes: 5

provider:
  type: azure
  location: eastus
  resource_group: ray-cluster-test01

auth:
  ssh_user: ubuntu
  ssh_private_key: ~/.ssh/id_rsa
  # should match what is specified in "file_mounts" section
  ssh_public_key: ~/.ssh/id_rsa.pub

available_node_types:
  ray.head.default:
    resources: {"CPU": 4}
    node_config:
      azure_arm_parameters:
        vmSize: Standard_D3_v2
        imagePublisher: microsoft-dsvm
        imageOffer: ubuntu-1804
        imageSku: "1804"
        imageVersion: latest
  ray.worker.default:
    min_workers: 0
    max_workers: 2
    resources: {"CPU": 4}
    node_config:
      azure_arm_parameters:
        vmSize: Standard_D3_v2
        imagePublisher: microsoft-dsvm
        imageOffer: ubuntu-1804
        imageSku: "1804"
        imageVersion: latest

head_node_type: ray.head.default

file_mounts: {
   "~/.ssh/id_rsa.pub": "~/.ssh/id_rsa.pub"
}

cluster_synced_files: []

file_mounts_sync_continuously: False

rsync_exclude:
  - "**/.git"
  - "**/.git/**"

rsync_filter:
  - ".gitignore"

initialization_commands:
  - sudo usermod -aG docker $USER || true
  - sleep 10
  - touch ~/.sudo_as_admin_successful

setup_commands: []

head_setup_commands: []

worker_setup_commands: []

head_start_ray_commands:
  - ray stop
  - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
  - ray stop
  - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

head_node: {}
worker_nodes: {}
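
Before deploying, it can help to check that the worker limits in such a configuration are consistent with each other. The following is a rough pre-flight sketch and not Ray's own validation. It assumes the configuration has already been loaded into a Python dict (for example with a YAML parser) and that node types are listed under the available_node_types key of Ray's schema. The function check_cluster_config is hypothetical.

```python
from typing import Any, Dict, List


def check_cluster_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    node_types = config.get("available_node_types", {})
    head_type = config.get("head_node_type")
    if head_type not in node_types:
        problems.append(f"head_node_type {head_type!r} is not defined")

    total_min = 0
    for name, spec in node_types.items():
        lo = spec.get("min_workers", 0)
        hi = spec.get("max_workers")
        if hi is not None and lo > hi:
            problems.append(f"{name}: min_workers {lo} exceeds max_workers {hi}")
        total_min += lo

    # The cluster-wide limit must leave room for all guaranteed workers.
    if total_min > config.get("max_workers", 0):
        problems.append("sum of min_workers exceeds the cluster-wide max_workers")
    return problems


# The same limits as in the configuration above
example = {
    "max_workers": 2,
    "head_node_type": "ray.head.default",
    "available_node_types": {
        "ray.head.default": {"resources": {"CPU": 4}},
        "ray.worker.default": {"min_workers": 0, "max_workers": 2,
                               "resources": {"CPU": 4}},
    },
}
print(check_cluster_config(example))  # an empty list means no problems were found
```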

Note : You can also use a pre-built Ray docker image (see here) and set custom commands (settings) in YAML.

Now let’s create the cluster (all nodes) with the Ray CLI command “ray up”.
This command generates all Azure resources for the cluster in a single Azure resource group.

Before running this Ray command, generate an SSH key pair and log in to Azure with the Azure CLI as follows.

# Generate ssh key pair, which is used to authenticate Ray nodes.
ssh-keygen -t rsa -b 4096

# Login to Azure
az login
az account set -s {your_subscription_id}

# Create Ray cluster (all nodes) in Azure
ray up ./azure_ray_config.yaml

To use this running cluster for training, you don’t need to log in to the head node directly.
From your working client, you can submit the source code (training code) to the driver (container) on the head node using the previous YAML configuration file (azure_ray_config.yaml).
(For train_cluster.py below, see the code above.)

ray submit ./azure_ray_config.yaml train_cluster.py

You can also connect directly to the driver (container) on the head node from your client and run commands manually as follows.
The computation (Ray remote block) is scaled automatically depending on resource consumption.

ray attach ./azure_ray_config.yaml

$ python3 -c 'import ray; ray.init(address="auto")'
$ exit
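
To illustrate what a Ray remote block looks like, here is a minimal sketch that fans a plain function out as Ray tasks. The function names are hypothetical, and Ray is imported lazily inside run_on_cluster so that the plain function stays usable without Ray installed.

```python
from typing import List


def square(x: int) -> int:
    """Plain Python function; it becomes a Ray task when wrapped below."""
    return x * x


def run_on_cluster(values: List[int]) -> List[int]:
    """Run square() for every value as a Ray task on the running cluster."""
    import ray  # lazy import: only needed when actually talking to the cluster

    ray.init(address="auto")  # connect to the already running Ray cluster
    remote_square = ray.remote(square)
    refs = [remote_square.remote(v) for v in values]  # schedule the tasks
    return ray.get(refs)  # block until all results are available
```

Inside the attached session, calling run_on_cluster(list(range(8))) schedules eight tasks. Tasks that cannot be placed on the existing nodes show up as pending resource demand, which is what the autoscaler reacts to.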

To see the Ray dashboard, run the following command, which forwards the remote dashboard to your local machine. (You can then view it in your local web browser; the Ray dashboard uses port 8265 by default.)

ray dashboard ./azure_ray_config.yaml

When you have finished training, you can tear down the cluster as follows.
(All Ray nodes (virtual machines) will then be deleted.)

ray down ./azure_ray_config.yaml
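
The commands above (create, submit, tear down) can also be chained from a small script, so that the cluster is removed even when the training fails. This is a sketch under the assumption that the Ray CLI is on the PATH; the -y option skips the confirmation prompts of “ray up” and “ray down”. The function run_training and its runner parameter are hypothetical.

```python
import subprocess
from typing import Callable, List


def run_training(config: str, script: str,
                 runner: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Create the cluster, submit the training script, and always tear the cluster down."""
    try:
        runner(["ray", "up", "-y", config])
        runner(["ray", "submit", config, script])
    finally:
        # Runs on success and on failure, so no virtual machines are left behind.
        runner(["ray", "down", "-y", config])
```

For example, run_training("./azure_ray_config.yaml", "train_cluster.py") runs the whole cycle. The runner parameter exists only so that the command sequence can be inspected without touching Azure.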

Use ARM template

You can also deploy an auto-scaled Ray cluster using an ARM template. (Click here to deploy this template in your Azure subscription.)
With this template, the training runs directly on the virtual machines (not in docker containers).

To customize the settings on the virtual machines, you can add your custom Python package (such as a Gym environment) with the PythonPackages parameter, or edit azure-init.sh (see here) in this template.
All settings are defined natively in Azure, and you can fully customize the template as you like.

The machines are scaled up and down based on CPU usage by the auto-scale settings of Azure virtual machine scale sets (VMSS).
See here (template settings) for the detailed definitions in this template.
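
Conceptually, a CPU-based auto-scale rule behaves like the following sketch. The thresholds, the step size of one instance and the instance limits are hypothetical illustration values and are not taken from the template.

```python
def next_instance_count(current: int, avg_cpu_percent: float,
                        scale_out_above: float = 70.0,
                        scale_in_below: float = 30.0,
                        min_instances: int = 0,
                        max_instances: int = 2) -> int:
    """Decide the next instance count from the average CPU usage."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # busy: add one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # idle: remove one instance
    else:
        target = current       # within the band: keep the current size
    # Never leave the configured range.
    return max(min_instances, min(max_instances, target))
```

With these values, one instance at 85% average CPU grows to two, and one instance at 10% shrinks to zero.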


Reference :

GitHub : Reinforcement Learning Tutorials with Python

GitHub : Minecraft Reinforcement Learning on Ray Cluster (Distributed RL)

Project Malmo Tutorial (Post “Enjoy AI in Minecraft”)

