
Deploy Model with Custom Docker Image in Azure Machine Learning

AML Docker Images

In an earlier post, I showed you how to bring your own custom Docker image for training in Azure Machine Learning.
Here, I'll show you how to bring a custom Docker image for model deployment.

In Azure Machine Learning, the base Docker image used in deployment includes the inferencing assets, such as nginx and the Flask server. So you should use an AML-compliant image as the base image, even when you bring your own custom Docker image.
The list of these maintained AML images is available at https://github.com/Azure/AzureML-Containers .

Hence, all you need to do to bring your own custom image into deployment is to inherit from this AML base image.

Note : With an AML managed online endpoint, you can deploy and serve a model without the base AML assets. (You can then host, for example, TF Serving in AML.)
See here for details. (This feature is currently in preview.)

Example

In this example, I'll set up the TensorRT runtime in an AML Docker image with a custom entry script (custom code) to speed up inferencing.

Note : TensorRT is also packaged within the Triton Inference Server container in NGC, and you can use Azure Machine Learning no-code deployment to run Triton Inference Server. (See here.)
In this post, I'll host TensorRT inferencing with a custom entry script by setting up TensorRT manually.

I have created the following Dockerfile, in which TensorRT 8.0 is installed on top of the latest AML image for GPU (mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu18.04).

FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu18.04

USER root:root

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libnvinfer8=8.0.1-1+cuda11.3 \
    python3-libnvinfer-dev=8.0.1-1+cuda11.3

RUN pip install nvidia-pyindex

RUN pip install nvidia-tensorrt

Note : In this custom image, the NVIDIA package repository has already been added to the apt package tool, since this image is based on nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04.

With the docker command, let's build this Dockerfile and create your own image in the local repository.

docker build ./ -t tsmatz/azureml-tensorrt:8.0.1

Register this image in a Docker registry, such as Docker Hub or ACR (Azure Container Registry).

Here I have registered this custom image as tsmatz/azureml-tensorrt:8.0.1 in Docker Hub as follows, so everyone can use this custom image anywhere.

# Login to Docker Hub
# (Or Azure Container Registry)
docker login
# Push your image into your repository
docker push tsmatz/azureml-tensorrt:8.0.1
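
Note that if you push the image into a private registry (such as ACR) instead of a public Docker Hub repository, AML also needs the registry credentials in order to pull it. The following is only a minimal sketch with placeholder registry values, applied to the Environment object (env) that we will create in the Run section below.

# Assumption : "myregistry" and the credentials below are placeholders.
# Set these on the Environment object ("env", created in the Run section)
# so that AML can pull the base image from a private registry.
env.docker.base_image = "myregistry.azurecr.io/azureml-tensorrt:8.0.1"
env.docker.base_image_registry.address = "myregistry.azurecr.io"
env.docker.base_image_registry.username = "{REGISTRY USERNAME}"
env.docker.base_image_registry.password = "{REGISTRY PASSWORD}"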

Run

All the preparation is done.
Now you can try running TensorRT inferencing in Azure Machine Learning. (See my previous post for the script code of TensorRT inferencing.)

Please download and run the entire example from GitHub !
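
The deployment script below refers to an entry script, inference.py. The full TensorRT inferencing code is in my previous post, but for reference, an AML entry script follows the standard init() / run() structure. The following is only a minimal sketch, in which load_trt_engine() and run_trt_inference() are hypothetical placeholders for the actual TensorRT code.

# inference.py : minimal sketch of an AML entry script (not the full code).
# load_trt_engine() and run_trt_inference() are hypothetical placeholders
# for the TensorRT logic described in my previous post.
import os
import json
import numpy as np

def init():
    global trt_engine
    # AZUREML_MODEL_DIR points to the folder of the registered model
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "resnetV150_frozen.onnx")
    trt_engine = load_trt_engine(model_path)  # placeholder

def run(raw_data):
    # Parse the JSON request body into a NumPy array
    data = np.array(json.loads(raw_data)["data"], dtype=np.float32)
    result = run_trt_inference(trt_engine, data)  # placeholder
    return {"result": result.tolist()}

The script below then registers the ONNX model, creates a GPU-enabled AKS cluster, configures the environment with the custom image, and deploys the web service.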

import azureml.core
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException
from azureml.core.model import InferenceConfig
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment
from azureml.core.webservice import AksWebservice

#####
# Connect to AML workspace
#####

ws = Workspace(
  workspace_name = "{AML WORKSPACE NAME}",
  subscription_id = "{SUBSCRIPTION ID}",
  resource_group = "{RESOURCE GROUP NAME}")

#####
# Register ONNX model
#####

registered_model = Model.register(
  model_path = './resnetV150_frozen.onnx',
  model_name = 'resnet50-onnx',
  model_framework = Model.Framework.ONNX,
  workspace = ws)

#####
# Create AKS cluster
#     With NVIDIA Tesla T4
#####

prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC4as_T4_v3")
aks_target = ComputeTarget.create(
  workspace=ws,
  name="aks-inference",
  provisioning_configuration=prov_config
)
aks_target.wait_for_completion(show_output=True)

#####
# Create inference config
#   with Custom Image !
#####

# Generate package dependency
conda_dependency = CondaDependencies.create()
conda_dependency.add_conda_package('pycuda')
conda_dependency.add_conda_package('numpy')
conda_dependency.add_pip_package('nvidia-pyindex')
conda_dependency.add_pip_package('nvidia-tensorrt')

# Create environment
env = Environment(name="test-tensorrt-env")
env.python.conda_dependencies = conda_dependency
env.docker.base_image = "tsmatz/azureml-tensorrt:8.0.1"
###env.inferencing_stack_version='latest'

# Create inference config with entry script
inf_conf = InferenceConfig(
  entry_script="inference.py",
  environment=env)

#####
# Create deploy config for AKS
#####

aks_config = AksWebservice.deploy_configuration(
  autoscale_enabled=False,
  num_replicas=3,
  cpu_cores=2,
  memory_gb=10)

#####
# Deploy model
# (Put them together !)
#####

svc = Model.deploy(
  name='my-tensorrt-inference',
  deployment_config=aks_config,
  deployment_target=aks_target,
  models=[registered_model],
  inference_config=inf_conf,
  workspace=ws)
svc.wait_for_deployment(show_output=True)
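
Once the deployment has completed, you can send a test request to the scoring endpoint. The following is a minimal sketch; the payload format depends on what your entry script expects, so the "data" field below is just a placeholder.

import json
import requests

# Get the scoring endpoint URL and the authentication keys (AKS web service)
scoring_uri = svc.scoring_uri
primary_key, secondary_key = svc.get_keys()

# Send a dummy request
# (the payload shape is a placeholder and depends on your inference.py)
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + primary_key
}
sample = {"data": [[0.0] * (224 * 224 * 3)]}
response = requests.post(scoring_uri, data=json.dumps(sample), headers=headers)
print(response.status_code, response.text)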

Note : Instead of setting a base image, you can set the Dockerfile text directly in the AML environment configuration as below.

env.docker.base_image = None
env.docker.base_dockerfile = """
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu18.04
USER root:root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libnvinfer8=8.0.1-1+cuda11.3 \
    python3-libnvinfer-dev=8.0.1-1+cuda11.3
RUN pip install nvidia-pyindex
RUN pip install nvidia-tensorrt
"""
