Model Registry Service with Wallaroo SDK Demonstration

How to use the Wallaroo SDK with Model Registry Services

This tutorial can be downloaded as part of the Wallaroo Tutorials repository.

MLflow Registry Model Upload Demonstration

Wallaroo users can register their trained machine learning models from a model registry into their Wallaroo instance and perform inferences with them through a Wallaroo pipeline.

This guide details how to add ML Models from a model registry service into a Wallaroo instance.

Artifact Requirements

Models are uploaded to the Wallaroo instance as the specific artifact: the “file” or other data that represents the model itself. The artifact must comply with the Wallaroo model requirements for framework and version, or it will not be deployed.

This tutorial will:

Create a Wallaroo workspace and pipeline.
Show how to create a Wallaroo Registry that connects to a Model Registry Service.
Use the registry connection details to upload a sample model to Wallaroo.
Perform a sample inference.

Prerequisites

A Wallaroo instance, version 2023.2.1 or above.
A Model (aka Artifact) Registry Service.

Tutorial Steps

Import Libraries

We’ll start with importing the libraries we need for the tutorial.

import os
import wallaroo

Connect to the Wallaroo Instance through the User Interface

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and is available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

wl=wallaroo.Client()

Connect to Model Registry

A Wallaroo Registry stores the URL and authentication token for the Model Registry service under an assigned name. Note that all URLs and tokens in this demonstration are examples.

registry = wl.create_model_registry(name="JeffRegistry45", 
                                    token="dapi67c8c0b04606f730e78b7ae5e3221015-3", 
                                    url="https://sample.registry.service.azuredatabricks.net")
registry

Field	Value
Name	JeffRegistry45
URL	https://sample.registry.service.azuredatabricks.net
Workspaces	john.hummel@wallaroo.ai - Default Workspace
Created At	2023-17-Jul 19:54:49
Updated At	2023-17-Jul 19:54:49
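
Hard-coding a token in a notebook makes it easy to leak. The following is a minimal sketch, assuming the registry URL and token are stored in environment variables. The variable names MODEL_REGISTRY_URL and MODEL_REGISTRY_TOKEN and the helper load_registry_settings are hypothetical and are not part of the Wallaroo SDK.

```python
import os


def load_registry_settings(env=None):
    """Read registry connection settings from environment variables.

    MODEL_REGISTRY_URL and MODEL_REGISTRY_TOKEN are hypothetical names
    chosen for this sketch; they are not defined by Wallaroo.
    """
    env = os.environ if env is None else env
    required = ("MODEL_REGISTRY_URL", "MODEL_REGISTRY_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing registry settings: {', '.join(missing)}")
    return {"url": env["MODEL_REGISTRY_URL"], "token": env["MODEL_REGISTRY_TOKEN"]}
```

The returned values could then be passed as the url and token arguments of the wl.create_model_registry call shown above.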

List Model Registries

Registries associated with a workspace are listed with the Wallaroo.Client.list_model_registries() method.

# List all registries in this workspace
registries = wl.list_model_registries()
registries

name	registry url	created at	updated at
JeffRegistry45	https://sample.registry.service.azuredatabricks.net	2023-17-Jul 17:56:52	2023-17-Jul 17:56:52
JeffRegistry45	https://sample.registry.service.azuredatabricks.net	2023-17-Jul 19:54:49	2023-17-Jul 19:54:49

Create Workspace

For this demonstration, we will create a Wallaroo workspace with a randomly generated name, then attach our registry to the workspace so it is accessible by other workspace users.

Add Registry to Workspace

Registries are assigned to a Wallaroo workspace with the Wallaroo.Registry.add_registry_to_workspace method. This allows members of the workspace to access the registry connection. A registry can be associated with one or more workspaces.

Add Registry to Workspace Parameters

Parameter	Type	Description
`workspace_id`	Integer (Required)	The numerical identifier of the workspace.

# Make a random new workspace
import math
import random
num = math.floor(random.random()* 1000)
workspace_id = wl.create_workspace(f"test{num}").id()

registry.add_registry_to_workspace(workspace_id=workspace_id)

Field	Value
Name	JeffRegistry45
URL	https://sample.registry.service.azuredatabricks.net
Workspaces	test68, john.hummel@wallaroo.ai - Default Workspace
Created At	2023-17-Jul 19:54:49
Updated At	2023-17-Jul 19:54:49
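
A random integer below 1000 can collide with an existing workspace name on a shared instance. As a hedged alternative, the sketch below builds the name from a random hexadecimal suffix. The helper unique_workspace_name is hypothetical and not part of the Wallaroo SDK.

```python
import uuid


def unique_workspace_name(prefix="test"):
    """Return a workspace name with a random 8-character hex suffix."""
    return f"{prefix}{uuid.uuid4().hex[:8]}"


print(unique_workspace_name())
```

The result could be passed to wl.create_workspace in place of the f"test{num}" string used above.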

Remove Registry from Workspace

Registries are removed from a Wallaroo workspace with the Wallaroo.Registry.remove_registry_from_workspace method.

Remove Registry from Workspace Parameters

Parameter	Type	Description
`workspace_id`	Integer (Required)	The numerical identifier of the workspace.

registry.remove_registry_from_workspace(workspace_id=workspace_id)

Field	Value
Name	JeffRegistry45
URL	https://sample.registry.service.azuredatabricks.net
Workspaces	john.hummel@wallaroo.ai - Default Workspace
Created At	2023-17-Jul 19:54:49
Updated At	2023-17-Jul 19:54:49

List Models in a Registry

The models available to the Wallaroo instance through the MLflow Registry are listed with the Wallaroo.Registry.list_models() method.

registry_models = registry.list_models()
registry_models

Name	Registry User	Versions	Created At	Updated At
logreg1	gib.bhojraj@wallaroo.ai	1	2023-06-Jul 14:36:54	2023-06-Jul 14:36:56
sidekick-test	gib.bhojraj@wallaroo.ai	1	2023-11-Jul 14:42:14	2023-11-Jul 14:42:14
testmodel	gib.bhojraj@wallaroo.ai	1	2023-16-Jun 12:38:42	2023-06-Jul 15:03:41
testmodel2	gib.bhojraj@wallaroo.ai	1	2023-16-Jun 12:41:04	2023-29-Jun 18:08:33
verified-working	gib.bhojraj@wallaroo.ai	1	2023-11-Jul 16:18:03	2023-11-Jul 16:57:54
wine_quality	gib.bhojraj@wallaroo.ai	2	2023-16-Jun 13:05:53	2023-16-Jun 13:09:57

Select Model from Registry

A registry model is selected by indexing into the list returned by the Wallaroo.Registry.list_models() method.

single_registry_model = registry_models[4]
single_registry_model

Name	verified-working
Registry User	gib.bhojraj@wallaroo.ai
Versions	1
Created At	2023-11-Jul 16:18:03
Updated At	2023-11-Jul 16:57:54
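
A positional index such as [4] depends on the listing order, which may change as models are added to the registry. The sketch below selects by name instead. This tutorial does not show how a registry model object exposes its name, so the accessor is passed in as a function, and plain dictionaries stand in for the registry models. The helper find_by_name is hypothetical.

```python
def find_by_name(items, name, get_name):
    """Return the single item whose name matches, or raise if none or several do."""
    matches = [item for item in items if get_name(item) == name]
    if len(matches) != 1:
        raise LookupError(f"Expected one match for {name!r}, found {len(matches)}")
    return matches[0]


# Stand-in records mirroring the Name column of the model listing above.
sample_models = [{"name": "logreg1"}, {"name": "verified-working"}, {"name": "wine_quality"}]
selected = find_by_name(sample_models, "verified-working", lambda m: m["name"])
print(selected)
```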

List Model Versions

The Registry Model versions() method returns the complete list of versions for the particular model.

single_registry_model.versions()

Name	Version	Description
verified-working	3	None

List Model Version Artifacts

Artifacts belonging to an MLflow registry model are listed with the Model Version list_artifacts() method. This returns all artifacts for the model.

single_registry_model.versions()[1].list_artifacts()

File Name	File Size	Full Path
MLmodel	559B	https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/MLmodel
conda.yaml	182B	https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/conda.yaml
model.pkl	829B	https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl
python_env.yaml	122B	https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/python_env.yaml
requirements.txt	73B	https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/requirements.txt
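
Each Full Path above is a URL whose path query parameter holds the artifact location in the registry. As an illustration only, the sketch below splits one of these URLs with the Python standard library. The helper artifact_location is hypothetical and not part of the Wallaroo SDK.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def artifact_location(full_path):
    """Split a registry artifact URL into (host, storage path, file name)."""
    parts = urlsplit(full_path)
    storage_path = parse_qs(parts.query)["path"][0]
    return parts.netloc, storage_path, posixpath.basename(storage_path)


host, storage_path, file_name = artifact_location(
    "https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read"
    "?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2"
    "/models/√erified-working/model.pkl"
)
print(file_name)  # model.pkl
```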

Configure Data Schemas

To upload an ML model to Wallaroo, the input and output schemas must be defined in pyarrow.lib.Schema format.

from wallaroo.framework import Framework
import pyarrow as pa

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

output_schema = pa.schema([
    pa.field('predictions', pa.int32()),
    pa.field('probabilities', pa.list_(pa.float64(), list_size=3))
])

Upload a Model from a Registry

Models are uploaded to the Wallaroo workspace from an MLflow Registry with the Wallaroo.Registry.upload_model method.

Upload a Model from a Registry Parameters

Parameter	Type	Description
`name`	string (Required)	The name to assign the model once uploaded. Model names are unique within a workspace. Models assigned the same name as an existing model will be uploaded as a new model version.
`path`	string (Required)	The full path to the model artifact in the registry.
`framework`	string (Required)	The Wallaroo model `Framework`. See Model Uploads and Registrations Supported Frameworks.
`input_schema`	`pyarrow.lib.Schema` (Required for non-native runtimes)	The input schema in Apache Arrow schema format.
`output_schema`	`pyarrow.lib.Schema` (Required for non-native runtimes)	The output schema in Apache Arrow schema format.

model = registry.upload_model(
  name="verified-working", 
  path="https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl", 
  framework=Framework.SKLEARN,
  input_schema=input_schema,
  output_schema=output_schema)
model

Name	verified-working
Version	cf194b65-65b2-4d42-a4e2-6ca6fa5bfc42
File Name	model.pkl
SHA	5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4
Status	pending_conversion
Image Path	None
Updated At	2023-17-Jul 17:57:23

Verify the Model Status

Once uploaded, the model will undergo conversion. The following polls the model status until it is either ready or error. Once ready, the model is available for deployment.

import time
while model.status() != "ready" and model.status() != "error":
    print(model.status())
    time.sleep(3)
print(model.status())

pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
ready
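
The loop above runs indefinitely if the model never reaches ready or error. A minimal sketch of the same polling with a time limit follows. The helper wait_for_status is hypothetical, and a fixed sequence of statuses stands in for model.status so the example runs without a Wallaroo instance.

```python
import time


def wait_for_status(get_status, done=("ready", "error"), timeout=600.0, interval=3.0):
    """Poll get_status() until it returns a value in `done` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for model.status: replays a fixed sequence of statuses.
fake_statuses = iter(["pending_conversion", "converting", "ready"])
print(wait_for_status(lambda: next(fake_statuses), interval=0))  # ready
```

With a real upload, model.status could be passed as the get_status argument.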

Model Runtime

Once uploaded and converted, the model runtime is derived. This determines whether to allocate resources to the pipeline’s native runtime environment or to the containerized runtime environment. For more details, see the Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration guide.

model.config().runtime()

'mlflow'

Deploy Pipeline

The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example we allocate 0.5 CPU to the native runtime environment and 1 CPU to the containerized runtime environment.

import os, json
from wallaroo.deployment_config import DeploymentConfigBuilder
deployment_config = DeploymentConfigBuilder().cpus(0.5).sidekick_cpus(model, 1).build()
pipeline = wl.build_pipeline("jefftest1")
pipeline = pipeline.add_model_step(model)
deployment = pipeline.deploy(deployment_config=deployment_config)

pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.3.148',
   'name': 'engine-86c7fc5c95-8kwh5',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'jefftest1',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'verified-working',
      'version': 'cf194b65-65b2-4d42-a4e2-6ca6fa5bfc42',
      'sha': '5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.4.203',
   'name': 'engine-lb-584f54c899-tpv5b',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.0.225',
   'name': 'engine-sidekick-verified-working-43-74f957566d-9zdfh',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}
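
The status dictionary can also be checked programmatically before sending inference requests. The sketch below assumes the structure shown in the output above (a top-level status plus engines, engine_lbs, and sidekicks lists). The helper not_running is hypothetical and not part of the Wallaroo SDK.

```python
def not_running(status):
    """Return names of components in a pipeline status dict that are not 'Running'."""
    problems = []
    if status.get("status") != "Running":
        problems.append("pipeline")
    for group in ("engines", "engine_lbs", "sidekicks"):
        for component in status.get(group, []):
            if component.get("status") != "Running":
                problems.append(component.get("name", group))
    return problems


# Trimmed copy of the status output above.
sample_status = {
    "status": "Running",
    "engines": [{"name": "engine-86c7fc5c95-8kwh5", "status": "Running"}],
    "engine_lbs": [{"name": "engine-lb-584f54c899-tpv5b", "status": "Running"}],
    "sidekicks": [{"name": "engine-sidekick-verified-working-43-74f957566d-9zdfh",
                   "status": "Running"}],
}
print(not_running(sample_status))  # []
```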

Run Inference

A sample inference will be run. First the pandas DataFrame used for the inference is created, then the inference is run through the deployed pipeline’s infer method.

import pandas as pd
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)

X = data['data'].values
dataframe = pd.DataFrame({"inputs": data['data'][:2].values.tolist()})
dataframe

	inputs
0	[5.1, 3.5, 1.4, 0.2]
1	[4.9, 3.0, 1.4, 0.2]
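
The input schema defined earlier declares inputs as a fixed-size list of 4 float64 values, so a quick pre-check can catch malformed rows before they are sent for inference. This is a plain-Python sketch. The helper check_input_rows is hypothetical and does not replace the schema validation Wallaroo performs.

```python
def check_input_rows(rows, expected_length=4):
    """Return indexes of rows that are not lists of `expected_length` numbers."""
    bad = []
    for index, row in enumerate(rows):
        is_numeric = all(
            isinstance(value, (int, float)) and not isinstance(value, bool)
            for value in row
        )
        if len(row) != expected_length or not is_numeric:
            bad.append(index)
    return bad


rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
print(check_input_rows(rows))  # []
```

With the DataFrame above, dataframe["inputs"].tolist() could be passed as rows.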

deployment.infer(dataframe)

	time	in.inputs	out.predictions	out.probabilities	check_failures
0	2023-07-17 17:59:18.840	[5.1, 3.5, 1.4, 0.2]	0	[0.981814913291491, 0.018185072312411506, 1.43...	0
1	2023-07-17 17:59:18.840	[4.9, 3.0, 1.4, 0.2]	0	[0.9717552971628304, 0.02824467272952288, 3.01...	0

Undeploy Pipelines

With the tutorial complete, the pipeline is undeployed to return its resources to the cluster.

pipeline.undeploy()

name	jefftest1
created	2023-07-17 17:59:05.922172+00:00
last_updated	2023-07-17 17:59:06.684060+00:00
deployed	False
tags
versions	c2cca319-fcad-47b2-9de0-ad5b2852d1a2, f1e6d1b5-96ee-46a1-bfdf-174310ff4270
steps	verified-working