Model Registry Service with Wallaroo SDK Demonstration

How to use the Wallaroo SDK with Model Registry Services

Table of Contents

This tutorial can be downloaded as part of the Wallaroo Tutorials repository.

MLFLow Registry Model Upload Demonstration

Wallaroo users can register their trained machine learning models from a model registry into their Wallaroo instance and perform inferences with it through a Wallaroo pipeline.

This guide details how to add ML Models from a model registry service into a Wallaroo instance.

Artifact Requirements

Models are uploaded to the Wallaroo instance as the specific artifact - the “file” or other data that represents the file itself. This must comply with the Wallaroo model requirements framework and version or it will not be deployed.

This tutorial will:

  • Create a Wallaroo workspace and pipeline.
  • Show how to connect a Wallaroo Registry that connects to a Model Registry Service.
  • Use the registry connection details to upload a sample model to Wallaroo.
  • Perform a sample inference.

Prerequisites

  • A Wallaroo version 2023.2.1 or above instance.
  • A Model (aka Artifact) Registry Service

References

Tutorial Steps

Import Libraries

We’ll start with importing the libraries we need for the tutorial.

import os
import wallaroo

Connect to the Wallaroo Instance through the User Interface

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

wl=wallaroo.Client()

wl = wallaroo.Client()

wallarooPrefix = "doc-test."
wallarooSuffix = "wallarooexample.ai"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}keycloak.{wallarooSuffix}", 
                    auth_type="sso")
Please log into the following URL in a web browser:
https://doc-test.keycloak.wallarooexample.ai/auth/realms/master/device?user_code=YOSQ-MZCY

Login successful!

Connect to Model Registry

The Wallaroo Registry stores the URL and authentication token to the Model Registry service, with the assigned name. Note that in this demonstration all URLs and token are examples.

registry = wl.create_model_registry(name="JeffRegistry45", 
                                    token="dapi67c8c0b04606f730e78b7ae5e3221015-3", 
                                    url="https://sample.registry.service.azuredatabricks.net")
registry
FieldValue
NameJeffRegistry45
URLhttps://sample.registry.service.azuredatabricks.net
Workspacesjohn.hummel@wallaroo.ai - Default Workspace
Created At2023-17-Jul 19:54:49
Updated At2023-17-Jul 19:54:49

List Model Registries

Registries associated with a workspace are listed with the Wallaroo.Client.list_model_registries() method.

# List all registries in this workspace
registries = wl.list_model_registries()
registries
nameregistry urlcreated atupdated at
JeffRegistry45https://sample.registry.service.azuredatabricks.net2023-17-Jul 17:56:522023-17-Jul 17:56:52
JeffRegistry45https://sample.registry.service.azuredatabricks.net2023-17-Jul 19:54:492023-17-Jul 19:54:49

Create Workspace

For this demonstration, we will create a random Wallaroo workspace, then attach our registry to the workspace so it is accessible by other workspace users.

Add Registry to Workspace

Registries are assigned to a Wallaroo workspace with the Wallaroo.registry.add_registry_to_workspace method. This allows members of the workspace to access the registry connection. A registry can be associated with one or more workspaces.

Add Registry to Workspace Parameters

ParameterTypeDescription
namestring (Required)The numerical identifier of the workspace.
# Make a random new workspace
import math
import random
num = math.floor(random.random()* 1000)
workspace_id = wl.create_workspace(f"test{num}").id()

registry.add_registry_to_workspace(workspace_id=workspace_id)
FieldValue
NameJeffRegistry45
URLhttps://sample.registry.service.azuredatabricks.net
Workspacestest68, john.hummel@wallaroo.ai - Default Workspace
Created At2023-17-Jul 19:54:49
Updated At2023-17-Jul 19:54:49

Remove Registry from Workspace

Registries are removed from a Wallaroo workspace with the Registry remove_registry_from_workspace method.

Remove Registry from Workspace Parameters

ParameterTypeDescription
workspace_idInteger (Required)The numerical identifier of the workspace.
registry.remove_registry_from_workspace(workspace_id=workspace_id)
FieldValue
NameJeffRegistry45
URLhttps://sample.registry.service.azuredatabricks.net
Workspacesjohn.hummel@wallaroo.ai - Default Workspace
Created At2023-17-Jul 19:54:49
Updated At2023-17-Jul 19:54:49

List Models in a Registry

A List of models available to the Wallaroo instance through the MLFlow Registry is performed with the Wallaroo.Registry.list_models() method.

registry_models = registry.list_models()
registry_models
  <tr>
    <td>logreg1</td>
    <td>gib.bhojraj@wallaroo.ai</td>
    <td>1</td>
    <td>2023-06-Jul 14:36:54</td>
    <td>2023-06-Jul 14:36:56</td>
  </tr>

<tr>
<td>sidekick-test</td>
<td>gib.bhojraj@wallaroo.ai</td>
<td>1</td>
<td>2023-11-Jul 14:42:14</td>
<td>2023-11-Jul 14:42:14</td>
</tr>

<tr>
<td>testmodel</td>
<td>gib.bhojraj@wallaroo.ai</td>
<td>1</td>
<td>2023-16-Jun 12:38:42</td>
<td>2023-06-Jul 15:03:41</td>
</tr>

<tr>
<td>testmodel2</td>
<td>gib.bhojraj@wallaroo.ai</td>
<td>1</td>
<td>2023-16-Jun 12:41:04</td>
<td>2023-29-Jun 18:08:33</td>
</tr>

<tr>
<td>verified-working</td>
<td>gib.bhojraj@wallaroo.ai</td>
<td>1</td>
<td>2023-11-Jul 16:18:03</td>
<td>2023-11-Jul 16:57:54</td>
</tr>

<tr>
<td>wine_quality</td>
<td>gib.bhojraj@wallaroo.ai</td>
<td>2</td>
<td>2023-16-Jun 13:05:53</td>
<td>2023-16-Jun 13:09:57</td>
</tr>

NameRegistry UserVersionsCreated AtUpdated At

Select Model from Registry

Registry models are selected from the Wallaroo.Registry.list_models() method, then specifying the model to use.

single_registry_model = registry_models[4]
single_registry_model
Nameverified-working
Registry Usergib.bhojraj@wallaroo.ai
Versions1
Created At2023-11-Jul 16:18:03
Updated At2023-11-Jul 16:57:54

List Model Versions

The Registry Model attribute versions shows the complete list of versions for the particular model.

single_registry_model.versions()
NameVersionDescription
verified-working3None

List Model Version Artifacts

Artifacts belonging to a MLFlow registry model are listed with the Model Version list_artifacts() method. This returns all artifacts for the model.

single_registry_model.versions()[1].list_artifacts()
File NameFile SizeFull Path
MLmodel559Bhttps://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/MLmodel
conda.yaml182Bhttps://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/conda.yaml
model.pkl829Bhttps://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl
python_env.yaml122Bhttps://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/python_env.yaml
requirements.txt73Bhttps://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/requirements.txt

Configure Data Schemas

To upload a ML Model to Wallaroo, the input and output schemas must be defined in pyarrow.lib.Schema format.

from wallaroo.framework import Framework
import pyarrow as pa

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

output_schema = pa.schema([
    pa.field('predictions', pa.int32()),
    pa.field('probabilities', pa.list_(pa.float64(), list_size=3))
])

Upload a Model from a Registry

Models uploaded to the Wallaroo workspace are uploaded from a MLFlow Registry with the Wallaroo.Registry.upload method.

Upload a Model from a Registry Parameters

ParameterTypeDescription
namestring (Required)The name to assign the model once uploaded. Model names are unique within a workspace. Models assigned the same name as an existing model will be uploaded as a new model version.
pathstring (Required)The full path to the model artifact in the registry.
frameworkstring (Required)The Wallaroo model Framework. See Model Uploads and Registrations Supported Frameworks
input_schemapyarrow.lib.Schema (Required for non-native runtimes)The input schema in Apache Arrow schema format.
output_schemapyarrow.lib.Schema (Required for non-native runtimes)The output schema in Apache Arrow schema format.
model = registry.upload_model(
  name="verified-working", 
  path="https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl", 
  framework=Framework.SKLEARN,
  input_schema=input_schema,
  output_schema=output_schema)
model
Nameverified-working
Versioncf194b65-65b2-4d42-a4e2-6ca6fa5bfc42
File Namemodel.pkl
SHA5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4
Statuspending_conversion
Image PathNone
Updated At2023-17-Jul 17:57:23

Verify the Model Status

Once uploaded, the model will undergo conversion. The following will loop through the model status until it is ready. Once ready, it is available for deployment.

import time
while model.status() != "ready" and model.status() != "error":
    print(model.status())
    time.sleep(3)
print(model.status())
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
ready

Model Runtime

Once uploaded and converted, the model runtime is derived. This determines whether to allocate resources to pipeline’s native runtime environment or containerized runtime environment. For more details, see the Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration guide.

model.config().runtime()
'mlflow'

Deploy Pipeline

The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example we’re allocated 0.5 cpu to the runtime environment and 1 CPU to the containerized runtime environment.

import os, json
from wallaroo.deployment_config import DeploymentConfigBuilder
deployment_config = DeploymentConfigBuilder().cpus(0.5).sidekick_cpus(model, 1).build()
pipeline = wl.build_pipeline("jefftest1")
pipeline = pipeline.add_model_step(model)
deployment = pipeline.deploy(deployment_config=deployment_config)
pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.3.148',
   'name': 'engine-86c7fc5c95-8kwh5',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'jefftest1',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'verified-working',
      'version': 'cf194b65-65b2-4d42-a4e2-6ca6fa5bfc42',
      'sha': '5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.4.203',
   'name': 'engine-lb-584f54c899-tpv5b',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.0.225',
   'name': 'engine-sidekick-verified-working-43-74f957566d-9zdfh',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}

Run Inference

A sample inference will be run. First the pandas DataFrame used for the inference is created, then the inference run through the pipeline’s infer method.

import pandas as pd
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)

X = data['data'].values
dataframe = pd.DataFrame({"inputs": data['data'][:2].values.tolist()})
dataframe
inputs
0[5.1, 3.5, 1.4, 0.2]
1[4.9, 3.0, 1.4, 0.2]
deployment.infer(dataframe)
timein.inputsout.predictionsout.probabilitiescheck_failures
02023-07-17 17:59:18.840[5.1, 3.5, 1.4, 0.2]0[0.981814913291491, 0.018185072312411506, 1.43...0
12023-07-17 17:59:18.840[4.9, 3.0, 1.4, 0.2]0[0.9717552971628304, 0.02824467272952288, 3.01...0

Undeploy Pipelines

With the tutorial complete, the pipeline is undeployed to return the resources back to the cluster.

pipeline.undeploy()
namejefftest1
created2023-07-17 17:59:05.922172+00:00
last_updated2023-07-17 17:59:06.684060+00:00
deployedFalse
tags
versionsc2cca319-fcad-47b2-9de0-ad5b2852d1a2, f1e6d1b5-96ee-46a1-bfdf-174310ff4270
stepsverified-working