This tutorial can be downloaded as part of the Wallaroo Tutorials repository.
MLFlow Registry Model Upload Demonstration
Wallaroo users can register their trained machine learning models from a model registry into their Wallaroo instance and perform inferences with them through a Wallaroo pipeline.
This guide details how to add ML Models from a model registry service into a Wallaroo instance.
Artifact Requirements
Models are uploaded to the Wallaroo instance as a specific artifact: the “file” or other data that represents the model itself. The artifact must comply with the Wallaroo model requirements for framework and version, or it will not be deployed.
This tutorial will:
- Create a Wallaroo workspace and pipeline.
- Show how to create a Wallaroo Registry that connects to a Model Registry Service.
- Use the registry connection details to upload a sample model to Wallaroo.
- Perform a sample inference.
Prerequisites
- A Wallaroo instance, version 2023.2.1 or above.
- A Model (aka Artifact) Registry Service.
Tutorial Steps
Import Libraries
We’ll start with importing the libraries we need for the tutorial.
import os
import wallaroo
Connect to the Wallaroo Instance through the User Interface
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.
# Log in through the internal JupyterHub service
wl = wallaroo.Client()
# Alternatively, specify the API and authentication endpoints explicitly (example values)
wallarooPrefix = "doc-test."
wallarooSuffix = "wallarooexample.ai"
wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}api.{wallarooSuffix}",
                     auth_endpoint=f"https://{wallarooPrefix}keycloak.{wallarooSuffix}",
                     auth_type="sso")
Please log into the following URL in a web browser:
https://doc-test.keycloak.wallarooexample.ai/auth/realms/master/device?user_code=YOSQ-MZCY
Login successful!
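The two endpoint URLs above follow one pattern: a prefix, a service name, and a suffix. As a minimal sketch, the hypothetical helper `build_endpoints` below (not part of the Wallaroo SDK) assembles both URLs in one place, assuming the `https://{prefix}api.{suffix}` and `https://{prefix}keycloak.{suffix}` layout used in this tutorial.

```python
def build_endpoints(prefix: str, suffix: str) -> dict:
    """Assemble the API and auth endpoint URLs from a prefix and suffix.

    Hypothetical helper, not part of the Wallaroo SDK. The prefix is
    expected to end with "." (as in "doc-test.") or be empty.
    """
    if prefix and not prefix.endswith("."):
        prefix = prefix + "."  # tolerate a prefix given without the trailing dot
    return {
        "api_endpoint": f"https://{prefix}api.{suffix}",
        "auth_endpoint": f"https://{prefix}keycloak.{suffix}",
    }


endpoints = build_endpoints("doc-test.", "wallarooexample.ai")
print(endpoints["api_endpoint"])
print(endpoints["auth_endpoint"])
```

Since the dictionary keys match the keyword arguments shown above, the result could be passed along as `wallaroo.Client(auth_type="sso", **endpoints)`.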
Connect to Model Registry
A Wallaroo Registry stores the URL and authentication token for the Model Registry service under an assigned name. Note that all URLs and tokens in this demonstration are examples.
registry = wl.create_model_registry(name="JeffRegistry45",
token="dapi67c8c0b04606f730e78b7ae5e3221015-3",
url="https://sample.registry.service.azuredatabricks.net")
registry
Field | Value |
---|---|
Name | JeffRegistry45 |
URL | https://sample.registry.service.azuredatabricks.net |
Workspaces | john.hummel@wallaroo.ai - Default Workspace |
Created At | 2023-17-Jul 19:54:49 |
Updated At | 2023-17-Jul 19:54:49 |
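Because the registry token is a credential, it may be preferable not to hard-code it in a notebook. The following is a minimal sketch that reads the registry settings from environment variables. The variable names `REGISTRY_NAME`, `REGISTRY_URL`, and `REGISTRY_TOKEN` and the helper `registry_settings_from_env` are assumptions for illustration, not part of the Wallaroo SDK.

```python
import os


def registry_settings_from_env(env=None) -> dict:
    """Collect registry connection settings from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    Raises KeyError naming the first missing or empty variable.
    """
    env = os.environ if env is None else env
    settings = {}
    for key, var in (("name", "REGISTRY_NAME"),
                     ("url", "REGISTRY_URL"),
                     ("token", "REGISTRY_TOKEN")):
        value = env.get(var)
        if not value:
            raise KeyError(f"missing environment variable: {var}")
        settings[key] = value
    return settings


# Example values only, mirroring the sample registry in this tutorial.
example_env = {
    "REGISTRY_NAME": "JeffRegistry45",
    "REGISTRY_URL": "https://sample.registry.service.azuredatabricks.net",
    "REGISTRY_TOKEN": "example-token",
}
settings = registry_settings_from_env(example_env)
print(settings["name"], settings["url"])
```

The resulting dictionary uses the same keys as the keyword arguments shown above, so it could be passed as `wl.create_model_registry(**settings)`.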
List Model Registries
Registries associated with a workspace are listed with the Wallaroo.Client.list_model_registries() method.
# List all registries in this workspace
registries = wl.list_model_registries()
registries
name | registry url | created at | updated at |
---|---|---|---|
JeffRegistry45 | https://sample.registry.service.azuredatabricks.net | 2023-17-Jul 17:56:52 | 2023-17-Jul 17:56:52 |
JeffRegistry45 | https://sample.registry.service.azuredatabricks.net | 2023-17-Jul 19:54:49 | 2023-17-Jul 19:54:49 |
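The listing above contains two registries with the same name, distinguished only by their timestamps. As a sketch, the timestamps as displayed (for example `2023-17-Jul 19:54:49`) can be parsed with the standard library to find the most recent entry. The rows below are plain dictionaries copied from the table. How the real registry objects expose these fields is not shown in this tutorial, so `latest_by_created` is a hypothetical helper working on copied values.

```python
from datetime import datetime

# Year-day-month layout as displayed in the registry tables, e.g. "2023-17-Jul 19:54:49".
# Note: %b matches English month abbreviations under the default C locale.
DISPLAY_FORMAT = "%Y-%d-%b %H:%M:%S"


def parse_display_time(text: str) -> datetime:
    """Parse a timestamp in the display format used by the registry tables."""
    return datetime.strptime(text, DISPLAY_FORMAT)


def latest_by_created(rows: list) -> dict:
    """Return the row with the most recent 'created at' timestamp."""
    return max(rows, key=lambda row: parse_display_time(row["created at"]))


rows = [
    {"name": "JeffRegistry45", "created at": "2023-17-Jul 17:56:52"},
    {"name": "JeffRegistry45", "created at": "2023-17-Jul 19:54:49"},
]
newest = latest_by_created(rows)
print(newest["created at"])
```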
Create Workspace
For this demonstration, we will create a randomly named Wallaroo workspace, then attach our registry to the workspace so it is accessible by other workspace users.
Add Registry to Workspace
Registries are assigned to a Wallaroo workspace with the Wallaroo.Registry.add_registry_to_workspace method. This allows members of the workspace to access the registry connection. A registry can be associated with one or more workspaces.
Add Registry to Workspace Parameters
Parameter | Type | Description |
---|---|---|
workspace_id | Integer (Required) | The numerical identifier of the workspace. |
# Make a random new workspace
import math
import random
num = math.floor(random.random() * 1000)
workspace_id = wl.create_workspace(f"test{num}").id()
# Attach the registry to the new workspace
registry.add_registry_to_workspace(workspace_id=workspace_id)
Field | Value |
---|---|
Name | JeffRegistry45 |
URL | https://sample.registry.service.azuredatabricks.net |
Workspaces | test68, john.hummel@wallaroo.ai - Default Workspace |
Created At | 2023-17-Jul 19:54:49 |
Updated At | 2023-17-Jul 19:54:49 |
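Drawing a number below 1000 leaves a real chance of reusing a workspace name across repeated runs. A sketch of an alternative that draws a longer random suffix with the standard library follows. `random_workspace_name` is a hypothetical helper, and the `test` prefix simply mirrors the example above.

```python
import secrets


def random_workspace_name(prefix: str = "test", nbytes: int = 4) -> str:
    """Return the prefix followed by 2 * nbytes random hexadecimal characters."""
    return f"{prefix}{secrets.token_hex(nbytes)}"


name = random_workspace_name()
print(name)
```

The name could then be passed to `wl.create_workspace(name)` in the same way as the cell above.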
Remove Registry from Workspace
Registries are removed from a Wallaroo workspace with the Registry remove_registry_from_workspace method.
Remove Registry from Workspace Parameters
Parameter | Type | Description |
---|---|---|
workspace_id | Integer (Required) | The numerical identifier of the workspace. |
registry.remove_registry_from_workspace(workspace_id=workspace_id)
Field | Value |
---|---|
Name | JeffRegistry45 |
URL | https://sample.registry.service.azuredatabricks.net |
Workspaces | john.hummel@wallaroo.ai - Default Workspace |
Created At | 2023-17-Jul 19:54:49 |
Updated At | 2023-17-Jul 19:54:49 |
List Models in a Registry
Models available to the Wallaroo instance through the MLFlow Registry are listed with the Wallaroo.Registry.list_models() method.
registry_models = registry.list_models()
registry_models
Name | Registry User | Versions | Created At | Updated At |
Select Model from Registry
A registry model is selected by calling the Wallaroo.Registry.list_models() method, then specifying the model to use from the returned list.
single_registry_model = registry_models[4]
single_registry_model
Name | verified-working |
Registry User | gib.bhojraj@wallaroo.ai |
Versions | 1 |
Created At | 2023-11-Jul 16:18:03 |
Updated At | 2023-11-Jul 16:57:54 |
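Selecting by position, as in `registry_models[4]`, depends on the order in which the registry returns its models. A sketch of selecting by name instead is shown below. It uses stand-in objects because this tutorial does not show how a registry model exposes its name; the `name` attribute and the `find_by_name` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StandInModel:
    """Stand-in for a registry model entry; the real class may differ."""
    name: str


def find_by_name(models, wanted, get_name=lambda m: m.name):
    """Return the single model whose name equals `wanted`.

    `get_name` extracts the name from an entry, so the accessor can be
    swapped if the real objects expose it differently.
    """
    matches = [m for m in models if get_name(m) == wanted]
    if len(matches) != 1:
        raise LookupError(f"expected one model named {wanted!r}, found {len(matches)}")
    return matches[0]


models = [StandInModel("other-model"), StandInModel("verified-working")]
selected = find_by_name(models, "verified-working")
print(selected.name)
```

Raising on zero or multiple matches is a deliberate choice here: a silent fallback to the wrong model would only surface later, at upload or inference time.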
List Model Versions
The Registry Model versions() method shows the complete list of versions for the particular model.
single_registry_model.versions()
Name | Version | Description |
verified-working | 3 | None |
List Model Version Artifacts
Artifacts belonging to an MLFlow registry model are listed with the Model Version list_artifacts() method. This returns all artifacts for the model.
single_registry_model.versions()[1].list_artifacts()
File Name | File Size | Full Path |
---|---|---|
MLmodel | 559B | https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/MLmodel |
conda.yaml | 182B | https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/conda.yaml |
model.pkl | 829B | https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl |
python_env.yaml | 122B | https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/python_env.yaml |
requirements.txt | 73B | https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/requirements.txt |
Configure Data Schemas
To upload an ML model to Wallaroo, the input and output schemas must be defined in pyarrow.lib.Schema format.
from wallaroo.framework import Framework
import pyarrow as pa
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])
output_schema = pa.schema([
    pa.field('predictions', pa.int32()),
    pa.field('probabilities', pa.list_(pa.float64(), list_size=3))
])
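The input schema declares `inputs` as a fixed-size list of four floats, so every row sent for inference needs exactly four numeric values. A plain-Python sketch of checking rows before building the DataFrame follows. `check_fixed_size_rows` is a hypothetical helper and does not use pyarrow.

```python
from numbers import Real


def check_fixed_size_rows(rows, size):
    """Return the indexes of rows that are not lists of exactly `size` real numbers."""
    bad = []
    for index, row in enumerate(rows):
        ok = (
            isinstance(row, (list, tuple))
            and len(row) == size
            and all(isinstance(v, Real) and not isinstance(v, bool) for v in row)
        )
        if not ok:
            bad.append(index)
    return bad


rows = [
    [5.1, 3.5, 1.4, 0.2],     # valid: four floats
    [4.9, 3.0, 1.4],          # invalid: only three values
    [4.7, 3.2, "1.3", 0.2],   # invalid: contains a string
]
print(check_fixed_size_rows(rows, size=4))
```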
Upload a Model from a Registry
Models are uploaded to the Wallaroo workspace from an MLFlow Registry with the Wallaroo.Registry.upload_model method.
Upload a Model from a Registry Parameters
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name to assign the model once uploaded. Model names are unique within a workspace. Models assigned the same name as an existing model will be uploaded as a new model version. |
path | string (Required) | The full path to the model artifact in the registry. |
framework | string (Required) | The Wallaroo model Framework. See Model Uploads and Registrations Supported Frameworks. |
input_schema | pyarrow.lib.Schema (Required for non-native runtimes) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required for non-native runtimes) | The output schema in Apache Arrow schema format. |
model = registry.upload_model(
name="verified-working",
path="https://sample.registry.service.azuredatabricks.net/api/2.0/dbfs/read?path=/databricks/mlflow-registry/9168792a16cb40a88de6959ef31e42a2/models/√erified-working/model.pkl",
framework=Framework.SKLEARN,
input_schema=input_schema,
output_schema=output_schema)
model
Name | verified-working |
Version | cf194b65-65b2-4d42-a4e2-6ca6fa5bfc42 |
File Name | model.pkl |
SHA | 5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4 |
Status | pending_conversion |
Image Path | None |
Updated At | 2023-17-Jul 17:57:23 |
Verify the Model Status
Once uploaded, the model will undergo conversion. The following loop polls the model status until it is ready or reports an error. Once ready, it is available for deployment.
import time
# Poll every 3 seconds until conversion finishes or fails
while model.status() != "ready" and model.status() != "error":
    print(model.status())
    time.sleep(3)
# Print the final status
print(model.status())
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
pending_conversion
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
converting
ready
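The loop above runs until the status is `ready` or `error`, with no upper bound on the wait. A sketch of the same polling pattern with a timeout follows. `wait_for_status` is a hypothetical helper; it takes any zero-argument callable that returns the status, so `model.status` could be passed in. The demonstration uses a canned sequence of statuses instead of a live model.

```python
import time


def wait_for_status(get_status, done=("ready", "error"), timeout=600.0, interval=3.0):
    """Poll get_status() until it returns a value in `done` or `timeout` seconds pass.

    Returns the final status; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Canned statuses standing in for a live model.status() call.
canned = iter(["pending_conversion", "converting", "converting", "ready"])
final = wait_for_status(lambda: next(canned), interval=0.0)
print(final)
```

The default timeout of 600 seconds is an arbitrary illustration, not a documented conversion time.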
Model Runtime
Once uploaded and converted, the model runtime is derived. This determines whether to allocate resources to the pipeline’s native runtime environment or its containerized runtime environment. For more details, see the Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration.
model.config().runtime()
'mlflow'
Deploy Pipeline
The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example we allocate 0.5 CPU to the native runtime environment and 1 CPU to the containerized runtime environment.
import os, json
from wallaroo.deployment_config import DeploymentConfigBuilder
deployment_config = DeploymentConfigBuilder().cpus(0.5).sidekick_cpus(model, 1).build()
pipeline = wl.build_pipeline("jefftest1")
pipeline = pipeline.add_model_step(model)
deployment = pipeline.deploy(deployment_config=deployment_config)
pipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.244.3.148',
'name': 'engine-86c7fc5c95-8kwh5',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'jefftest1',
'status': 'Running'}]},
'model_statuses': {'models': [{'name': 'verified-working',
'version': 'cf194b65-65b2-4d42-a4e2-6ca6fa5bfc42',
'sha': '5f4c25b0b564ab9fe0ea437424323501a460aa74463e81645a6419be67933ca4',
'status': 'Running'}]}}],
'engine_lbs': [{'ip': '10.244.4.203',
'name': 'engine-lb-584f54c899-tpv5b',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': [{'ip': '10.244.0.225',
'name': 'engine-sidekick-verified-working-43-74f957566d-9zdfh',
'status': 'Running',
'reason': None,
'details': [],
'statuses': '\n'}]}
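The status dictionary nests a `status` field for the pipeline itself and for each entry under `engines`, `engine_lbs`, and `sidekicks`. As a sketch, a deployment check could require all of them to report `Running`. The helper `all_running` is hypothetical and relies only on the keys visible in the output above; the sample dictionary is a trimmed copy of that output.

```python
def all_running(status: dict) -> bool:
    """True when the pipeline and every listed component report 'Running'."""
    if status.get("status") != "Running":
        return False
    for group in ("engines", "engine_lbs", "sidekicks"):
        for component in status.get(group, []):
            if component.get("status") != "Running":
                return False
    return True


sample = {
    "status": "Running",
    "engines": [{"name": "engine-86c7fc5c95-8kwh5", "status": "Running"}],
    "engine_lbs": [{"name": "engine-lb-584f54c899-tpv5b", "status": "Running"}],
    "sidekicks": [{"name": "engine-sidekick-verified-working-43-74f957566d-9zdfh",
                   "status": "Running"}],
}
print(all_running(sample))
```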
Run Inference
A sample inference will be run. First the pandas DataFrame used for the inference is created, then the inference is run through the pipeline’s infer method.
import pandas as pd
from sklearn.datasets import load_iris
data = load_iris(as_frame=True)
X = data['data'].values
dataframe = pd.DataFrame({"inputs": data['data'][:2].values.tolist()})
dataframe
index | inputs |
---|---|
0 | [5.1, 3.5, 1.4, 0.2] |
1 | [4.9, 3.0, 1.4, 0.2] |
deployment.infer(dataframe)
index | time | in.inputs | out.predictions | out.probabilities | check_failures |
---|---|---|---|---|---|
0 | 2023-07-17 17:59:18.840 | [5.1, 3.5, 1.4, 0.2] | 0 | [0.981814913291491, 0.018185072312411506, 1.43... | 0 |
1 | 2023-07-17 17:59:18.840 | [4.9, 3.0, 1.4, 0.2] | 0 | [0.9717552971628304, 0.02824467272952288, 3.01... | 0 |
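In this result each row's `out.predictions` value is 0 and the first entry of `out.probabilities` is the largest, which suggests the prediction is the index of the highest probability. That relationship is an inference from the two rows shown, not something the tutorial states. A sketch of checking it on plain Python lists follows; the probability values are rounded, made-up stand-ins because the displayed ones are truncated.

```python
def argmax(values):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical rows: (prediction, probabilities). Values are illustrative only.
rows = [
    (0, [0.98, 0.02, 0.00]),
    (0, [0.97, 0.03, 0.00]),
]
consistent = [prediction == argmax(probs) for prediction, probs in rows]
print(consistent)
```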
Undeploy Pipelines
With the tutorial complete, the pipeline is undeployed to return its resources to the cluster.
pipeline.undeploy()
name | jefftest1 |
---|---|
created | 2023-07-17 17:59:05.922172+00:00 |
last_updated | 2023-07-17 17:59:06.684060+00:00 |
deployed | False |
tags | |
versions | c2cca319-fcad-47b2-9de0-ad5b2852d1a2, f1e6d1b5-96ee-46a1-bfdf-174310ff4270 |
steps | verified-working |