Wallaroo Developer Guides

Reference Guides for Developers working with Wallaroo

The following guides are made to help users work with Wallaroo workspaces, upload models, set up pipelines, and run inferences on data from their own applications using the Wallaroo SDK, API, and other coding examples.

We recommend that first-time users refer to the Wallaroo Tutorials, which walk new users through several different models and ways of creating pipelines.

Supported Model Versions and Libraries

The following ML Model versions and Python libraries are supported by Wallaroo. When using the Wallaroo autoconversion library or working with a local version of the Wallaroo SDK, use the following versions for maximum compatibility.

Library Supported Version
Python 3.8.6 and above
onnx 1.12.0
tensorflow 2.9.1
keras 2.9.0
pytorch Latest stable version. When converting from PyTorch to onnx, verify that the onnx version matches the version above.
sk-learn aka scikit-learn 1.1.2
statsmodels 0.13.2
XGBoost 1.6.2
MLFlow 1.30.0
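To compare a local environment against the table above, a small check such as the following can help. This is a sketch under stated assumptions: it assumes the PyPI distribution names onnx, tensorflow, keras, scikit-learn, statsmodels, xgboost, and mlflow, treats the listed versions as exact pins, and omits PyTorch because the table only asks for the latest stable release. The function names are hypothetical.

```python
import sys
from importlib import metadata

# Versions from the supported-versions table above, keyed by PyPI
# distribution name (the distribution names are an assumption).
SUPPORTED = {
    "onnx": "1.12.0",
    "tensorflow": "2.9.1",
    "keras": "2.9.0",
    "scikit-learn": "1.1.2",
    "statsmodels": "0.13.2",
    "xgboost": "1.6.2",
    "mlflow": "1.30.0",
}


def find_mismatches(installed):
    """Return {name: (installed, supported)} for packages whose installed
    version differs from the table. Packages that are not installed
    (missing or None) are skipped."""
    mismatches = {}
    for name, supported in SUPPORTED.items():
        found = installed.get(name)
        if found is not None and found != supported:
            mismatches[name] = (found, supported)
    return mismatches


def installed_versions():
    """Look up the locally installed version of each package, or None."""
    versions = {}
    for name in SUPPORTED:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    if sys.version_info < (3, 8, 6):
        print("Python 3.8.6 or above is required")
    print(find_mismatches(installed_versions()))
```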

Supported Data Types

The following data types are supported for transporting data to and from Wallaroo in the following runtimes:

  • ONNX
  • TensorFlow
  • MLFlow

Float Types

Runtime BFloat16* Float16 Float32 Float64
ONNX X X
TensorFlow X X X
MLFlow X X X
  • * (Brain Float 16, represented internally as an f32)

Int Types

Runtime Int8 Int16 Int32 Int64
ONNX X X X X
TensorFlow X X X X
MLFlow X X X X

Uint Types

Runtime Uint8 Uint16 Uint32 Uint64
ONNX X X X X
TensorFlow X X X X
MLFlow X X X X

Other Types

Runtime Boolean Utf8 (String) Complex 64 Complex 128 FixedSizeList*
ONNX X
TensorFlow X X X
MLFlow X X X
  • * Fixed-size lists of any of the previously supported data types.

1 - Wallaroo API Guide

Reference Guide for the Wallaroo API

The Wallaroo API provides users the ability to submit commands to a Wallaroo instance via an RPC-like HTTP API interface. This allows organizations to administer Wallaroo from their own custom applications.

Wallaroo API URLs

Each Wallaroo instance provides the Wallaroo API specifications through the following URLs, which use the format below. For more information, see the DNS Integration Guide.

https://{Wallaroo Prefix}.{Service}.{Wallaroo Suffix}

Page URL Description
Wallaroo Instance API URL https://{Wallaroo Prefix}.api.{Wallaroo Suffix}/v1/api Address for the Wallaroo instance’s API. API requests will be submitted to this instance.
Wallaroo Instance API Documentation https://{Wallaroo Prefix}.api.{Wallaroo Suffix}/v1/api/docs An HTML rendered view of the Wallaroo API specification.
Wallaroo Documentation Site https://docs.wallaroo.ai/ Wallaroo Documentation Site
Wallaroo Enterprise Keycloak Service https://{Wallaroo Prefix}.keycloak.{Wallaroo Suffix} Keycloak administration console for managing users and groups. It is recommended not to interfere with this service unless necessary.
Wallaroo Enterprise Token Request URL https://{Wallaroo Prefix}.keycloak.{Wallaroo Enterprise Suffix}/auth/realms/master/protocol/openid-connect/token The Keycloak token retrieval URL.

For example, if the Wallaroo Enterprise Prefix is wallaroo and the suffix is example.com, the URLs would be as follows:

Page Example URL
Wallaroo Instance API URL https://wallaroo.api.example.com/v1/api
Wallaroo Instance API Documentation https://wallaroo.api.example.com/v1/api/docs
Wallaroo Documentation Site https://docs.wallaroo.ai/
Wallaroo Enterprise Keycloak Service https://wallaroo.keycloak.example.com
Wallaroo Enterprise Token Request URL https://wallaroo.keycloak.example.com/auth/realms/master/protocol/openid-connect/token
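The URL pattern above can be expressed as a small helper. This is only a sketch: the function name wallaroo_urls is hypothetical, and it assumes each service is addressed as {Prefix}.{service}.{Suffix}, with the prefix dropped entirely for instances that do not use one.

```python
def wallaroo_urls(prefix, suffix):
    """Build the Wallaroo URLs listed above from a prefix and suffix.

    Hypothetical helper: assumes https://{prefix}.{service}.{suffix},
    or https://{service}.{suffix} when no prefix is used.
    """
    def host(service):
        # Instances without a prefix drop the leading "{prefix}." part
        return f"{prefix}.{service}.{suffix}" if prefix else f"{service}.{suffix}"

    api = f"https://{host('api')}/v1/api"
    keycloak = f"https://{host('keycloak')}"
    return {
        "api": api,
        "api_docs": f"{api}/docs",
        "keycloak": keycloak,
        "token": f"{keycloak}/auth/realms/master/protocol/openid-connect/token",
    }


urls = wallaroo_urls("wallaroo", "example.com")
print(urls["api"])  # https://wallaroo.api.example.com/v1/api
print(urls["token"])
```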

Authenticating to the Wallaroo API

Each MLOps API operation requires a valid JSON Web Token (JWT) obtained from Wallaroo’s authentication and authorization service (i.e., Keycloak). Generally, the JWT must include a valid user’s identity, as Wallaroo access permissions are tied to specific platform users.

There are two ways to authenticate to the Wallaroo API: authenticate with the client secret, or use the SDK command Wallaroo.auth.auth_header() to retrieve an HTTP header that includes the authentication token.

The following process will retrieve a token using the client secret:

Retrieve Client Secret

Wallaroo comes pre-installed with a confidential OpenID Connect client. The default client is api-client, but other clients may be created and configured.

As a confidential client, api-client requires its secret to be supplied when requesting a token. Administrators can obtain their API client credentials from Keycloak by appending the path /auth/admin/master/console/#/realms/master/clients to the Keycloak Service URL listed above.

For example, if the Wallaroo instance DNS address is https://magical-rhino-5555.wallaroo.dev, then the direct path to the Keycloak API client credentials would be:

https://magical-rhino-5555.keycloak.wallaroo.dev/auth/admin/master/console/#/realms/master/clients

Then select the client, in this case api-client, then Credentials.


By default, tokens issued for api-client are valid for up to 60 minutes. Refresh tokens are supported.
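Because tokens expire, a client that makes many requests can reuse a token until it is close to expiry instead of requesting a new one each time. The following is a minimal sketch, assuming a zero-argument fetch function that returns a fresh access token (for example, a lambda wrapping the get_jwt_token helper from the MLOps API Essentials Guide). The TokenCache class and its 60-second safety margin are illustrative choices, not part of the Wallaroo API, and the lifetime is measured from when the token was fetched.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires.

    fetch: zero-argument callable returning a new access token.
    lifetime: token validity in seconds (api-client default: 60 minutes).
    margin: refresh this many seconds before the token would expire.
    clock: time source, injectable for testing.
    """

    def __init__(self, fetch, lifetime=3600, margin=60, clock=time.monotonic):
        self._fetch = fetch
        self._lifetime = lifetime
        self._margin = margin
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._token is None
            or now - self._fetched_at >= self._lifetime - self._margin
        )
        if stale:
            # Request a new token only when none is cached or it is about to expire
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

Usage would look like cache = TokenCache(lambda: get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)), then TOKEN = cache.get() before each request.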

Retrieve MLOps API Token

To retrieve an API token for a specific user with the client secret, submit a token request to the Wallaroo instance that provides the following:

  • Token Request URL: The Keycloak token retrieval URL.
  • OpenID Connect client name: The name of the OpenID Connect client.
  • OpenID Connect client Secret: The secret for the OpenID Connect client.
  • UserName: The username of the Wallaroo instance user, usually their email address.
  • Password: The password of the Wallaroo instance user.

For example, the following requests a token for the Wallaroo instance https://magical-rhino-5555.wallaroo.dev for user mary.jane@example.com with the OpenID Connect Client api-client:

TOKENURL='https://magical-rhino-5555.keycloak.wallaroo.dev/auth/realms/master/protocol/openid-connect/token'
CLIENT='api-client'
SECRET='abc123'
USERNAME='mary.jane@example.com'
PASSWORD='snugglebunnies'

# Authenticate the client with HTTP Basic auth, request a token with the
# user's credentials (password grant), and extract the token with jq.
TOKEN=$(curl "$TOKENURL" -u "$CLIENT:$SECRET" -d "grant_type=password&username=$USERNAME&password=$PASSWORD" -s | jq -r '.access_token')

Request MLOps Operation with Token

With the token retrieved, an MLOps request can be performed.

The following example shows how to make the request using curl to retrieve a list of workspaces:

curl 'https://magical-rhino-5555.api.wallaroo.dev/v1/api/workspaces/list' -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{}'

The same can be done through the Python requests library:

## List workspaces through the MLOps API
import requests

# TOKEN holds the access token retrieved above
apiRequest = "https://magical-rhino-5555.api.wallaroo.dev/v1/api/workspaces/list"

# set the headers
headers = {
    'Authorization': 'Bearer ' + TOKEN,
    'Content-Type': 'application/json'
}

data = {}

# submit the request via POST with a JSON body
response = requests.post(apiRequest, json=data, headers=headers)

# Display the returned result
print(response.json())

1.1 - Wallaroo MLOps API Essentials Guide

Basic Guide for the Wallaroo API

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Wallaroo MLOps API Tutorial

The Wallaroo MLOps API allows organizations to submit requests to their Wallaroo instance to perform such actions as:

  • Create a new user and invite them to the instance.
  • Create workspaces and list their configuration details.
  • Upload a model.
  • Deploy and undeploy a pipeline.

The following examples will show how to submit queries to the Wallaroo MLOps API and the types of responses returned.

References

The following references are available for more information about Wallaroo and the Wallaroo MLOps API:

  • Wallaroo Documentation Site: https://docs.wallaroo.ai/
  • Wallaroo MLOps API Documentation from a Wallaroo instance: Swagger UI-based documentation is available from your Wallaroo instance at https://{Wallaroo Prefix}.api.{Wallaroo Suffix}/v1/api/docs. For example, if the Wallaroo instance has the prefix lovely-rhino-5555 and the suffix example.wallaroo.ai, then the Wallaroo MLOps API Documentation would be available at https://lovely-rhino-5555.api.example.wallaroo.ai/v1/api/docs. As another example, for a Wallaroo Enterprise user who does not use a prefix and has the suffix wallaroo.example.wallaroo.ai, the Wallaroo MLOps API Documentation would be available at https://api.wallaroo.example.wallaroo.ai/v1/api/docs. For more information, see the Wallaroo Documentation Site.

IMPORTANT NOTE: The Wallaroo MLOps API is provided as an early access feature. Future iterations may adjust the methods and returns to provide a better user experience. Please refer to this guide for updates.

OpenAPI Steps

The following demonstrates how to use each command in the Wallaroo MLOps API, and can be modified as best fits your organization’s needs.

Import Libraries

For the examples, the Python requests library will be used to make the REST HTTP(S) connections.

# Requires requests and requests-toolbelt with either:
# pip install requests-toolbelt
# conda install -c conda-forge requests-toolbelt

import requests
import json
from requests.auth import HTTPBasicAuth

Retrieve Credentials

Through Keycloak

Wallaroo comes pre-installed with a confidential OpenID Connect client. The default client is api-client, but other clients may be created and configured.

As a confidential client, api-client requires its secret to be supplied when requesting a token. Administrators can obtain their API client credentials from Keycloak by appending the path /auth/admin/master/console/#/realms/master/clients to the Keycloak Service URL listed above.

For example, if the Wallaroo instance DNS address is https://magical-rhino-5555.example.wallaroo.ai, then the direct path to the Keycloak API client credentials would be:

https://magical-rhino-5555.keycloak.example.wallaroo.ai/auth/admin/master/console/#/realms/master/clients

Then select the client, in this case api-client, then Credentials.


By default, tokens issued for api-client are valid for up to 60 minutes. Refresh tokens are supported.

Through the Wallaroo SDK

The API token can be retrieved using the Wallaroo SDK through the wallaroo.client.mlops() command. In the following example, the token will be retrieved and stored to the variable TOKEN:

import wallaroo
wl = wallaroo.Client()  # connect to the Wallaroo instance through the SDK
connection = wl.mlops().__dict__
TOKEN = connection['token']
print(TOKEN)
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJnTHBSY1B6QkhjQ1k1RTFHTVZoTlQtelI0VDY2YUM0QWh2eXVORmpVOTBjIn0.eyJleHAiOjE2NzEwMzMzMzUsImlhdCI6MTY3MTAzMzI3NSwiYXV0aF90aW1lIjoxNjcxMDMyODgyLCJqdGkiOiJiNDk3YmM3Yy1kMTc5LTRhYWQtODdmZC0yZGJiYTBlZDI4ZDYiLCJpc3MiOiJodHRwczovL21hZ2ljYWwtYmVhci0zNzgyLmtleWNsb2FrLndhbGxhcm9vLmNvbW11bml0eS9hdXRoL3JlYWxtcy9tYXN0ZXIiLCJhdWQiOlsibWFzdGVyLXJlYWxtIiwiYWNjb3VudCJdLCJzdWIiOiJmMWYzMmJkZi05YmQ5LTQ1OTUtYTUzMS1hY2E1Nzc4Y2VhZjAiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJzZGstY2xpZW50Iiwic2Vzc2lvbl9zdGF0ZSI6IjYzYzNiZjYwLTNmNjMtNDBjNC05NmI1LWNiYTk4ZjZhOGNmNyIsImFjciI6IjEiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsiY3JlYXRlLXJlYWxtIiwiZGVmYXVsdC1yb2xlcy1tYXN0ZXIiLCJvZmZsaW5lX2FjY2VzcyIsImFkbWluIiwidW1hX2F1dGhvcml6YXRpb24iXX0sInJlc291cmNlX2FjY2VzcyI6eyJtYXN0ZXItcmVhbG0iOnsicm9sZXMiOlsidmlldy1pZGVudGl0eS1wcm92aWRlcnMiLCJ2aWV3LXJlYWxtIiwibWFuYWdlLWlkZW50aXR5LXByb3ZpZGVycyIsImltcGVyc29uYXRpb24iLCJjcmVhdGUtY2xpZW50IiwibWFuYWdlLXVzZXJzIiwicXVlcnktcmVhbG1zIiwidmlldy1hdXRob3JpemF0aW9uIiwicXVlcnktY2xpZW50cyIsInF1ZXJ5LXVzZXJzIiwibWFuYWdlLWV2ZW50cyIsIm1hbmFnZS1yZWFsbSIsInZpZXctZXZlbnRzIiwidmlldy11c2VycyIsInZpZXctY2xpZW50cyIsIm1hbmFnZS1hdXRob3JpemF0aW9uIiwibWFuYWdlLWNsaWVudHMiLCJxdWVyeS1ncm91cHMiXX0sImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsInNpZCI6IjYzYzNiZjYwLTNmNjMtNDBjNC05NmI1LWNiYTk4ZjZhOGNmNyIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJodHRwczovL2hhc3VyYS5pby9qd3QvY2xhaW1zIjp7IngtaGFzdXJhLXVzZXItaWQiOiJmMWYzMmJkZi05YmQ5LTQ1OTUtYTUzMS1hY2E1Nzc4Y2VhZjAiLCJ4LWhhc3VyYS1kZWZhdWx0LXJvbGUiOiJ1c2VyIiwieC1oYXN1cmEtYWxsb3dlZC1yb2xlcyI6WyJ1c2VyIl0sIngtaGFzdXJhLXVzZXItZ3JvdXBzIjoie30ifSwicHJlZmVycmVkX3VzZXJuYW1lIjoiam9obi5oYW5zYXJpY2tAd2FsbGFyb28uYWkiLCJlbWFpbCI6ImpvaG4uaGFuc2FyaWNrQHdhbGxhcm9vLmFpIn0.EEt9UK1jxvO1DYg_hiy1ne4s9iK8mJtqbVfE7MPQfMRYhzXqDU4gFpP3Nwzlo0iW9fSLDiCxPg303Rz-l4it3oPFu5SaS1S8pQpqvtMAJqy8V_CNPp5H5ggQFYm4Z50aAPdPzOOOkVQOZUhupRsEeUERvK1-eFqtG1bb-IUV6DpQO_XaRVcQbIVubFi48C0_im5Tb3i4WFCNA_1pRrEBKFbZLWgzSCu8fglBQ27mODqfmRQVbTeXLjxsQX5O8meErSfibEGmsJKQytGCJ3NYdnXfal3YhWEqp6A4dG0tkoRW1eD-aKBpsHf9nKKzxcSsjeXDQF6iQAONCGmC40oqHQ'
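The returned token is a JWT, so its claims (for example exp, azp, and preferred_username) can be inspected locally by base64url-decoding the middle segment. This sketch uses only the standard library and does not verify the signature, so it is suitable for debugging, not for trust decisions. The function name decode_jwt_claims is hypothetical.

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the payload (claims) of a JWT as a dict.

    The signature is NOT verified; use this only to inspect a token.
    """
    payload = token.split(".")[1]
    # JWT segments use unpadded base64url; restore the padding before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For example, decode_jwt_claims(TOKEN)["exp"] returns the expiration time as a Unix timestamp.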

Set Variables

The following variables are used for the example and should be modified to fit your organization.

## Variables

URLPREFIX='YOURPREFIX'
URLSUFFIX='YOURSUFFIX'
SECRET="YOUR SECRET"
TOKENURL=f'https://{URLPREFIX}.keycloak.{URLSUFFIX}/auth/realms/master/protocol/openid-connect/token'
CLIENT="api-client"
USERNAME="YOUR EMAIL"
PASSWORD="YOUR PASSWORD"
APIURL=f"https://{URLPREFIX}.api.{URLSUFFIX}/v1/api"
newUser="NEW USER EMAIL"
newPassword="NEW USER PASSWORD"

The following displays the TOKENURL variable to verify that it matches your Wallaroo instance's Keycloak token request URL.

TOKENURL

API Example Methods

The following methods are used to retrieve the MLOps API token from the Wallaroo instance's Keycloak service, and to submit MLOps API requests through the Wallaroo instance's MLOps API.

MLOps API requests are always POST, and are either submitted as 'Content-Type':'application/json' or as a multipart submission including a file.

def get_jwt_token(url, client, secret, username, password):
    auth = HTTPBasicAuth(client, secret)
    data = {
        'grant_type': 'password',
        'username': username,
        'password': password
    }
    response = requests.post(url, auth=auth, data=data, verify=True)
    return response.json()['access_token']

# This can either submit a plain POST request ('Content-Type':'application/json'), or with a file.

def get_wallaroo_response(url, api_request, token, data, files=None, contentType='application/json', params=None):
    apiUrl=f"{url}{api_request}"
    if files is None:
        # Regular POST request
        headers= {
            'Authorization': 'Bearer ' + token,
            'Content-Type':contentType
        }
        response = requests.post(apiUrl, json=data, headers=headers, verify=True)
    elif contentType == 'application/octet-stream':
        # Post request as octet-stream
        headers= {
            'Authorization': 'Bearer ' + token,
            'Content-Type':contentType
        }
        response = requests.post(apiUrl, data=files, headers=headers, params=params)
        #response = requests.post(apiUrl, data=data, headers=headers, files=files, verify=True)
    else:
        # POST request with file
        headers= {
            'Authorization': 'Bearer ' + token
        }
        response = requests.post(apiUrl, data=data, headers=headers, files=files, verify=True)
    return response.json()
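Where the requests library is not available, the plain JSON branch of get_wallaroo_response can be approximated with the standard library's urllib.request. This sketch covers only the JSON POST (not the multipart or octet-stream uploads), and build_mlops_request and send_mlops_request are hypothetical names. Building the request separately from sending it allows the URL and headers to be inspected first.

```python
import json
import urllib.request


def build_mlops_request(url, api_request, token, data):
    """Build (but do not send) a JSON POST request to the MLOps API."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        f"{url}{api_request}",
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_mlops_request(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the variables above, send_mlops_request(build_mlops_request(APIURL, "/workspaces/list", TOKEN, {})) would submit the same request as the List Workspaces example.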

Retrieve MLOps API Token

To retrieve an API token for a specific user with the client secret, submit a token request to the Wallaroo instance that provides the following:

  • Token Request URL: The Keycloak token retrieval URL.
  • OpenID Connect client name: The name of the OpenID Connect client.
  • OpenID Connect client Secret: The secret for the OpenID Connect client.
  • UserName: The username of the Wallaroo instance user, usually their email address.
  • Password: The password of the Wallaroo instance user.

The following sample uses the variables set above to request the token, then displays it.

TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
TOKEN
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJKaDZyX3BKUGhheVhSSlc0U1F3ckc0QUEwUmkyMHNBMTNxYmNhTVJ1d1hrIn0.eyJleHAiOjE2Nzc3ODk4MzEsImlhdCI6MTY3Nzc4NjIzMSwianRpIjoiMjg1MTU1NmItZjhkNC00OWZkLWJjMjEtOGFlNDI2OTJiM2FiIiwiaXNzIjoiaHR0cHM6Ly9kb2MtdGVzdC5rZXljbG9hay53YWxsYXJvb2NvbW11bml0eS5uaW5qYS9hdXRoL3JlYWxtcy9tYXN0ZXIiLCJhdWQiOlsibWFzdGVyLXJlYWxtIiwiYWNjb3VudCJdLCJzdWIiOiJjYTdkNzA0My04ZTk0LTQyZDUtOWYzYS04ZjU1YzJlNDI4MTQiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJhcGktY2xpZW50Iiwic2Vzc2lvbl9zdGF0ZSI6IjFiNTEyOTZiLTMwNjAtNGUwYy1hZDMwLTNhYjczYmNiMDYzNyIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsiKiJdLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsiZGVmYXVsdC1yb2xlcy1tYXN0ZXIiLCJvZmZsaW5lX2FjY2VzcyIsInVtYV9hdXRob3JpemF0aW9uIl19LCJyZXNvdXJjZV9hY2Nlc3MiOnsibWFzdGVyLXJlYWxtIjp7InJvbGVzIjpbIm1hbmFnZS11c2VycyIsInZpZXctdXNlcnMiLCJxdWVyeS1ncm91cHMiLCJxdWVyeS11c2VycyJdfSwiYWNjb3VudCI6eyJyb2xlcyI6WyJtYW5hZ2UtYWNjb3VudCIsIm1hbmFnZS1hY2NvdW50LWxpbmtzIiwidmlldy1wcm9maWxlIl19fSwic2NvcGUiOiJwcm9maWxlIGVtYWlsIiwic2lkIjoiMWI1MTI5NmItMzA2MC00ZTBjLWFkMzAtM2FiNzNiY2IwNjM3IiwiZW1haWxfdmVyaWZpZWQiOmZhbHNlLCJodHRwczovL2hhc3VyYS5pby9qd3QvY2xhaW1zIjp7IngtaGFzdXJhLXVzZXItaWQiOiJjYTdkNzA0My04ZTk0LTQyZDUtOWYzYS04ZjU1YzJlNDI4MTQiLCJ4LWhhc3VyYS1kZWZhdWx0LXJvbGUiOiJ1c2VyIiwieC1oYXN1cmEtYWxsb3dlZC1yb2xlcyI6WyJ1c2VyIl0sIngtaGFzdXJhLXVzZXItZ3JvdXBzIjoie30ifSwibmFtZSI6IkpvaG4gSGFuc2FyaWNrIiwicHJlZmVycmVkX3VzZXJuYW1lIjoiam9obi5odW1tZWxAd2FsbGFyb28uYWkiLCJnaXZlbl9uYW1lIjoiSm9obiIsImZhbWlseV9uYW1lIjoiSGFuc2FyaWNrIiwiZW1haWwiOiJqb2huLmh1bW1lbEB3YWxsYXJvby5haSJ9.Qxhsu1lbhWpVZyUjKLqsr47j-ybjVB28jEXPcyb8m4NlzYDSfWHH2Wc7i1RMLV4IUe4td8ujPQJjkan2zatoHhSNqWYwEziwgFwIcP-uYqDcBhIIkNIu3Shw8f9FxAt3UtEc0twTXNED4ak2cfTs9nNwF2v_ZRcKMsrWObAfm2Iuly2tKuu6TlK_3Nbi6DTip4rXTO5AavIhjqKZn7ofuJ-NhOBh9s9gZPIZpWQ-klk-zeM7mzzulD8THBTCITvEpmMSJf9qI24-QXQWhpRFEpmUh8gy6GkQs1lEcjvt8NzLP5mf9L7fmgQZCgvETLwuA9dmp7BPYS_G3pamDGqDoA'

Users

Get Users

Users can be retrieved by their Keycloak user IDs, or all users can be returned by submitting an empty set {}.

  • Parameters
    • {}: Empty set, returns all users.
    • user_ids Array[Keycloak user ids]: An array of Keycloak user ids, typically in UUID format.

Example: The first example will submit an empty set {} to return all users, then submit the user ID of one of the returned users (in this example, the second entry, since the first is the admin account) and request only that user's details.

# Get all users

TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)

apiRequest = "/users/query"
data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': {'9727c4a8-d6fc-4aad-894f-9ee69801d2dd': {'access': {'manageGroupMembership': True,
    'impersonate': False,
    'view': True,
    'mapRoles': True,
    'manage': True},
   'createdTimestamp': 1677704075554,
   'disableableCredentialTypes': [],
   'emailVerified': False,
   'enabled': True,
   'id': '9727c4a8-d6fc-4aad-894f-9ee69801d2dd',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'admin'},
  'ca7d7043-8e94-42d5-9f3a-8f55c2e42814': {'access': {'impersonate': False,
    'manage': True,
    'mapRoles': True,
    'manageGroupMembership': True,
    'view': True},
   'createdTimestamp': 1677704179667,
   'disableableCredentialTypes': [],
   'email': 'john.hummel@wallaroo.ai',
   'emailVerified': False,
   'enabled': True,
   'firstName': 'John',
   'id': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'lastName': 'Hansarick',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'john.hummel@wallaroo.ai'}}}
# Get the Keycloak id of the second user listed (index 1); in this example the first entry is the admin account
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
firstUserKeycloak = list(response['users'])[1]

apiRequest = "/users/query"
data = {
  "user_ids": [
    firstUserKeycloak
  ]
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': {'ca7d7043-8e94-42d5-9f3a-8f55c2e42814': {'access': {'impersonate': False,
    'view': True,
    'mapRoles': True,
    'manage': True,
    'manageGroupMembership': True},
   'createdTimestamp': 1677704179667,
   'disableableCredentialTypes': [],
   'email': 'john.hummel@wallaroo.ai',
   'emailVerified': False,
   'enabled': True,
   'federatedIdentities': [{'identityProvider': 'google',
     'userId': '117610299312093432527',
     'userName': 'john.hummel@wallaroo.ai'}],
   'firstName': 'John',
   'id': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'lastName': 'Hansarick',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'john.hummel@wallaroo.ai'}}}
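Selecting a user by list position depends on the order of the response, which in this example includes the admin account first. As a sketch, a user's Keycloak ID can instead be looked up by username from the all-users /users/query response shown above; find_user_id is a hypothetical helper.

```python
def find_user_id(users_response, username):
    """Return the Keycloak ID of the user with the given username, or None.

    users_response: the JSON returned by /users/query, shaped like
    {'users': {<keycloak id>: {'username': ..., ...}, ...}}.
    """
    for user_id, details in users_response["users"].items():
        if details.get("username") == username:
            return user_id
    return None
```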

Invite Users

IMPORTANT NOTE: This command is for YOUR SUFFIX only. For more details on user management, see Wallaroo User Management.

Users can be invited through /users/invite. When using YOUR SUFFIX, this will send an invitation email to the email address listed. Note that the user must not already be a member of the Wallaroo instance, and email addresses must be unique. If the email address is already in use for another user, the request will generate an error.

  • Parameters
    • email (REQUIRED string): The email address of the new user to invite.
    • password (OPTIONAL string): The assigned password of the new user to invite. If not provided, the Wallaroo instance will provide the new user a temporary password that must be changed upon initial login.
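Because password is optional, the request body can leave it out when the instance should generate a temporary password. A minimal sketch of building that body follows; invite_user_payload is a hypothetical name, and omitting the key (rather than sending a null value) is an assumption about how the endpoint treats a missing password.

```python
def invite_user_payload(email, password=None):
    """Build the /users/invite request body.

    The password key is only included when a password is supplied.
    """
    if not email:
        raise ValueError("email is required")
    payload = {"email": email}
    if password is not None:
        payload["password"] = password
    return payload
```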

Example: In this example, a new user will be invited to the Wallaroo instance and assigned a password.

# invite users
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/users/invite"
data = {
    "email": newUser,
    "password":newPassword
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Deactivate User

Users can be deactivated so they cannot log in to their Wallaroo instance. Deactivated users do not count against the Wallaroo license count.

  • Parameters
    • email (REQUIRED string): The email address of the user to deactivate.

Example: In this example, the newUser will be deactivated.

# Deactivate users
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/users/deactivate"

data = {
    "email": newUser
}
response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Activate User

A deactivated user can be reactivated to allow them access to their Wallaroo instance. Activated users count against the Wallaroo license count.

  • Parameters
    • email (REQUIRED string): The email address of the user to activate.

Example: In this example, the newUser will be activated.

# Activate users
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/users/activate"

data = {
    "email": newUser
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
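The activate and deactivate requests differ only in the endpoint, so a script that manages license seats might choose the endpoint from a flag. A sketch, with the hypothetical helper user_status_request returning the endpoint and body to pass to get_wallaroo_response:

```python
def user_status_request(email, active):
    """Return (api_request, data) for activating or deactivating a user."""
    api_request = "/users/activate" if active else "/users/deactivate"
    return api_request, {"email": email}
```

For example, apiRequest, data = user_status_request(newUser, active=False) followed by get_wallaroo_response(APIURL, apiRequest, TOKEN, data) deactivates newUser.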

Workspaces

List Workspaces

List the workspaces for a specific user.

  • Parameters
    • user_id - (OPTIONAL string): The Keycloak ID.

Example: In this example, the workspaces for a specific user will be displayed, then workspaces for all users will be displayed.

# List workspaces by user id
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/list"

data = {
    "user_id":firstUserKeycloak
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hummel@wallaroo.ai - Default Workspace',
   'created_at': '2023-03-01T20:56:22.658436+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 4,
   'name': 'anomalyexampletest3',
   'created_at': '2023-03-01T20:56:32.632146+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [1],
   'pipelines': [1]},
  {'id': 5,
   'name': 'ccfraudcomparisondemo',
   'created_at': '2023-03-01T21:02:40.955593+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [2, 3, 4],
   'pipelines': [3]},
  {'id': 6,
   'name': 'rlhxccfraudworkspace',
   'created_at': '2023-03-01T21:30:28.848609+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [5],
   'pipelines': [5]},
  {'id': 7,
   'name': 'mlflowstatsmodelworkspace',
   'created_at': '2023-03-02T18:06:42.074341+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [6, 7],
   'pipelines': [8]},
  {'id': 8,
   'name': 'mobilenetworkspace',
   'created_at': '2023-03-02T18:24:27.304478+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [8, 9],
   'pipelines': [10]},
  {'id': 9,
   'name': 'mobilenetworkspacetest',
   'created_at': '2023-03-02T19:21:36.309503+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [10, 12],
   'pipelines': [13]},
  {'id': 10,
   'name': 'resnetworkspace',
   'created_at': '2023-03-02T19:22:28.371499+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [11],
   'pipelines': [14]},
  {'id': 11,
   'name': 'resnetworkspacetest',
   'created_at': '2023-03-02T19:35:30.236438+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [13],
   'pipelines': [18]},
  {'id': 12,
   'name': 'shadowimageworkspacetest',
   'created_at': '2023-03-02T19:37:23.348346+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [14, 15],
   'pipelines': [20]}]}
# List workspaces
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hummel@wallaroo.ai - Default Workspace',
   'created_at': '2023-03-01T20:56:22.658436+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 4,
   'name': 'anomalyexampletest3',
   'created_at': '2023-03-01T20:56:32.632146+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [1],
   'pipelines': [1]},
  {'id': 5,
   'name': 'ccfraudcomparisondemo',
   'created_at': '2023-03-01T21:02:40.955593+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [2, 3, 4],
   'pipelines': [3]},
  {'id': 6,
   'name': 'rlhxccfraudworkspace',
   'created_at': '2023-03-01T21:30:28.848609+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [5],
   'pipelines': [5]},
  {'id': 7,
   'name': 'mlflowstatsmodelworkspace',
   'created_at': '2023-03-02T18:06:42.074341+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [6, 7],
   'pipelines': [8]},
  {'id': 8,
   'name': 'mobilenetworkspace',
   'created_at': '2023-03-02T18:24:27.304478+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [8, 9],
   'pipelines': [10]},
  {'id': 9,
   'name': 'mobilenetworkspacetest',
   'created_at': '2023-03-02T19:21:36.309503+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [10, 12],
   'pipelines': [13]},
  {'id': 10,
   'name': 'resnetworkspace',
   'created_at': '2023-03-02T19:22:28.371499+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [11],
   'pipelines': [14]},
  {'id': 11,
   'name': 'resnetworkspacetest',
   'created_at': '2023-03-02T19:35:30.236438+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [13],
   'pipelines': [18]},
  {'id': 12,
   'name': 'shadowimageworkspacetest',
   'created_at': '2023-03-02T19:37:23.348346+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [14, 15],
   'pipelines': [20]}]}
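The /workspaces/list response is a list of workspace records, so a workspace ID can be found by name with a simple search. A sketch using the response shape shown above; find_workspace_id is hypothetical and returns the first matching workspace that is not archived.

```python
def find_workspace_id(workspaces_response, name):
    """Return the id of the first non-archived workspace named `name`, or None."""
    for workspace in workspaces_response["workspaces"]:
        if workspace["name"] == name and not workspace.get("archived", False):
            return workspace["id"]
    return None
```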

Create Workspace

A new workspace will be created in the Wallaroo instance. Upon creation, the user making the MLOps API request is assigned as the workspace owner.

  • Parameters:
    • workspace_name - (REQUIRED string): The name of the new workspace.
  • Returns:
    • workspace_id - (int): The ID of the new workspace.

Example: In this example, a workspace with the name testapiworkspace will be created, and the newly created workspace’s workspace_id saved as the variable exampleWorkspaceId for use in other code examples. After the request is complete, the List Workspaces command will be issued to demonstrate the new workspace has been created.

# Create workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/create"

exampleWorkspaceName = "testapiworkspace"
data = {
  "workspace_name": exampleWorkspaceName
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
# Stored for future examples
exampleWorkspaceId = response['workspace_id']
response
{'workspace_id': 13}
# List workspaces
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hummel@wallaroo.ai - Default Workspace',
   'created_at': '2023-03-01T20:56:22.658436+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 4,
   'name': 'anomalyexampletest3',
   'created_at': '2023-03-01T20:56:32.632146+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [1],
   'pipelines': [1]},
  {'id': 5,
   'name': 'ccfraudcomparisondemo',
   'created_at': '2023-03-01T21:02:40.955593+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [2, 3, 4],
   'pipelines': [3]},
  {'id': 6,
   'name': 'rlhxccfraudworkspace',
   'created_at': '2023-03-01T21:30:28.848609+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [5],
   'pipelines': [5]},
  {'id': 7,
   'name': 'mlflowstatsmodelworkspace',
   'created_at': '2023-03-02T18:06:42.074341+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [6, 7],
   'pipelines': [8]},
  {'id': 8,
   'name': 'mobilenetworkspace',
   'created_at': '2023-03-02T18:24:27.304478+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [8, 9],
   'pipelines': [10]},
  {'id': 9,
   'name': 'mobilenetworkspacetest',
   'created_at': '2023-03-02T19:21:36.309503+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [10, 12],
   'pipelines': [13]},
  {'id': 10,
   'name': 'resnetworkspace',
   'created_at': '2023-03-02T19:22:28.371499+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [11],
   'pipelines': [14]},
  {'id': 11,
   'name': 'resnetworkspacetest',
   'created_at': '2023-03-02T19:35:30.236438+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [13],
   'pipelines': [18]},
  {'id': 12,
   'name': 'shadowimageworkspacetest',
   'created_at': '2023-03-02T19:37:23.348346+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [14, 15],
   'pipelines': [20]},
  {'id': 13,
   'name': 'testapiworkspace',
   'created_at': '2023-03-02T19:44:20.279346+00:00',
   'created_by': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'archived': False,
   'models': [],
   'pipelines': []}]}
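Running the Create Workspace example twice would attempt to create a second workspace with the same name. One way to make a script re-runnable is to check the workspace list first and create the workspace only if it is missing. The sketch below takes the request function as a parameter (for example, a wrapper around get_wallaroo_response) so it can be exercised without a live instance; get_or_create_workspace is a hypothetical name.

```python
def get_or_create_workspace(post, name):
    """Return the id of workspace `name`, creating it if it does not exist.

    post: callable(api_request, data) -> parsed JSON response.
    """
    listing = post("/workspaces/list", {})
    for workspace in listing["workspaces"]:
        if workspace["name"] == name:
            return workspace["id"]
    created = post("/workspaces/create", {"workspace_name": name})
    return created["workspace_id"]
```

Against a live instance, this could be called as get_or_create_workspace(lambda req, data: get_wallaroo_response(APIURL, req, TOKEN, data), exampleWorkspaceName).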

Add User to Workspace

Existing users of the Wallaroo instance can be added to an existing workspace.

  • Parameters
    • email - (REQUIRED string): The email address of the user to add to the workspace.
    • workspace_id - (REQUIRED int): The id of the workspace.

Example: The following example adds the user created in the Invite Users request to the workspace created in the Create Workspace request.

# Add existing user to existing workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/add_user"

data = {
  "email":newUser,
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

List Users in a Workspace

Lists the users who are either owners or collaborators of a workspace.

  • Parameters
    • workspace_id - (REQUIRED int): The id of the workspace.
  • Returns
    • user_id: The user’s identification.
    • user_type: The user’s workspace type (owner, co-owner, etc).

Example: The following example lists all users that are part of the workspace created in the Create Workspace request.

# List users in a workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/list_users"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': [{'user_id': 'ca7d7043-8e94-42d5-9f3a-8f55c2e42814',
   'user_type': 'OWNER'}]}

Remove User from a Workspace

Removes a user from the given workspace. Either the user’s Keycloak ID or the user’s email address is required.

  • Parameters
    • workspace_id - (REQUIRED int): The id of the workspace.
    • user_id - (string): The Keycloak ID of the user. If email is not provided, then this parameter is REQUIRED.
    • email - (string): The user’s email address. If user_id is not provided, then this parameter is REQUIRED.
  • Returns
    • user_id: The user’s identification.
    • user_type: The user’s workspace type (owner, co-owner, etc).
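Because this request needs either user_id or email, a client can check the body before sending it. The following is a minimal sketch only; build_remove_user_payload is a hypothetical helper, not part of the Wallaroo API or SDK.

```python
from typing import Optional


def build_remove_user_payload(workspace_id: int,
                              user_id: Optional[str] = None,
                              email: Optional[str] = None) -> dict:
    """Build the body for /workspaces/remove_user (hypothetical helper).

    Either the user's Keycloak ID (user_id) or the user's email address
    must be supplied; the workspace id is always required.
    """
    if user_id is None and email is None:
        raise ValueError("Either user_id or email is required")
    payload = {"workspace_id": workspace_id}
    # Only include the identifiers that were actually provided
    if user_id is not None:
        payload["user_id"] = user_id
    if email is not None:
        payload["email"] = email
    return payload


print(build_remove_user_payload(13, email="newuser@example.com"))
```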

Example: The following example removes newUser from the workspace created in the Create Workspace request, then lists the workspace’s users to verify that newUser has been removed.

# Remove existing user from an existing workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/remove_user"

data = {
  "email":newUser,
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# List users in a workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/workspaces/list_users"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': [{'user_id': '5abe33ef-90d2-49bd-8f6a-21ef20c383e8',
   'user_type': 'OWNER'}]}

Models

Upload Model to Workspace

Uploads an ML model to a Wallaroo workspace via POST with Content-Type: multipart/form-data.

  • Parameters
    • name - (REQUIRED string): Name of the model
    • visibility - (OPTIONAL string): The visibility of the model as either public or private.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to upload the model to.

Example: This example will upload the sample file ccfraud.onnx to the workspace created in the Create Workspace step as apitestmodel. The model name will be saved as exampleModelName for use in other examples. The id of the uploaded model will be saved as exampleModelId for use in later examples.

# Upload model - sends the file as multipart/form-data through a Python `requests` call
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/upload"

exampleModelName = "apitestmodel"

data = {
    "name":exampleModelName,
    "visibility":"public",
    "workspace_id": exampleWorkspaceId
}

files = {
    "file": ('ccfraud.onnx', open('./models/ccfraud.onnx','rb'))
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data, files)
response
{'insert_models': {'returning': [{'models': [{'id': 16}]}]}}
exampleModelId=response['insert_models']['returning'][0]['models'][0]['id']
exampleModelId
16

Stream Upload Model to Workspace

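Streams a potentially large ML model to a Wallaroo workspace via POST with Content-Type: application/octet-stream. The model file is sent as the request body, and the parameters below are submitted as request parameters.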

  • Parameters
    • name - (REQUIRED string): Name of the model
    • filename - (REQUIRED string): Name of the file being uploaded.
    • visibility - (OPTIONAL string): The visibility of the model as either public or private.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to upload the model to.

Example: This example streams the sample file ccfraud.onnx to the workspace created in the Create Workspace step as apitestmodelstream, using the file name streamfile.onnx.

# Stream upload model - the model file is sent as the raw request body,
# and the model metadata is passed through the params argument
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/upload_stream"
exampleModelName = "apitestmodelstream"
filename = 'streamfile.onnx'

data = {
    "name":exampleModelName,
    "filename": filename,
    "visibility":"public",
    "workspace_id": exampleWorkspaceId
}

contentType='application/octet-stream'

# Open the model in binary mode; the file is closed once the upload returns
with open('./models/ccfraud.onnx','rb') as file:
    response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data=None, files=file, contentType=contentType, params=data)
response
{'insert_models': {'returning': [{'models': [{'id': 17}]}]}}
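The example above hands the open file object to the request in a single call. For very large models, one option is to read the file in fixed-size chunks with a generator, since the `requests` library sends a generator body using chunked transfer encoding. This is a sketch only: whether /models/upload_stream accepts chunked transfer encoding is an assumption not confirmed here, and read_in_chunks is a hypothetical helper.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(fileobj: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        yield chunk


# Demonstration with an in-memory file instead of ./models/ccfraud.onnx
sample = io.BytesIO(b"0123456789")
print([len(c) for c in read_in_chunks(sample, chunk_size=4)])  # prints [4, 4, 2]
```

Under the same assumption, the generator could be passed as the request body, for example requests.post(url, data=read_in_chunks(f), headers=headers, params=data).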

List Models in Workspace

Returns a list of models added to a specific workspace.

  • Parameters
    • workspace_id - (REQUIRED int): The workspace id to list.

Example: Display the models for the workspace used in the Upload Model to Workspace step. The id and name of the first model in the returned list will be saved as exampleModelId and exampleModelName for other examples.

# List models in a workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/list"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'models': [{'id': 17,
   'name': 'apitestmodelstream',
   'owner_id': '""',
   'created_at': '2023-03-02T19:44:56.549555+00:00',
   'updated_at': '2023-03-02T19:44:56.549555+00:00'},
  {'id': 16,
   'name': 'apitestmodel',
   'owner_id': '""',
   'created_at': '2023-03-02T19:44:53.173913+00:00',
   'updated_at': '2023-03-02T19:44:53.173913+00:00'}]}
exampleModelId = response['models'][0]['id']
exampleModelName = response['models'][0]['name']
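The example above takes the first entry in the list; in the output shown, that is the most recently uploaded model, apitestmodelstream. To select a specific model by name instead, the response can be searched. A minimal sketch, where find_model_id is a hypothetical helper and the sample response is trimmed from the output above:

```python
from typing import Optional


def find_model_id(list_response: dict, name: str) -> Optional[int]:
    """Return the id of the first model named `name` in a /models/list response, or None."""
    for model in list_response.get("models", []):
        if model.get("name") == name:
            return model.get("id")
    return None


# Trimmed copy of the /models/list response shown above
sample_response = {
    "models": [
        {"id": 17, "name": "apitestmodelstream"},
        {"id": 16, "name": "apitestmodel"},
    ]
}
print(find_model_id(sample_response, "apitestmodel"))  # prints 16
```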

Get Model Details By ID

Returns the model details by the specific model id.

  • Parameters
    • id - (REQUIRED int): The numerical id of the model.
  • Returns
    • id - (int): Numerical id of the model.
    • owner_id - (string): Id of the owner of the model.
    • workspace_id - (int): Numerical id of the workspace the model is in.
    • name - (string): Name of the model.
    • updated_at - (DateTime): Date and time of the model’s last update.
    • created_at - (DateTime): Date and time of the model’s creation.
    • model_config - (string): Details of the model’s configuration.

Example: Retrieve the details for the model uploaded in the Upload Model to Workspace step.

# Get model details by id
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/get_by_id"

data = {
  "id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 17,
 'owner_id': '""',
 'workspace_id': 13,
 'name': 'apitestmodelstream',
 'updated_at': '2023-03-02T19:44:56.549555+00:00',
 'created_at': '2023-03-02T19:44:56.549555+00:00',
 'model_config': None}

Get Model Versions

Retrieves all versions of a model based on either the name of the model or the model_pk_id.

  • Parameters
    • model_id - (REQUIRED String): The model name.
    • models_pk_id - (REQUIRED int): The model integer pk id.
  • Returns
    • Array(Model Details)
      • sha - (String): The sha hash of the model version.
      • models_pk_id - (int): The pk id of the model.
      • model_version - (String): The UUID identifier of the model version.
      • owner_id - (String): The Keycloak user id of the model’s owner.
      • model_id - (String): The name of the model.
      • id - (int): The integer id of the model.
      • file_name - (String): The filename used when uploading the model.
      • image_path - (String): The image path of the model.

Example: Retrieve the versions for a previously uploaded model. The variables exampleModelVersion and exampleModelSha will store the model’s version and SHA values for use in other examples.

# List the versions of a model
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/list_versions"

data = {
  "model_id": exampleModelName,
  "models_pk_id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
[{'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
  'models_pk_id': 17,
  'model_version': '0396de99-0a55-4880-9f53-8fdcd1b3357a',
  'owner_id': '""',
  'model_id': 'apitestmodelstream',
  'id': 17,
  'file_name': 'streamfile.onnx',
  'image_path': None}]
# Stored for future examples

exampleModelVersion = response[0]['model_version']
exampleModelSha = response[0]['sha']

Get Model Configuration by Id

Returns the model’s configuration details.

  • Parameters
    • model_id - (REQUIRED int): The numerical value of the model’s id.

Example: Submit the model id for the model uploaded in the Upload Model to Workspace step to retrieve configuration details.

# Get model config by id
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/get_config_by_id"

data = {
  "model_id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'model_config': None}

Get Model Details

Returns details regarding a single model, including its versions.

  • Parameters
    • id - (REQUIRED int): The numerical value of the model’s id.

Example: Submit the model id for the model uploaded in the Upload Model to Workspace step to retrieve the model details.

# Get model details by id
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/models/get"

data = {
  "id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 17,
 'name': 'apitestmodelstream',
 'owner_id': '""',
 'created_at': '2023-03-02T19:44:56.549555+00:00',
 'updated_at': '2023-03-02T19:44:56.549555+00:00',
 'models': [{'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
   'models_pk_id': 17,
   'model_version': '0396de99-0a55-4880-9f53-8fdcd1b3357a',
   'owner_id': '""',
   'model_id': 'apitestmodelstream',
   'id': 17,
   'file_name': 'streamfile.onnx',
   'image_path': None}]}

Pipeline Management

Pipelines can be managed through the Wallaroo API. Pipelines are the vehicle used for deploying, serving, and monitoring ML models. For more information, see the Wallaroo Glossary.

Create Pipeline in a Workspace

Creates a new pipeline in the specified workspace.

  • Parameters
    • pipeline_id - (REQUIRED string): Name of the new pipeline.
    • workspace_id - (REQUIRED int): Numerical id of the workspace for the new pipeline.
    • definition - (REQUIRED string): The pipeline definition; use {} for none.

Example: Two pipelines are created in the workspace created in the step Create Workspace, both with no configuration details. One will remain an empty pipeline without any models; the other will be used with the model uploaded in the Upload Model to Workspace step, which is supplied when the pipeline is deployed. The pipeline id, variant id, and variant version of each pipeline will be stored for later examples.

# Create pipeline in a workspace
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/create"

exampleEmptyPipelineName="emptypipeline"

data = {
  "pipeline_id": exampleEmptyPipelineName,
  "workspace_id": exampleWorkspaceId,
  "definition": {}
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleEmptyPipelineId = response['pipeline_pk_id']
exampleEmptyPipelineVariantId=response['pipeline_variant_pk_id']
emptyExamplePipelineVariantVersion=response['pipeline_variant_version']
response
{'pipeline_pk_id': 22,
 'pipeline_variant_pk_id': 22,
 'pipeline_variant_version': 'e4c3a3dc-97ee-4020-88ce-3bf059b772ef'}
# Create pipeline in a workspace with models
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/create"

exampleModelPipelineName="pipelinewithmodel"

data = {
  "pipeline_id": exampleModelPipelineName,
  "workspace_id": exampleWorkspaceId,
  "definition": {}
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleModelPipelineId = response['pipeline_pk_id']
exampleModelPipelineVariantId=response['pipeline_variant_pk_id']
emptyModelPipelineVariantVersion=response['pipeline_variant_version']
response
{'pipeline_pk_id': 25,
 'pipeline_variant_pk_id': 25,
 'pipeline_variant_version': '8eaa146e-1bfb-4786-9969-4264877db7d2'}

Deploy a Pipeline

Deploys an existing pipeline. Any model steps in the pipeline must be supplied through model_configs, model_ids, or models.

  • Parameters
    • deploy_id (REQUIRED string): The name for the pipeline deployment.
    • engine_config (OPTIONAL string): Additional configuration options for the pipeline.
    • pipeline_version_pk_id (REQUIRED int): Pipeline version id.
    • model_configs (OPTIONAL Array int): Ids of model configs to apply.
    • model_ids (OPTIONAL Array int): Ids of models to apply to the pipeline. If passed in, model_configs will be created automatically.
    • models (OPTIONAL Array models): If the model ids are not available as a pipeline step, the models’ data can be passed to it through this method. The options below are only required if models are provided as a parameter.
      • name (REQUIRED string): Name of the uploaded model that is in the same workspace as the pipeline.
      • version (REQUIRED string): Version of the model to use.
      • sha (REQUIRED string): SHA value of the model.
    • pipeline_id (REQUIRED int): Numerical value of the pipeline to deploy.
  • Returns
    • id (int): The deployment id.
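When models is used, each entry needs the model’s name, version, and SHA, which correspond to the model_id, model_version, and sha fields of a record returned by the Get Model Versions request. A minimal sketch of that mapping; to_deploy_model_entry is a hypothetical helper, and the sample record is trimmed from the Get Model Versions output.

```python
def to_deploy_model_entry(version_record: dict) -> dict:
    """Map a /models/list_versions record to a `models` entry for /pipelines/deploy."""
    return {
        "name": version_record["model_id"],          # model name
        "version": version_record["model_version"],  # model version UUID
        "sha": version_record["sha"],                # model SHA hash
    }


version_record = {
    "sha": "bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507",
    "models_pk_id": 17,
    "model_version": "0396de99-0a55-4880-9f53-8fdcd1b3357a",
    "model_id": "apitestmodelstream",
}
print(to_deploy_model_entry(version_record))
```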

Examples: Both the empty pipeline and pipeline with model created in the step Create Pipeline in a Workspace will be deployed and their deployment information saved for later examples.

# Deploy empty pipeline
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/deploy"

exampleEmptyDeployId = "emptydeploy"

data = {
    "deploy_id": exampleEmptyDeployId,
    "pipeline_version_pk_id": exampleEmptyPipelineVariantId,
    "pipeline_id": exampleEmptyPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleEmptyDeploymentId=response['id']
response
{'id': 14}
# Deploy a pipeline with models
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/deploy"
exampleModelDeployId="modeldeploy"

data = {
    "deploy_id": exampleModelDeployId,
    "pipeline_version_pk_id": exampleModelPipelineVariantId,
    "models": [
        {
            "name":exampleModelName,
            "version":exampleModelVersion,
            "sha":exampleModelSha
        }
    ],
    "pipeline_id": exampleModelPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleModelDeploymentId=response['id']
response
{'id': 17}

Get Deployment Status

Returns the deployment status.

  • Parameters
    • name - (REQUIRED string): The deployment in the format {deployment_name}-{deployment_id}.

Example: The status of the deployed empty pipeline and of the deployed pipeline with models will be displayed.

# Get empty pipeline deployment
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/status/get_deployment"

data = {
  "name": f"{exampleEmptyDeployId}-{exampleEmptyDeploymentId}"
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'status': 'Starting',
 'details': [],
 'engines': [{'ip': None,
   'name': 'engine-7c44d857cb-995p7',
   'status': 'Pending',
   'reason': None,
   'details': ['containers with unready status: [engine]',
    'containers with unready status: [engine]'],
   'pipeline_statuses': None,
   'model_statuses': None}],
 'engine_lbs': [{'ip': '10.244.12.53',
   'name': 'engine-lb-ddd995646-vjz7f',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
# Get model pipeline deployment

apiRequest = "/status/get_deployment"

data = {
  "name": f"{exampleModelDeployId}-{exampleModelDeploymentId}"
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.13.27',
   'name': 'engine-7df9567698-m7zdx',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'pipelinewithmodel',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'apitestmodelstream',
      'version': '0396de99-0a55-4880-9f53-8fdcd1b3357a',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.12.55',
   'name': 'engine-lb-ddd995646-qk6tj',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
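As the empty pipeline’s output above shows, a new deployment may report Starting before it reaches Running. A client can poll /status/get_deployment until the status is Running. The sketch below is an assumed pattern, not a Wallaroo-provided function: wait_for_running is hypothetical, and fetch_status stands in for a call such as get_wallaroo_response(APIURL, "/status/get_deployment", TOKEN, data).

```python
import time
from typing import Callable


def wait_for_running(fetch_status: Callable[[], dict],
                     timeout: float = 300.0,
                     interval: float = 5.0) -> dict:
    """Poll fetch_status() until it reports 'Running' or the timeout expires.

    fetch_status is any callable returning a /status/get_deployment response.
    Returns the last response; raises TimeoutError if it never reports 'Running'.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Simulated responses instead of live API calls
responses = iter([{"status": "Starting"}, {"status": "Starting"}, {"status": "Running"}])
print(wait_for_running(lambda: next(responses), timeout=10, interval=0))
```

For the deployments in this guide, the name submitted by fetch_status would be built as f"{exampleModelDeployId}-{exampleModelDeploymentId}".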

Get External Inference URL

The API command /admin/get_pipeline_external_url retrieves the external inference URL for a specific pipeline in a workspace.

  • Parameters
    • workspace_id (REQUIRED integer): The workspace integer id.
    • pipeline_name (REQUIRED string): The name of the deployment.

In this example, the workspace id from the Create Workspace request (exampleWorkspaceId) and the deployment name from the Deploy a Pipeline request (exampleModelDeployId) are submitted to /admin/get_pipeline_external_url. For a related walkthrough, see the Internal Pipeline Deployment URL Tutorial.

The External Inference URL will be stored as a variable for the next step.

## Retrieve the pipeline's External Inference URL
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)

apiRequest = "/admin/get_pipeline_external_url"

data = {
    "workspace_id": exampleWorkspaceId,
    "pipeline_name": exampleModelDeployId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
print(response)
externalUrl = response['url']
externalUrl
{'url': 'https://doc-test.api.example.com/v1/api/pipelines/infer/modeldeploy-15'}

'https://doc-test.api.example.com/v1/api/pipelines/infer/modeldeploy-15'

Perform Inference Through External URL

The inference can now be performed through the External Inference URL. This URL will accept the same inference data file that is used with the Wallaroo SDK, or with an Internal Inference URL as used in the Internal Pipeline Inference URL Tutorial.

For this example, the externalUrl retrieved through the Get External Inference URL is used to submit a single inference request.

If the Wallaroo instance has been configured to enable Arrow support, then the file cc_data_1k.df.json is used. This file contains a DataFrame in pandas records format and is submitted with the content type application/json; format=pandas-records. If Arrow support has not been enabled, then the inference request uses the Wallaroo proprietary JSON data file cc_data_1k.json.

import json
import requests

TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
# if Arrow has been enabled, set this to True.  Otherwise, leave as False.
arrowEnabled = True
## Inference through external URL

# select the data file and content type that match the Arrow setting
if arrowEnabled:
    dataFile = './data/cc_data_1k.df.json'
    contentType="application/json; format=pandas-records"
else:
    dataFile = './data/cc_data_1k.json'
    contentType="application/json"

# retrieve the json data to submit
with open(dataFile, 'r') as f:
    data = json.load(f)

# set the headers
headers= {
    'Authorization': 'Bearer ' + TOKEN,
    'Content-Type': contentType
}

# submit the request via POST; the explicit Content-Type header is kept
response = requests.post(externalUrl, json=data, headers=headers)

# Only the first 300 characters will be displayed for brevity
printResponse = json.dumps(response.json())
print(printResponse[0:300])
[{"time": 1677788050393, "in": {"tensor": [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748

Undeploy a Pipeline

Undeploys a deployed pipeline.

  • Parameters
    • pipeline_id - (REQUIRED int): The numerical id of the pipeline.
    • deployment_id - (REQUIRED int): The numerical id of the deployment.
  • Returns
    • Nothing if the call is successful.

Example: Both the empty pipeline and pipeline with models deployed in the step Deploy a Pipeline will be undeployed.

# Undeploy an empty pipeline
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/undeploy"

data = {
    "pipeline_id": exampleEmptyPipelineId,
    "deployment_id":exampleEmptyDeploymentId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# Undeploy pipeline with models
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/undeploy"

data = {
    "pipeline_id": exampleModelPipelineId,
    "deployment_id":exampleModelDeploymentId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Copy a Pipeline

Copies an existing pipeline into a new one in the same workspace. A new engine configuration can be set for the copied pipeline.

  • Parameters
    • name - (REQUIRED string): The name of the new pipeline.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to copy the source pipeline from.
    • source_pipeline - (REQUIRED int): The numerical id of the pipeline to copy from.
    • deploy - (OPTIONAL string): Name of the deployment.
    • engine_config - (OPTIONAL string): Engine configuration options.
    • pipeline_version - (OPTIONAL string): Optional version of the copied pipeline to create.

Example: The pipeline with models created in the step Create Pipeline in a Workspace will be copied into a new one.

# Copy a pipeline
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/pipelines/copy"

exampleCopiedPipelineName="copiedmodelpipeline"

data = {
  "name": exampleCopiedPipelineName,
  "workspace_id": exampleWorkspaceId,
  "source_pipeline": exampleModelPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'pipeline_pk_id': 26,
 'pipeline_variant_pk_id': 26,
 'pipeline_version': None,
 'deployment': None}

List Enablement Features

Lists the enablement features for the Wallaroo instance.

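  • PARAMETERS
    • None: submit an empty JSON object {}.
  • RETURNS
    • features - (object): The enabled features, with each value given as a string.
    • name - (string): Name of the Wallaroo instance.
    • is_auth_enabled - (bool): Whether authentication is enabled.

Example: List the enablement features for the Wallaroo instance.
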
# List enablement features
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
apiRequest = "/features/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'features': {'plateau': 'true'},
 'name': 'Wallaroo Dev',
 'is_auth_enabled': True}
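In the response above, feature values are returned as strings such as 'true' rather than as booleans. A minimal sketch of checking a flag under that assumption; is_feature_enabled is a hypothetical helper, and the sample response is copied from the output above.

```python
def is_feature_enabled(features_response: dict, feature: str) -> bool:
    """Return True if `feature` is listed with the string value 'true'."""
    value = features_response.get("features", {}).get(feature)
    return str(value).lower() == "true"


sample = {"features": {"plateau": "true"}, "name": "Wallaroo Dev", "is_auth_enabled": True}
print(is_feature_enabled(sample, "plateau"))          # prints True
print(is_feature_enabled(sample, "missing_feature"))  # prints False
```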

Assays

IMPORTANT NOTE: These assays were run in a Wallaroo environment with canned historical data. See the Wallaroo Assay Tutorial for details on setting up this environment. This historical data is required for these examples.

Create Assay

Creates a new assay in a specified pipeline.

  • PARAMETERS
    • id - (OPTIONAL int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer type for the assay, also known as “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • RETURNS
    • assay_id - (integer): The id of the new assay.

As noted, this example requires the historical data from the Wallaroo Assay Tutorial. Before running this example, set the sample pipeline id, pipeline name, model name, and workspace id in the code sample below. For more information on retrieving this information, see the Wallaroo Developer Guides.

# Create assay

apiRequest = "/assays/create"

exampleAssayName = "api_assay_test2"

## Values for the pipeline in workspace 4 (`housepricedrift`) from the Wallaroo Assay Tutorial; update them for your environment

exampleAssayPipelineId = 4
exampleAssayPipelineName = "housepricepipe"
exampleAssayModelName = "housepricemodel"
exampleAssayWorkspaceId = 4

# iopath can be "input 0 0" or "output 0 0"
data = {
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': exampleAssayModelName,
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
example_assay_id = response['assay_id']
response
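In the example above, the fixed baseline covers the 24 hours starting at midnight on 2022-01-01 at UTC-5, written as ISO 8601 timestamps with an offset. A minimal sketch of generating those start_at and end_at values with Python’s datetime module; fixed_baseline is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def fixed_baseline(pipeline: str, model: str, start: datetime, hours: int = 24) -> dict:
    """Build a `baseline` object whose Fixed window starts at `start` and lasts `hours`."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    end = start + timedelta(hours=hours)
    return {
        "Fixed": {
            "pipeline": pipeline,
            "model": model,
            "start_at": start.isoformat(),
            "end_at": end.isoformat(),
        }
    }


utc_minus_5 = timezone(timedelta(hours=-5))
baseline = fixed_baseline("housepricepipe", "housepricemodel", datetime(2022, 1, 1, tzinfo=utc_minus_5))
print(baseline["Fixed"]["start_at"])  # prints 2022-01-01T00:00:00-05:00
print(baseline["Fixed"]["end_at"])    # prints 2022-01-02T00:00:00-05:00
```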

List Assays

Lists all assays in the specified pipeline.

  • PARAMETERS
    • pipeline_id - (REQUIRED int): The numerical ID of the pipeline.
  • RETURNS
    • assays - (Array assays): A list of all assays.

Example: Display a list of all assays in a pipeline. This example assumes a workspace with an existing assay and uploaded historical data; see the Wallaroo Assays Tutorial.

For this reason, these values are hard coded.

## First list all of the workspaces and the list of pipelines

# List workspaces

apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# Get assays

apiRequest = "/assays/list"

data = {
    "pipeline_id": exampleAssayPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Activate or Deactivate Assay

Activates or deactivates an existing assay.

  • Parameters
    • id - (REQUIRED int): The numerical id of the assay.
    • active - (REQUIRED bool): True to activate the assay, False to deactivate it.
  • Returns
    • id - (integer): The numerical id of the assay.
    • active - (bool): True if the assay is now active, False if it is inactive.

Example: The assay created in the Create Assay request will be deactivated, then activated.

# Deactivate assay

apiRequest = "/assays/set_active"

data = {
    'id': example_assay_id,
    'active': False
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# Activate assay

apiRequest = "/assays/set_active"

data = {
    'id': example_assay_id,
    'active': True
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Create Interactive Baseline

Creates an interactive assay baseline.

  • PARAMETERS
    • id - (REQUIRED int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer type for the assay, also known as “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • Returns
    • {} when successful.
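The payload has many nested required fields, so it can help to check a configuration locally before sending it. The following is a minimal sketch of such a check. `check_assay_config` is a hypothetical helper and is not part of the Wallaroo SDK or API. Its field lists mirror the REQUIRED parameters documented above. The 5 to 14 range for num_bins is only a recommendation, so the sketch flags values outside it without treating them as invalid.

```python
# Hypothetical client-side check for an assay configuration dict.
# The field lists mirror the REQUIRED parameters documented above.
REQUIRED_FIELDS = [
    "id", "name", "pipeline_id", "pipeline_name", "active", "status",
    "iopath", "baseline", "window", "summarizer", "alert_threshold",
    "workspace_id",
]
REQUIRED_FIXED_BASELINE = ["pipeline", "model", "start_at", "end_at"]
REQUIRED_WINDOW = ["pipeline", "model", "width"]
REQUIRED_SUMMARIZER = [
    "type", "bin_mode", "aggregation", "metric", "num_bins", "add_outlier_edges",
]
BIN_MODES = {"Quantile", "Equal"}


def check_assay_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in config]

    fixed = config.get("baseline", {}).get("Fixed", {})
    problems += [f"missing baseline.Fixed field: {key}"
                 for key in REQUIRED_FIXED_BASELINE if key not in fixed]

    window = config.get("window", {})
    problems += [f"missing window field: {key}"
                 for key in REQUIRED_WINDOW if key not in window]

    summarizer = config.get("summarizer", {})
    problems += [f"missing summarizer field: {key}"
                 for key in REQUIRED_SUMMARIZER if key not in summarizer]

    if "bin_mode" in summarizer and summarizer["bin_mode"] not in BIN_MODES:
        problems.append(f"unexpected bin_mode: {summarizer['bin_mode']}")

    # Recommended range only: values outside it are flagged, not rejected
    num_bins = summarizer.get("num_bins")
    if isinstance(num_bins, int) and not 5 <= num_bins <= 14:
        problems.append(f"num_bins={num_bins} is outside the recommended range 5-14")

    return problems
```

Running this check on the data dictionary from the example below should return an empty list.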

Example: An interactive assay baseline will be set for the assay “example assay” on Pipeline 4.

# Run interactive baseline

apiRequest = "/assays/run_interactive_baseline"

exampleAssayPipelineId = 4
exampleAssayPipelineName = "housepricepipe"
exampleAssayModelName = "housepricemodel"
exampleAssayWorkspaceId = 4
exampleAssayId = 3
exampleAssayName = "example assay"

data = {
    'id': exampleAssayId,
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': exampleAssayModelName,
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Get Assay Baseline

Retrieve an assay baseline.

  • Parameters
    • workspace_id - (REQUIRED integer): Numerical id for the workspace the assay is in.
    • pipeline_name - (REQUIRED string): Name of the pipeline the assay is in.
    • start - (OPTIONAL string): DateTime for when the baseline starts.
    • end - (OPTIONAL string): DateTime for when the baseline ends.
    • model_name - (OPTIONAL string): Name of the model.
    • limit - (OPTIONAL integer): Maximum number of baselines to return.
  • Returns
    • Assay Baseline
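Only workspace_id and pipeline_name are required, and the example that follows leaves out the other fields. The sketch below builds the request body so that any unset optional parameter is omitted entirely. `build_get_baseline_request` is a hypothetical helper and is not part of the Wallaroo API.

```python
def build_get_baseline_request(workspace_id, pipeline_name, start=None,
                               end=None, model_name=None, limit=None):
    """Build a /assays/get_baseline request body, omitting unset optional fields."""
    body = {"workspace_id": workspace_id, "pipeline_name": pipeline_name}
    optional = {"start": start, "end": end, "model_name": model_name, "limit": limit}
    # Only include optional parameters the caller actually supplied
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


request_body = build_get_baseline_request(4, "housepricepipe", limit=3)
```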

Example: Up to 3 assay baselines for Workspace 4 and pipeline housepricepipe will be retrieved.

# Get Assay Baseline

apiRequest = "/assays/get_baseline"

data = {
    'workspace_id': exampleAssayWorkspaceId,
    'pipeline_name': exampleAssayPipelineName,
    'limit': 3
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Run Assay Interactively

Runs an assay.

  • Parameters
    • id - (REQUIRED int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window - (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer settings for the assay, also known as the “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • Returns
    • Assay

Example: An interactive assay will be run for the assay defined by exampleAssayId and exampleAssayName. Depending on the number of assay results and the data window, this may take some time. The request returns all of the results for this assay at this time, and the total number of results is printed afterward.

# Run interactive assay

apiRequest = "/assays/run_interactive"

data = {
    'id': exampleAssayId,
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': exampleAssayModelName,
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
print(response[0])
print(len(response))
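The running time of an interactive assay grows with the number of windows it has to evaluate. The sketch below gives a rough estimate of that number. It parses a window width string such as '24 hours' and counts how many whole windows fit in a time range. It assumes windows are contiguous and non-overlapping, and whether that holds when interval is None is an assumption. The accepted unit names are also an assumption, because only the '24 hours' form appears in this guide. `parse_width` and `count_windows` are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Assumed unit spellings; only "24 hours" appears in this guide's examples.
_UNITS = {
    "minute": "minutes", "minutes": "minutes",
    "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
    "week": "weeks", "weeks": "weeks",
}


def parse_width(width):
    """Convert a width string such as '24 hours' into a timedelta."""
    amount, unit = width.split()
    return timedelta(**{_UNITS[unit.lower()]: int(amount)})


def count_windows(start, end, width):
    """Count whole, non-overlapping windows of the given width between start and end."""
    # timedelta // timedelta yields an int
    return (end - start) // parse_width(width)


# Example: one month of data split into 24 hour windows
estimate = count_windows(datetime(2022, 1, 1), datetime(2022, 1, 31), "24 hours")
```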

Get Assay Results

Retrieve the results for an assay.

  • Parameters
    • assay_id - (REQUIRED integer): Numerical id for the assay.
    • start - (OPTIONAL string): DateTime for the start of the results range.
    • end - (OPTIONAL string): DateTime for the end of the results range.
    • limit - (OPTIONAL integer): Maximum number of results to return.
    • pipeline_id - (OPTIONAL integer): Numerical id of the pipeline the assay is in.
  • Returns
    • Assay Results
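The start and end values are DateTime strings. The assay examples in this guide write them in ISO 8601 form with a UTC offset, such as '2022-01-01T00:00:00-05:00'. Assuming get_results accepts the same format, Python's datetime module can produce these strings so they need not be typed by hand. `to_api_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# UTC-5, matching the offset used in the assay examples in this guide
EST = timezone(timedelta(hours=-5))


def to_api_datetime(year, month, day, tz=EST):
    """Return an ISO 8601 timestamp string for midnight on the given date."""
    return datetime(year, month, day, tzinfo=tz).isoformat()


# Request body covering January 2 to January 3, 2022 for assay 3
results_request = {
    "assay_id": 3,
    "start": to_api_datetime(2022, 1, 2),
    "end": to_api_datetime(2022, 1, 3),
}
```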

Example: Results for Assay 3 “example assay” will be retrieved for January 2 to January 3. For the sake of time, only the first record will be displayed.

# Get Assay Results

apiRequest = "/assays/get_results"

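data = {
    'assay_id': exampleAssayId,
    'pipeline_id': exampleAssayPipelineId,
    'start': '2022-01-02T00:00:00-05:00',
    'end': '2022-01-03T00:00:00-05:00'
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
# Display only the first record
response[0]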

1.1.1 - Wallaroo MLOps API Essentials Guide: User Management

How to use the Wallaroo API for User Management

Users

Users can be created, activated, and deactivated through the Wallaroo MLOps API.

Get Users

Users can be retrieved by their Keycloak user ids, or all users can be returned by submitting an empty set {}.

  • Parameters
    • {}: Empty set, returns all users.
    • user_ids Array[Keycloak user ids]: An array of Keycloak user ids, typically in UUID format.

Example: The first example will submit an empty set {} to return all users, then submit the first user’s user id and request only that user’s details.

# Get all users

apiRequest = "/users/query"
data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': {'5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f': {'access': {'impersonate': True,
    'manageGroupMembership': True,
    'manage': True,
    'mapRoles': True,
    'view': True},
   'createdTimestamp': 1669221287375,
   'disableableCredentialTypes': [],
   'email': 'john.hansarick@wallaroo.ai',
   'emailVerified': True,
   'enabled': True,
   'id': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'john.hansarick@wallaroo.ai'},
  '941937b3-7dc8-4abe-8bb1-bd23c816421e': {'access': {'view': True,
    'manage': True,
    'manageGroupMembership': True,
    'mapRoles': True,
    'impersonate': True},
   'createdTimestamp': 1669221214282,
   'disableableCredentialTypes': [],
   'emailVerified': False,
   'enabled': True,
   'id': '941937b3-7dc8-4abe-8bb1-bd23c816421e',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'admin'},
  'da7c2f4c-822e-49eb-93d7-a4b90af9b4ca': {'access': {'mapRoles': True,
    'impersonate': True,
    'manage': True,
    'manageGroupMembership': True,
    'view': True},
   'createdTimestamp': 1669654086172,
   'disableableCredentialTypes': [],
   'email': 'kilvin.mitchell@wallaroo.ai',
   'emailVerified': True,
   'enabled': True,
   'id': 'da7c2f4c-822e-49eb-93d7-a4b90af9b4ca',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'kilvin.mitchell@wallaroo.ai'}}}
# Get first user Keycloak id
firstUserKeycloak = list(response['users'])[0]

apiRequest = "/users/query"
data = {
  "user_ids": [
    firstUserKeycloak
  ]
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': {'5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f': {'access': {'view': True,
    'manage': True,
    'manageGroupMembership': True,
    'mapRoles': True,
    'impersonate': True},
   'createdTimestamp': 1669221287375,
   'disableableCredentialTypes': [],
   'email': 'john.hansarick@wallaroo.ai',
   'emailVerified': True,
   'enabled': True,
   'id': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'notBefore': 0,
   'requiredActions': [],
   'username': 'john.hansarick@wallaroo.ai'}}}
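Taking list(response['users'])[0] picks whichever user happens to come first in the returned JSON, and that depends on the server's ordering. The sketch below looks a user up by username instead. `find_user_id` is a hypothetical helper, and the sample data is trimmed from the output above.

```python
def find_user_id(users_response, username):
    """Return the Keycloak id for username in a /users/query response, or None."""
    for user_id, user in users_response["users"].items():
        if user.get("username") == username:
            return user_id
    return None


# Trimmed from the /users/query output shown above
sample_users = {
    "users": {
        "5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f": {
            "username": "john.hansarick@wallaroo.ai",
            "email": "john.hansarick@wallaroo.ai",
            "enabled": True,
        },
        "941937b3-7dc8-4abe-8bb1-bd23c816421e": {
            "username": "admin",
            "enabled": True,
        },
    }
}

admin_id = find_user_id(sample_users, "admin")
```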

Invite Users

IMPORTANT NOTE: This command is for Wallaroo Community only. For more details on user management, see Wallaroo User Management.

Users can be invited through /users/invite. When using Wallaroo Community, this will send an invitation email to the email address listed. Note that the user must not already be a member of the Wallaroo instance, and email addresses must be unique. If the email address is already in use for another user, the request will generate an error.

  • Parameters
    • email (REQUIRED string): The email address of the new user to invite.
    • password (OPTIONAL string): The assigned password of the new user to invite. If not provided, the Wallaroo instance will provide the new user a temporary password that must be changed upon initial login.
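The request fails if the email address already belongs to another user, so a caller may want to check the /users/query output first. The following is a minimal sketch of that check. `email_in_use` is a hypothetical helper. The case-insensitive comparison is a design choice of this sketch and is not documented Wallaroo behavior.

```python
def email_in_use(users_response, email):
    """True if a user in a /users/query response already has this email address."""
    target = email.strip().lower()
    return any(
        user.get("email", "").lower() == target
        for user in users_response["users"].values()
    )


# Trimmed from the /users/query output; the admin user has no email field
sample_users = {
    "users": {
        "da7c2f4c-822e-49eb-93d7-a4b90af9b4ca": {"email": "kilvin.mitchell@wallaroo.ai"},
        "941937b3-7dc8-4abe-8bb1-bd23c816421e": {"username": "admin"},
    }
}
```

A caller could skip the /users/invite request when email_in_use(response, newUser) is True.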

Example: In this example, a new user will be invited to the Wallaroo instance and assigned a password.

# invite users
apiRequest = "/users/invite"
data = {
    "email": newUser,
    "password":newPassword
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response

Deactivate User

Users can be deactivated so they cannot log in to their Wallaroo instance. Deactivated users do not count against the Wallaroo license count.

  • Parameters
    • email (REQUIRED string): The email address of the user to deactivate.

Example: In this example, the newUser will be deactivated.

# Deactivate users

apiRequest = "/users/deactivate"

data = {
    "email": newUser
}
response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{}

Activate User

A deactivated user can be reactivated to allow them access to their Wallaroo instance. Activated users count against the Wallaroo license count.

  • Parameters
    • email (REQUIRED string): The email address of the user to activate.

Example: In this example, the newUser will be activated.

# Activate users

apiRequest = "/users/activate"

data = {
    "email": newUser
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{}
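Only activated users count against the license, so it can be useful to tally them from the /users/query output. This sketch assumes that a deactivated user appears with 'enabled': False in that output. This guide does not confirm that, so treat it as an assumption. `count_enabled_users` is a hypothetical helper, and the sample data is illustrative.

```python
def count_enabled_users(users_response):
    """Count users whose 'enabled' flag is True in a /users/query response."""
    # Assumption: deactivated users are reported with "enabled": False
    return sum(1 for user in users_response["users"].values() if user.get("enabled"))


# Illustrative data: one user is shown as deactivated
sample_users = {
    "users": {
        "5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f": {"username": "john.hansarick@wallaroo.ai", "enabled": True},
        "941937b3-7dc8-4abe-8bb1-bd23c816421e": {"username": "admin", "enabled": True},
        "da7c2f4c-822e-49eb-93d7-a4b90af9b4ca": {"username": "kilvin.mitchell@wallaroo.ai", "enabled": False},
    }
}

active_count = count_enabled_users(sample_users)
```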

1.1.2 - Wallaroo MLOps API Essentials Guide: Workspace Management

How to use the Wallaroo API for Workspace Management

Workspaces

Workspaces can be created and managed through the Wallaroo MLOps API.

List Workspaces

List the workspaces for a specific user.

  • Parameters
    • user_id - (OPTIONAL string): The Keycloak ID.

Example: In this example, the workspaces for a specific user will be displayed, then workspaces for all users will be displayed.

# List workspaces by user id

apiRequest = "/workspaces/list"

data = {
    "user_id":firstUserKeycloak
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hansarick@wallaroo.ai - Default Workspace',
   'created_at': '2022-11-23T16:34:47.914362+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 2,
   'name': 'alohaworkspace',
   'created_at': '2022-11-23T16:44:28.782225+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [1],
   'pipelines': [1]},
  {'id': 4,
   'name': 'testapiworkspace-cdf86c3c-8c9a-4bf4-865d-fe0ec00fad7c',
   'created_at': '2022-11-28T16:48:29.622794+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [2, 3, 5, 4, 6, 7, 8, 9],
   'pipelines': []}]}
# List workspaces

apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hansarick@wallaroo.ai - Default Workspace',
   'created_at': '2022-11-23T16:34:47.914362+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 2,
   'name': 'alohaworkspace',
   'created_at': '2022-11-23T16:44:28.782225+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [1],
   'pipelines': [1]},
  {'id': 4,
   'name': 'testapiworkspace-cdf86c3c-8c9a-4bf4-865d-fe0ec00fad7c',
   'created_at': '2022-11-28T16:48:29.622794+00:00',
   'created_by': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'archived': False,
   'models': [2, 3, 5, 4, 6, 7, 8, 9],
   'pipelines': []}]}
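The /workspaces/list response is a flat list, so finding a particular workspace means scanning it. The sketch below looks a workspace up by name or by id. `find_workspace` is a hypothetical helper, and the sample is trimmed from the output above.

```python
def find_workspace(workspaces_response, name=None, workspace_id=None):
    """Return the first workspace matching name or workspace_id, or None if absent."""
    if name is None and workspace_id is None:
        raise ValueError("provide a name or a workspace_id")
    for workspace in workspaces_response["workspaces"]:
        if name is not None and workspace["name"] == name:
            return workspace
        if workspace_id is not None and workspace["id"] == workspace_id:
            return workspace
    return None


# Trimmed from the /workspaces/list output shown above
sample_workspaces = {
    "workspaces": [
        {"id": 1, "name": "john.hansarick@wallaroo.ai - Default Workspace",
         "archived": False, "models": [], "pipelines": []},
        {"id": 2, "name": "alohaworkspace",
         "archived": False, "models": [1], "pipelines": [1]},
    ]
}

aloha = find_workspace(sample_workspaces, name="alohaworkspace")
```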

Create Workspace

A new workspace will be created in the Wallaroo instance. Upon creation, the user making the MLOps API request is assigned as the workspace owner.

  • Parameters:
    • workspace_name - (REQUIRED string): The name of the new workspace.
  • Returns:
    • workspace_id - (int): The ID of the new workspace.

Example: In this example, a workspace named testapiworkspace- followed by a randomly generated UUID will be created, and the new workspace’s workspace_id will be saved as the variable exampleWorkspaceId for use in other code examples. After the request is complete, the List Workspaces command will be issued to demonstrate that the new workspace has been created.

import uuid

# Create workspace

apiRequest = "/workspaces/create"

exampleWorkspaceName = f"testapiworkspace-{uuid.uuid4()}"
data = {
  "workspace_name": exampleWorkspaceName
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
# Stored for future examples
exampleWorkspaceId = response['workspace_id']
response
{'workspace_id': 618489}
# List workspaces

apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 15,
   'name': 'john.hansarick@wallaroo.ai - Default Workspace',
   'created_at': '2022-12-16T20:23:23.150058+00:00',
   'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 28,
   'name': 'alohaworkspace',
   'created_at': '2022-12-16T21:00:01.614796+00:00',
   'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103',
   'archived': False,
   'models': [2],
   'pipelines': [4]},
  {'id': 29,
   'name': 'abtestworkspace',
   'created_at': '2022-12-16T21:03:08.785538+00:00',
   'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103',
   'archived': False,
   'models': [3, 5, 4, 6],
   'pipelines': [6]},
  {'id': 618487,
   'name': 'sdkquickworkspace',
   'created_at': '2022-12-20T15:56:22.088161+00:00',
   'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103',
   'archived': False,
   'models': [48],
   'pipelines': [76]},
  {'id': 618489,
   'name': 'testapiworkspace-e9e386a7-8146-4ead-b4c6-a2580af70083',
   'created_at': '2022-12-20T19:34:30.392835+00:00',
   'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103',
   'archived': False,
   'models': [],
   'pipelines': []}]}

Add User to Workspace

Existing users of the Wallaroo instance can be added to an existing workspace.

  • Parameters
    • email - (REQUIRED string): The email address of the user to add to the workspace.
    • workspace_id - (REQUIRED int): The id of the workspace.

Example: The following example adds the user invited in the Invite Users request to the workspace created in the Create Workspace request.

# Add existing user to existing workspace

apiRequest = "/workspaces/add_user"

data = {
  "email":newUser,
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{}

List Users in a Workspace

Lists the users who are either owners or collaborators of a workspace.

  • Parameters
    • workspace_id - (REQUIRED int): The id of the workspace.
  • Returns
    • user_id: The user’s identification.
    • user_type: The user’s workspace type (for example, OWNER or COLLABORATOR).

Example: The following example will list all users that are part of the workspace created in the Create Workspace request.

# List users in a workspace

apiRequest = "/workspaces/list_users"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': [{'user_id': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'user_type': 'OWNER'},
  {'user_id': 'da7c2f4c-822e-49eb-93d7-a4b90af9b4ca',
   'user_type': 'COLLABORATOR'}]}
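The user_type field separates owners from collaborators. The sketch below groups the user ids in a /workspaces/list_users response by that field. `users_by_type` is a hypothetical helper, and the sample is the output shown above.

```python
from collections import defaultdict


def users_by_type(list_users_response):
    """Group user ids from a /workspaces/list_users response by user_type."""
    grouped = defaultdict(list)
    for entry in list_users_response["users"]:
        grouped[entry["user_type"]].append(entry["user_id"])
    return dict(grouped)


# The /workspaces/list_users output shown above
sample_members = {
    "users": [
        {"user_id": "5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f", "user_type": "OWNER"},
        {"user_id": "da7c2f4c-822e-49eb-93d7-a4b90af9b4ca", "user_type": "COLLABORATOR"},
    ]
}

members = users_by_type(sample_members)
```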

Remove User from a Workspace

Removes the user from the given workspace. The request requires either the user’s Keycloak ID or the user’s email address.

  • Parameters
    • workspace_id - (REQUIRED int): The id of the workspace.
    • user_id - (string): The Keycloak ID of the user. If email is not provided, then this parameter is REQUIRED.
    • email - (string): The user’s email address. If user_id is not provided, then this parameter is REQUIRED.
  • Returns
    • affected_rows - (int): The number of affected rows.
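This request needs either user_id or email, so a caller can check that rule before sending anything. The sketch below builds the request body and raises an error when neither identifier is given. `build_remove_user_request` is a hypothetical helper. Requiring at least one identifier, while still allowing both, is this sketch's reading of the parameter notes above.

```python
def build_remove_user_request(workspace_id, user_id=None, email=None):
    """Build a /workspaces/remove_user body; at least one of user_id or email is needed."""
    if user_id is None and email is None:
        raise ValueError("either user_id or email must be provided")
    body = {"workspace_id": workspace_id}
    if user_id is not None:
        body["user_id"] = user_id
    if email is not None:
        body["email"] = email
    return body


by_email = build_remove_user_request(618489, email="new.user@example.com")
```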

Example: The following example will remove the newUser from the workspace created in the Create Workspace request. Then the users for that workspace will be listed to verify newUser has been removed.

# Remove existing user from an existing workspace

apiRequest = "/workspaces/remove_user"

data = {
  "email":newUser,
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'affected_rows': 1}
# List users in a workspace

apiRequest = "/workspaces/list_users"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'users': [{'user_id': '5e9c9a2b-7a7f-454a-b8e7-91e3c2d86c9f',
   'user_type': 'OWNER'}]}

1.1.3 - Wallaroo MLOps API Essentials Guide: Model Management

How to use the Wallaroo API for Model Management

Models

Models can be uploaded and managed through the Wallaroo API.

Upload Model to Workspace

Uploads an ML model to a Wallaroo workspace via POST with Content-Type: multipart/form-data.

  • Parameters
    • name - (REQUIRED string): Name of the model.
    • visibility - (OPTIONAL string): The visibility of the model as either public or private.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to upload the model to.

Example: This example will upload the sample file ccfraud.onnx to the workspace created in the Create Workspace step, with a model name of apitestmodel- followed by a random UUID. The model name will be saved as exampleModelName, and the id of the uploaded model will be saved as exampleModelId for use in later examples.

# Upload model - sends multipart form data through a Python requests call

apiRequest = "/models/upload"

exampleModelName = f"apitestmodel-{uuid.uuid4()}"

data = {
    "name":exampleModelName,
    "visibility":"public",
    "workspace_id": exampleWorkspaceId
}

files = {
    "file": ('ccfraud.onnx', open('./models/ccfraud.onnx','rb'))
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data, files)
response
{'insert_models': {'returning': [{'models': [{'id': 68}]}]}}
exampleModelId=response['insert_models']['returning'][0]['models'][0]['id']
exampleModelId
10
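The new model's id is nested several levels deep in the upload response. The sketch below wraps that lookup so that an unexpected response raises one clear error instead of a bare KeyError or IndexError. `uploaded_model_id` is a hypothetical helper. The same response shape is shown for /models/upload_stream below.

```python
def uploaded_model_id(upload_response):
    """Extract the new model's id from a /models/upload or /models/upload_stream response."""
    try:
        return upload_response["insert_models"]["returning"][0]["models"][0]["id"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected upload response: {upload_response!r}") from exc


sample_upload = {"insert_models": {"returning": [{"models": [{"id": 68}]}]}}
new_model_id = uploaded_model_id(sample_upload)
```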

Stream Upload Model to Workspace

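Streams a potentially large ML model to a Wallaroo workspace via POST with Content-Type: application/octet-stream. The model metadata is passed as query parameters, and the raw model file is sent as the request body.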

  • Parameters
    • name - (REQUIRED string): Name of the model
    • filename - (REQUIRED string): Name of the file being uploaded.
    • visibility - (OPTIONAL string): The visibility of the model as either public or private.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to upload the model to.

Example: This example will upload the sample file ccfraud.onnx to the workspace created in the Create Workspace step, using the file name streamfile.onnx and a model name of apitestmodel- followed by a random UUID.

# Stream upload model

apiRequest = "/models/upload_stream"
exampleModelName = f"apitestmodel-{uuid.uuid4()}"
filename = 'streamfile.onnx'

# Model metadata is passed as query parameters
data = {
    "name": exampleModelName,
    "filename": filename,
    "visibility": "public",
    "workspace_id": exampleWorkspaceId
}

contentType = 'application/octet-stream'

# The raw model file is sent as the request body; the with block closes it afterward
with open('./models/ccfraud.onnx', 'rb') as file:
    response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data=None, files=file, contentType=contentType, params=data)
response
{'insert_models': {'returning': [{'models': [{'id': 11}]}]}}
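The example above hands the open file object to the request. If a client needs to control how the body is read, one option is a generator that yields the file in fixed-size chunks. The requests library sends a generator body using chunked transfer encoding. This guide does not say whether /models/upload_stream accepts chunked uploads or whether get_wallaroo_response forwards a generator, so both are assumptions. `iter_file_chunks` is a hypothetical helper.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's bytes in fixed-size chunks (the last chunk may be shorter)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Demonstration with a small temporary file standing in for a model file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2500)
chunk_sizes = [len(chunk) for chunk in iter_file_chunks(tmp.name, chunk_size=1000)]
os.remove(tmp.name)
```

Reading in chunks keeps memory use bounded by chunk_size rather than by the size of the model file.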

List Models in Workspace

Returns a list of models added to a specific workspace.

  • Parameters
    • workspace_id - (REQUIRED int): The workspace id to list.

Example: Display the models for the workspace used in the Upload Model to Workspace step. The model id and model name will be saved as exampleModelId and exampleModelName variables for other examples.

# List models in a workspace

apiRequest = "/models/list"

data = {
  "workspace_id": exampleWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'models': [{'id': 68,
   'name': 'apitestmodel-dfa7e9fd-df72-4b28-93f6-3d147c9f962f',
   'owner_id': '""',
   'created_at': '2022-12-20T19:34:47.014072+00:00',
   'updated_at': '2022-12-20T19:34:47.014072+00:00'}]}
exampleModelId = response['models'][0]['id']
exampleModelName = response['models'][0]['name']
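response['models'][0] takes whichever model the server lists first. When a workspace holds several models, such as after both upload examples above, selecting by creation time is more explicit. The sketch below picks the newest model by its created_at timestamp. `latest_model` is a hypothetical helper, and the sample entries reuse ids and timestamps that appear in this guide.

```python
from datetime import datetime


def latest_model(models_response):
    """Return the model with the newest created_at timestamp, or None if there are none."""
    models = models_response["models"]
    if not models:
        return None
    # fromisoformat parses timestamps such as '2022-12-20T19:34:47.014072+00:00'
    return max(models, key=lambda m: datetime.fromisoformat(m["created_at"]))


sample_models = {
    "models": [
        {"id": 11, "name": "apitestmodel-08b16b74-837c-4b4b-a6b7-49475ddece98",
         "created_at": "2022-11-28T21:30:05.071826+00:00"},
        {"id": 68, "name": "apitestmodel-dfa7e9fd-df72-4b28-93f6-3d147c9f962f",
         "created_at": "2022-12-20T19:34:47.014072+00:00"},
    ]
}

newest = latest_model(sample_models)
```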

Get Model Details by ID

Returns the model details by the specific model id.

  • Parameters
    • id - (REQUIRED int): The numerical id of the model.
  • Returns
    • id - (int): Numerical id of the model.
    • owner_id - (string): Id of the owner of the model.
    • workspace_id - (int): Numerical id of the workspace the model is in.
    • name - (string): Name of the model.
    • updated_at - (DateTime): Date and time of the model’s last update.
    • created_at - (DateTime): Date and time of the model’s creation.
    • model_config - (string): Details of the model’s configuration.

Example: Retrieve the details for the model uploaded in the Upload Model to Workspace step.

# Get model details by id

apiRequest = "/models/get_by_id"

data = {
  "id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 11,
 'owner_id': '""',
 'workspace_id': 5,
 'name': 'apitestmodel-08b16b74-837c-4b4b-a6b7-49475ddece98',
 'updated_at': '2022-11-28T21:30:05.071826+00:00',
 'created_at': '2022-11-28T21:30:05.071826+00:00',
 'model_config': None}

Get Model Versions

Retrieves all versions of a model, identified by the model's name (model_id) and its numerical id (models_pk_id).

  • Parameters
    • model_id - (REQUIRED String): The model name.
    • models_pk_id - (REQUIRED int): The model integer pk id.
  • Returns
    • Array(Model Details)
      • sha - (String): The sha hash of the model version.
      • models_pk_id- (int): The pk id of the model.
      • model_version - (String): The UUID identifier of the model version.
      • owner_id - (String): The Keycloak user id of the model’s owner.
      • model_id - (String): The name of the model.
      • id - (int): The integer id of the model.
      • file_name - (String): The filename used when uploading the model.
      • image_path - (String): The image path of the model.

Example: Retrieve the versions for a previously uploaded model. The variables exampleModelVersion and exampleModelSha will store the model’s version and SHA values for use in other examples.

# Get model versions

apiRequest = "/models/list_versions"

data = {
  "model_id": exampleModelName,
  "models_pk_id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
[{'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
  'models_pk_id': 68,
  'model_version': '87476d85-b8ee-4714-81ba-53041b26f50f',
  'owner_id': '""',
  'model_id': 'apitestmodel-dfa7e9fd-df72-4b28-93f6-3d147c9f962f',
  'id': 68,
  'file_name': 'ccfraud.onnx',
  'image_path': None}]
# Stored for future examples

exampleModelVersion = response[0]['model_version']
exampleModelSha = response[0]['sha']
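The model_id, model_version, and sha values returned here are what the pipeline and deployment requests later refer to as name, version, and sha. The sketch below maps one version entry onto that shape. `model_step_reference` is a hypothetical helper, and the sample entry is the output shown above.

```python
def model_step_reference(version_entry):
    """Map a /models/list_versions entry to the name/version/sha dict used by pipelines."""
    return {
        "name": version_entry["model_id"],
        "version": version_entry["model_version"],
        "sha": version_entry["sha"],
    }


# One entry from the /models/list_versions output shown above
sample_version = {
    "sha": "bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507",
    "models_pk_id": 68,
    "model_version": "87476d85-b8ee-4714-81ba-53041b26f50f",
    "owner_id": '""',
    "model_id": "apitestmodel-dfa7e9fd-df72-4b28-93f6-3d147c9f962f",
    "id": 68,
    "file_name": "ccfraud.onnx",
    "image_path": None,
}

step_ref = model_step_reference(sample_version)
```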

Get Model Configuration by Id

Returns the model’s configuration details.

  • Parameters
    • model_id - (REQUIRED int): The numerical value of the model’s id.

Example: Submit the model id for the model uploaded in the Upload Model to Workspace step to retrieve configuration details.

# Get model config by id

apiRequest = "/models/get_config_by_id"

data = {
  "model_id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'model_config': None}

Get Model Details

Returns details regarding a single model, including versions.

  • Parameters
    • id - (REQUIRED int): The numerical value of the model’s id.

Example: Submit the model id for the model uploaded in the Upload Model to Workspace step to retrieve its details, including versions.

# Get model details by id

apiRequest = "/models/get"

data = {
  "id": exampleModelId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 11,
 'name': 'apitestmodel-08b16b74-837c-4b4b-a6b7-49475ddece98',
 'owner_id': '""',
 'created_at': '2022-11-28T21:30:05.071826+00:00',
 'updated_at': '2022-11-28T21:30:05.071826+00:00',
 'models': [{'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
   'models_pk_id': 11,
   'model_version': '2589d7e4-bc61-4cb7-8278-5399f28ad001',
   'owner_id': '""',
   'model_id': 'apitestmodel-08b16b74-837c-4b4b-a6b7-49475ddece98',
   'id': 11,
   'file_name': 'streamfile.onnx',
   'image_path': None}]}

1.1.4 - Wallaroo MLOps API Essentials Guide: Pipeline Management

How to use the Wallaroo API for Pipeline Management

Pipeline Management

Pipelines can be managed through the Wallaroo API. Pipelines are the vehicle used for deploying, serving, and monitoring ML models. For more information, see the Wallaroo Glossary.

Create Pipeline in a Workspace

Creates a new pipeline in the specified workspace.

  • Parameters
    • pipeline_id - (REQUIRED string): Name of the new pipeline.
    • workspace_id - (REQUIRED int): Numerical id of the workspace for the new pipeline.
    • definition - (REQUIRED string): The pipeline definition; can be {} for none.

Example: Two pipelines are created in the workspace created in the step Create Workspace. One will be an empty pipeline without any models. The other will be created using the model uploaded in the Upload Model to Workspace step, with no configuration details. The pipeline id, variant id, and variant version of each pipeline will be stored for later examples.

# Create pipeline in a workspace

apiRequest = "/pipelines/create"

exampleEmptyPipelineName=f"emptypipeline-{uuid.uuid4()}"

data = {
  "pipeline_id": exampleEmptyPipelineName,
  "workspace_id": exampleWorkspaceId,
  "definition": {}
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleEmptyPipelineId = response['pipeline_pk_id']
exampleEmptyPipelineVariantId=response['pipeline_variant_pk_id']
emptyExamplePipelineVariantVersion=response['pipeline_variant_version']
response
{'pipeline_pk_id': 3,
 'pipeline_variant_pk_id': 3,
 'pipeline_variant_version': '84730f78-7b89-4420-bdcb-3c5abac0dd10'}
# Create pipeline in a workspace with models

apiRequest = "/pipelines/create"

exampleModelPipelineName=f"pipelinewithmodel-{uuid.uuid4()}"
exampleModelDeployName = f"deploywithmodel-{uuid.uuid4()}"

data = {
  "pipeline_id": exampleModelPipelineName,
  "workspace_id": exampleWorkspaceId,
  "definition": {
      "id":exampleModelDeployName,
      "steps":
      [
          {
          "ModelInference":
          {
              "models": [
                    {
                        "name":exampleModelName,
                        "version":exampleModelVersion,
                        "sha":exampleModelSha
                    }
                ]
          }
          }
      ]
  }
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleModelPipelineId = response['pipeline_pk_id']
exampleModelPipelineVariantId=response['pipeline_variant_pk_id']
emptyModelPipelineVariantVersion=response['pipeline_variant_version']
response
{'pipeline_pk_id': 108,
 'pipeline_variant_pk_id': 108,
 'pipeline_variant_version': '246685fd-258a-49a6-b431-ae4847890eee'}
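The definition in the second request nests the model references inside a single ModelInference step. The sketch below builds the same structure from a list of model references. `build_model_inference_definition` is a hypothetical helper, and it reproduces only the single-step shape used in this example, not other step types.

```python
def build_model_inference_definition(deploy_name, models):
    """Build a pipeline definition with one ModelInference step over the given models."""
    return {
        "id": deploy_name,
        "steps": [
            # Copy each reference so later edits to the inputs do not alter the definition
            {"ModelInference": {"models": [dict(model) for model in models]}}
        ],
    }


model_ref = {"name": "apitestmodel-example", "version": "v1", "sha": "abc123"}
definition = build_model_inference_definition("deploywithmodel-example", [model_ref])
```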

Deploy a Pipeline

Deploys an existing pipeline. For any pipeline that has model steps, the models must be included in model_configs, model_ids, or models.

  • Parameters
    • deploy_id (REQUIRED string): The name for the pipeline deployment.
    • engine_config (OPTIONAL string): Additional configuration options for the pipeline.
    • pipeline_version_pk_id (REQUIRED int): Pipeline version id.
    • model_configs (OPTIONAL Array int): Ids of model configs to apply.
    • model_ids (OPTIONAL Array int): Ids of models to apply to the pipeline. If passed in, model_configs will be created automatically.
    • models (OPTIONAL Array models): If the model ids are not available as a pipeline step, the models’ data can be passed to it through this method. The options below are only required if models are provided as a parameter.
      • name (REQUIRED string): Name of the uploaded model that is in the same workspace as the pipeline.
      • version (REQUIRED string): Version of the model to use.
      • sha (REQUIRED string): SHA value of the model.
    • pipeline_id (REQUIRED int): Numerical value of the pipeline to deploy.
  • Returns
    • id (int): The deployment id.
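The note above requires pipelines with model steps to include model_configs, model_ids, or models. The sketch below assembles the deploy request and enforces that rule on the client side. It also checks that each entry in models has name, version, and sha keys. `build_deploy_request` and its has_model_steps flag are hypothetical.

```python
def build_deploy_request(deploy_id, pipeline_id, pipeline_version_pk_id,
                         model_configs=None, model_ids=None, models=None,
                         has_model_steps=False):
    """Assemble a /pipelines/deploy body, checking the model-step rule client-side."""
    body = {
        "deploy_id": deploy_id,
        "pipeline_id": pipeline_id,
        "pipeline_version_pk_id": pipeline_version_pk_id,
    }
    # Keep only the model fields that were actually supplied (non-empty)
    supplied = {
        key: value
        for key, value in (("model_configs", model_configs),
                           ("model_ids", model_ids),
                           ("models", models))
        if value
    }
    if has_model_steps and not supplied:
        raise ValueError("pipelines with model steps need model_configs, model_ids, or models")
    for model in supplied.get("models", []):
        missing = {"name", "version", "sha"} - set(model)
        if missing:
            raise ValueError(f"model entry missing: {sorted(missing)}")
    body.update(supplied)
    return body


empty_deploy = build_deploy_request("emptydeploy-example", 3, 3)
```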

Examples: Both the empty pipeline and pipeline with model created in the step Create Pipeline in a Workspace will be deployed and their deployment information saved for later examples.

# Deploy empty pipeline

apiRequest = "/pipelines/deploy"

exampleEmptyDeployId = f"emptydeploy-{uuid.uuid4()}"

data = {
    "deploy_id": exampleEmptyDeployId,
    "pipeline_version_pk_id": exampleEmptyPipelineVariantId,
    "pipeline_id": exampleEmptyPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleEmptyDeploymentId=response['id']
response
{'id': 2}
# Deploy a pipeline with models

apiRequest = "/pipelines/deploy"
exampleModelDeployId=f"modeldeploy-{uuid.uuid4()}"

data = {
    "deploy_id": exampleModelDeployId,
    "pipeline_version_pk_id": exampleModelPipelineVariantId,
    "models": [
        {
            "name":exampleModelName,
            "version":exampleModelVersion,
            "sha":exampleModelSha
        }
    ],
    "pipeline_id": exampleModelPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
exampleModelDeploymentId=response['id']
response
{'id': 60}
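Since model steps must be supplied through model_configs, model_ids, or models, it can help to assemble the /pipelines/deploy body in one place. The following is a minimal sketch; build_deploy_payload is a hypothetical helper (not part of the Wallaroo SDK or API) that only adds the optional keys when a value is given, and checks that each models entry carries the name, version, and sha fields listed above.

```python
from typing import List, Optional


def build_deploy_payload(deploy_id: str,
                         pipeline_version_pk_id: int,
                         pipeline_id: int,
                         models: Optional[List[dict]] = None,
                         model_ids: Optional[List[int]] = None,
                         engine_config: Optional[str] = None) -> dict:
    """Build a request body for /pipelines/deploy (hypothetical helper)."""
    payload = {
        "deploy_id": deploy_id,
        "pipeline_version_pk_id": pipeline_version_pk_id,
        "pipeline_id": pipeline_id,
    }
    if models:
        # Each model entry must carry name, version, and sha.
        for model in models:
            missing = {"name", "version", "sha"} - set(model)
            if missing:
                raise ValueError(f"model entry is missing {sorted(missing)}")
        payload["models"] = models
    if model_ids:
        payload["model_ids"] = model_ids
    if engine_config is not None:
        payload["engine_config"] = engine_config
    return payload


# Usage with the variables from the examples above (not run here):
# data = build_deploy_payload(exampleModelDeployId, exampleModelPipelineVariantId,
#                             exampleModelPipelineId,
#                             models=[{"name": exampleModelName,
#                                      "version": exampleModelVersion,
#                                      "sha": exampleModelSha}])
```

The resulting dictionary is passed as the data argument of get_wallaroo_response, exactly like the hand-written dictionaries in the examples.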

Get Deployment Status

Returns the deployment status.

  • Parameters
    • name - (REQUIRED string): The deployment name in the format {deployment_name}-{deployment_id}.

Example: The deployed empty and model pipelines status will be displayed.

# Get empty pipeline deployment

apiRequest = "/status/get_deployment"

data = {
  "name": f"{exampleEmptyDeployId}-{exampleEmptyDeploymentId}"
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# Get model pipeline deployment

apiRequest = "/status/get_deployment"

data = {
  "name": f"{exampleModelDeployId}-{exampleModelDeploymentId}"
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.4.1.151',
   'name': 'engine-577db84597-x7bm4',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'pipelinewithmodel-94676967-b018-4002-89ef-1d69defc6273',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'apitestmodel-dfa7e9fd-df72-4b28-93f6-3d147c9f962f',
      'version': '87476d85-b8ee-4714-81ba-53041b26f50f',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.4.2.59',
   'name': 'engine-lb-7d6f4bfdd-6hxj5',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
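A deployment may not report Running immediately after /pipelines/deploy returns, so a client can poll /status/get_deployment before sending inferences. This is a minimal polling sketch: wait_until_running and is_deployment_running are hypothetical helpers, get_status is any zero-argument callable that returns a status dictionary shaped like the output above, and any status other than 'Running' is treated as not ready.

```python
import time
from typing import Callable


def is_deployment_running(status: dict) -> bool:
    """True when the deployment and every listed engine report 'Running'."""
    engines = status.get("engines", [])
    return status.get("status") == "Running" and all(
        engine.get("status") == "Running" for engine in engines
    )


def wait_until_running(get_status: Callable[[], dict],
                       timeout: float = 120.0,
                       interval: float = 5.0) -> dict:
    """Call get_status() until the deployment is running or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_deployment_running(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment not running: {status.get('status')}")
        time.sleep(interval)


# Usage with the variables from the examples above (not run here):
# name = f"{exampleModelDeployId}-{exampleModelDeploymentId}"
# status = wait_until_running(lambda: get_wallaroo_response(
#     APIURL, "/status/get_deployment", TOKEN, {"name": name}))
```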

Get External Inference URL

The API command /admin/get_pipeline_external_url retrieves the external inference URL for a specific pipeline in a workspace.

  • Parameters
    • workspace_id (REQUIRED integer): The workspace integer id.
    • pipeline_name (REQUIRED string): The name of the deployment.

In this example, the workspace id stored in exampleWorkspaceId and the deployment name stored in exampleModelDeployId are used for the /admin/get_pipeline_external_url request.

The External Inference URL will be stored as a variable for the next step.

## Retrieve the pipeline's External Inference URL
TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)

apiRequest = "/admin/get_pipeline_external_url"

data = {
    "workspace_id": exampleWorkspaceId,
    "pipeline_name": exampleModelDeployId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
print(response)
externalUrl = response['url']
externalUrl
{'url': 'https://doc-test.api.example.com/v1/api/pipelines/infer/modeldeploy-15'}

'https://doc-test.api.example.com/v1/api/pipelines/infer/modeldeploy-15'

Perform Inference Through External URL

The inference can now be performed through the External Inference URL. This URL will accept the same inference data file that is used with the Wallaroo SDK, or with an Internal Inference URL as used in the Internal Pipeline Inference URL Tutorial.

For this example, the externalUrl retrieved in the Get External Inference URL step is used to submit a single inference request.

If the Wallaroo instance has been configured to enable Arrow support, the file cc_data_1k.df.json is used; this is a DataFrame object in pandas records format. If Arrow support has not been enabled, the Wallaroo proprietary JSON data file cc_data_1k.json is used instead.

#TOKEN=get_jwt_token(TOKENURL, CLIENT, SECRET, USERNAME, PASSWORD)
# if Arrow has been enabled, set this to True.  Otherwise, leave as False.
arrowEnabled = True
## Inference through external URL

# select the data file and content type to submit
if arrowEnabled:
    dataFile = './data/cc_data_1k.df.json'
    contentType = "application/json; format=pandas-records"
else:
    dataFile = './data/cc_data_1k.json'
    contentType = "application/json"

# load the json data, closing the file when done
with open(dataFile, 'rb') as f:
    data = json.load(f)

# set the headers
headers= {
    'Authorization': 'Bearer ' + TOKEN,
    'Content-Type': contentType
}

# submit the request via POST
response = requests.post(externalUrl, json=data, headers=headers)

# Only the first 300 characters will be displayed for brevity
printResponse = json.dumps(response.json())
print(printResponse[0:300])
[{"time": 1677788050393, "in": {"tensor": [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748

Undeploy a Pipeline

Undeploys a deployed pipeline.

  • Parameters
    • pipeline_id - (REQUIRED int): The numerical id of the pipeline.
    • deployment_id - (REQUIRED int): The numerical id of the deployment.
  • Returns
    • Nothing if the call is successful.

Example: Both the empty pipeline and pipeline with models deployed in the step Deploy a Pipeline will be undeployed.

# Undeploy an empty pipeline

apiRequest = "/pipelines/undeploy"

data = {
    "pipeline_id": exampleEmptyPipelineId,
    "deployment_id":exampleEmptyDeploymentId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
# Undeploy pipeline with models

apiRequest = "/pipelines/undeploy"

data = {
    "pipeline_id": exampleModelPipelineId,
    "deployment_id":exampleModelDeploymentId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
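Because a deployed pipeline keeps running until it is undeployed, the deploy and undeploy calls can be paired so the undeploy happens even if an inference step fails. The sketch below is one way to do that with a context manager; deployed_pipeline is a hypothetical helper, and call(api_request, data) is an assumed stand-in for get_wallaroo_response(APIURL, api_request, TOKEN, data).

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def deployed_pipeline(call: Callable[[str, dict], Optional[dict]],
                      deploy_data: dict) -> Iterator[int]:
    """Deploy a pipeline, yield its deployment id, and always undeploy it."""
    deployment_id = call("/pipelines/deploy", deploy_data)["id"]
    try:
        yield deployment_id
    finally:
        # Runs even if the body raises, so the deployment is not left running.
        call("/pipelines/undeploy", {
            "pipeline_id": deploy_data["pipeline_id"],
            "deployment_id": deployment_id,
        })


# Usage (not run here):
# def call(api_request, data):
#     return get_wallaroo_response(APIURL, api_request, TOKEN, data)
#
# with deployed_pipeline(call, data) as deployment_id:
#     ...  # submit inference requests here
```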

Copy a Pipeline

Copies an existing pipeline into a new one in the same workspace. A new engine configuration can be set for the copied pipeline.

  • Parameters
    • name - (REQUIRED string): The name of the new pipeline.
    • workspace_id - (REQUIRED int): The numerical id of the workspace to copy the source pipeline from.
    • source_pipeline - (REQUIRED int): The numerical id of the pipeline to copy from.
    • deploy - (OPTIONAL string): Name of the deployment.
    • engine_config - (OPTIONAL string): Engine configuration options.
    • pipeline_version - (OPTIONAL string): Version of the copied pipeline to create.

Example: The pipeline with models created in the step Create Pipeline in a Workspace will be copied into a new one.

# Copy a pipeline

apiRequest = "/pipelines/copy"

exampleCopiedPipelineName=f"copiedmodelpipeline-{uuid.uuid4()}"

data = {
  "name": exampleCopiedPipelineName,
  "workspace_id": exampleWorkspaceId,
  "source_pipeline": exampleModelPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'pipeline_pk_id': 5,
 'pipeline_variant_pk_id': 5,
 'pipeline_version': None,
 'deployment': None}

1.1.5 - Wallaroo MLOps API Essentials Guide: Enablement Management

How to use the Wallaroo API for Enablement Management

Enablement Management

Enablement Management allows users to see what Wallaroo features have been activated.

List Enablement Features

Lists the enablement features for the Wallaroo instance.

  • PARAMETERS
    • None: Pass an empty object {}.
  • RETURNS
    • features - (object): Enabled features, keyed by feature name.
    • name - (string): Name of the Wallaroo instance.
    • is_auth_enabled - (bool): Whether authentication is enabled.
# List enablement features

apiRequest = "/features/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'features': {'plateau': 'true'},
 'name': 'Wallaroo Dev',
 'is_auth_enabled': True}
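In the sample response, each entry in features maps a feature name to the string 'true'. A minimal check for a single feature could look like the following; is_feature_enabled is a hypothetical helper, and treating both the string 'true' and the boolean True as enabled is an assumption based only on the sample output.

```python
def is_feature_enabled(features_response: dict, feature: str) -> bool:
    """Return True if `feature` is marked as enabled in a /features/list response."""
    value = features_response.get("features", {}).get(feature)
    # The sample response stores the flag as the string 'true'.
    return str(value).lower() == "true"


# Usage (not run here):
# is_feature_enabled(response, "plateau")
```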

1.1.6 - Wallaroo MLOps API Essentials Guide: Assays Management

How to use the Wallaroo API for Assays Management

Assays

IMPORTANT NOTE: These assays were run in a Wallaroo environment with canned historical data. See the Wallaroo Assay Tutorial for details on setting up this environment. This historical data is required for these examples.

Create Assay

Creates a new assay in the specified pipeline.

  • PARAMETERS
    • id - (OPTIONAL int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer type for the assay, also known as “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • RETURNS
    • assay_id - (integer): The id of the new assay.

As noted, this example requires the historical data from the Wallaroo Assay Tutorial. Before running this example, set the sample pipeline id, pipeline name, model name, and workspace id in the code sample below. For more information on retrieving these values, see the Wallaroo Developer Guides.

# Create assay

apiRequest = "/assays/create"

exampleAssayName = "api_assay_test2"

## Set the pipeline, model, and workspace from workspace 4 `housepricedrift`

exampleAssayPipelineId = 4
exampleAssayPipelineName = "housepricepipe"
exampleAssayModelName = "housepricemodel"
exampleAssayWorkspaceId = 4

# iopath takes the form "input 0 0" or "output 0 0"
data = {
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': 'houseprice-model-yns',
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
example_assay_id = response['assay_id']
response
{'assay_id': 2}
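The same assay body is sent again to /assays/run_interactive_baseline and /assays/run_interactive below, with an added id. A small builder keeps those payloads consistent. This is a minimal sketch: build_assay_config is a hypothetical helper whose fixed values (UnivariateContinuous, Density, PSI, outlier edges) are copied from the example above, and it only warns when num_bins falls outside the recommended 5 to 14 range rather than rejecting it.

```python
import warnings
from typing import Optional


def build_assay_config(name: str, pipeline_id: int, pipeline_name: str,
                       model_name: str, workspace_id: int,
                       baseline_start: str, baseline_end: str,
                       iopath: str = "input 0 0", width: str = "24 hours",
                       bin_mode: str = "Quantile", num_bins: int = 5,
                       alert_threshold: float = 0.1,
                       warning_threshold: Optional[float] = None,
                       assay_id: Optional[int] = None) -> dict:
    """Build an assay body shaped like the /assays/create example (hypothetical helper)."""
    if bin_mode not in ("Quantile", "Equal"):
        raise ValueError(f"unsupported bin_mode: {bin_mode}")
    if not 5 <= num_bins <= 14:
        # The guide recommends 5 to 14 bins; other values are flagged, not rejected.
        warnings.warn(f"num_bins={num_bins} is outside the recommended 5-14 range")
    config = {
        "name": name,
        "pipeline_id": pipeline_id,
        "pipeline_name": pipeline_name,
        "active": True,
        "status": "active",
        "iopath": iopath,
        "baseline": {"Fixed": {"pipeline": pipeline_name, "model": model_name,
                               "start_at": baseline_start, "end_at": baseline_end}},
        "window": {"pipeline": pipeline_name, "model": model_name,
                   "width": width, "start": None, "interval": None},
        "summarizer": {"type": "UnivariateContinuous", "bin_mode": bin_mode,
                       "aggregation": "Density", "metric": "PSI",
                       "num_bins": num_bins, "bin_weights": None,
                       "bin_width": None, "provided_edges": None,
                       "add_outlier_edges": True},
        "warning_threshold": warning_threshold,
        "alert_threshold": alert_threshold,
        "run_until": None,
        "workspace_id": workspace_id,
    }
    if assay_id is not None:
        # The interactive endpoints below also take the assay id.
        config["id"] = assay_id
    return config
```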

List Assays

Lists all assays in the specified pipeline.

  • PARAMETERS
    • pipeline_id - (REQUIRED int): The numerical ID of the pipeline.
  • RETURNS
    • assays - (Array assays): A list of all assays.

Example: Display a list of all assays in a pipeline. This assumes a workspace with an existing assay whose associated data has been uploaded. See the Wallaroo Assays Tutorial.

For this reason, these values are hard coded for now.

## First list all of the workspaces and the list of pipelines

# List workspaces

apiRequest = "/workspaces/list"

data = {
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'workspaces': [{'id': 1,
   'name': 'john.hansarick@wallaroo.ai - Default Workspace',
   'created_at': '2022-10-10T16:32:45.355874+00:00',
   'created_by': 'f68760ad-a27c-4f9b-808f-0b512f07571f',
   'archived': False,
   'models': [],
   'pipelines': []},
  {'id': 3,
   'name': 'testapiworkspace-e87e543f-25f1-4f6d-82c6-4eb48902575a',
   'created_at': '2022-10-10T18:25:27.926919+00:00',
   'created_by': 'f68760ad-a27c-4f9b-808f-0b512f07571f',
   'archived': False,
   'models': [1],
   'pipelines': [1, 2, 3]},
  {'id': 4,
   'name': 'housepricedrift',
   'created_at': '2022-10-10T18:38:50.748057+00:00',
   'created_by': 'f68760ad-a27c-4f9b-808f-0b512f07571f',
   'archived': False,
   'models': [2],
   'pipelines': [4]},
  {'id': 5,
   'name': 'housepricedrifts',
   'created_at': '2022-10-10T18:45:00.152716+00:00',
   'created_by': 'f68760ad-a27c-4f9b-808f-0b512f07571f',
   'archived': False,
   'models': [],
   'pipelines': []}]}
# Get assays

apiRequest = "/assays/list"

data = {
    "pipeline_id": exampleAssayPipelineId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
[{'id': 3,
  'name': 'example assay',
  'active': True,
  'status': 'created',
  'warning_threshold': None,
  'alert_threshold': 0.1,
  'pipeline_id': 4,
  'pipeline_name': 'housepricepipe',
  'last_run': None,
  'next_run': '2022-10-10T19:00:43.941894+00:00',
  'run_until': None,
  'updated_at': '2022-10-10T19:00:43.945411+00:00',
  'baseline': {'Fixed': {'pipeline': 'housepricepipe',
    'model': 'housepricemodel',
    'start_at': '2022-01-01T00:00:00+00:00',
    'end_at': '2022-01-02T00:00:00+00:00'}},
  'window': {'pipeline': 'housepricepipe',
   'model': 'housepricemodel',
   'width': '24 hours',
   'start': None,
   'interval': None},
  'summarizer': {'type': 'UnivariateContinuous',
   'bin_mode': 'Quantile',
   'aggregation': 'Density',
   'metric': 'PSI',
   'num_bins': 5,
   'bin_weights': None,
   'bin_width': None,
   'provided_edges': None,
   'add_outlier_edges': True}},
 {'id': 2,
  'name': 'api_assay_test2',
  'active': True,
  'status': 'created',
  'warning_threshold': 0.0,
  'alert_threshold': 0.1,
  'pipeline_id': 4,
  'pipeline_name': 'housepricepipe',
  'last_run': None,
  'next_run': '2022-10-10T18:53:16.444786+00:00',
  'run_until': None,
  'updated_at': '2022-10-10T18:53:16.450269+00:00',
  'baseline': {'Fixed': {'pipeline': 'housepricepipe',
    'model': 'houseprice-model-yns',
    'start_at': '2022-01-01T00:00:00-05:00',
    'end_at': '2022-01-02T00:00:00-05:00'}},
  'window': {'pipeline': 'housepricepipe',
   'model': 'housepricemodel',
   'width': '24 hours',
   'start': None,
   'interval': None},
  'summarizer': {'type': 'UnivariateContinuous',
   'bin_mode': 'Quantile',
   'aggregation': 'Density',
   'metric': 'PSI',
   'num_bins': 5,
   'bin_weights': None,
   'bin_width': None,
   'provided_edges': None,
   'add_outlier_edges': True}},
 {'id': 1,
  'name': 'api_assay_test',
  'active': True,
  'status': 'created',
  'warning_threshold': 0.0,
  'alert_threshold': 0.1,
  'pipeline_id': 4,
  'pipeline_name': 'housepricepipe',
  'last_run': None,
  'next_run': '2022-10-10T18:48:00.829479+00:00',
  'run_until': None,
  'updated_at': '2022-10-10T18:48:00.833336+00:00',
  'baseline': {'Fixed': {'pipeline': 'housepricepipe',
    'model': 'houseprice-model-yns',
    'start_at': '2022-01-01T00:00:00-05:00',
    'end_at': '2022-01-02T00:00:00-05:00'}},
  'window': {'pipeline': 'housepricepipe',
   'model': 'housepricemodel',
   'width': '24 hours',
   'start': None,
   'interval': None},
  'summarizer': {'type': 'UnivariateContinuous',
   'bin_mode': 'Quantile',
   'aggregation': 'Density',
   'metric': 'PSI',
   'num_bins': 5,
   'bin_weights': None,
   'bin_width': None,
   'provided_edges': None,
   'add_outlier_edges': True}}]
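The /assays/list response is a list of assay dictionaries, so an assay id can be looked up by name before calling the endpoints that take an id. A minimal sketch follows; assay_ids_by_name is a hypothetical helper, and it returns every match because nothing in this guide states that assay names are unique.

```python
from typing import List


def assay_ids_by_name(assays: List[dict], name: str,
                      active_only: bool = False) -> List[int]:
    """Return the ids of assays whose name matches, optionally only active ones."""
    return [
        assay["id"]
        for assay in assays
        if assay.get("name") == name and (assay.get("active") or not active_only)
    ]


# Usage (not run here):
# assay_ids_by_name(response, "example assay")
```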

Activate or Deactivate Assay

Activates or deactivates an existing assay.

  • Parameters
    • id - (REQUIRED int): The numerical id of the assay.
    • active - (REQUIRED bool): True to activate the assay, False to deactivate it.
  • Returns
    • id - (integer): The numerical id of the assay.
    • active - (bool): True if the assay is now active, False if it is inactive.

Example: The assay created in Create Assay, whose id is stored in example_assay_id, will be deactivated and then activated.

# Deactivate assay

apiRequest = "/assays/set_active"

data = {
    'id': example_assay_id,
    'active': False
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 2, 'active': False}
# Activate assay

apiRequest = "/assays/set_active"

data = {
    'id': example_assay_id,
    'active': True
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'id': 2, 'active': True}

Create Interactive Baseline

Creates an interactive assay baseline.

  • PARAMETERS
    • id - (REQUIRED int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer type for the assay, also known as “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • RETURNS
    • The baseline run result, including baseline_summary and window_summary.

Example: An interactive assay baseline will be set for the assay “example assay” (id 3) on pipeline 4.

# Run interactive baseline

apiRequest = "/assays/run_interactive_baseline"

exampleAssayPipelineId = 4
exampleAssayPipelineName = "housepricepipe"
exampleAssayModelName = "housepricemodel"
exampleAssayWorkspaceId = 4
exampleAssayId = 3
exampleAssayName = "example assay"

data = {
    'id': exampleAssayId,
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': exampleAssayModelName,
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
{'assay_id': 3,
 'name': 'example assay',
 'created_at': 1665428974654,
 'elapsed_millis': 3,
 'pipeline_id': 4,
 'pipeline_name': 'housepricepipe',
 'iopath': 'input 0 0',
 'baseline_summary': {'count': 1812,
  'min': -3.6163810435665233,
  'max': 7.112734553641073,
  'mean': 0.03518936967736047,
  'median': -0.39764636440424433,
  'std': 0.9885006118746916,
  'edges': [-3.6163810435665233,
   -0.39764636440424433,
   -0.39764636440424433,
   0.6752651953165153,
   0.6752651953165153,
   7.112734553641073,
   None],
  'edge_names': ['left_outlier',
   'q_20',
   'q_40',
   'q_60',
   'q_80',
   'q_100',
   'right_outlier'],
  'aggregated_values': [0.0,
   0.5739514348785872,
   0.0,
   0.3383002207505519,
   0.0,
   0.08774834437086093,
   0.0],
  'aggregation': 'Density',
  'start': '2022-01-01T05:00:00Z',
  'end': '2022-01-02T05:00:00Z'},
 'window_summary': {'count': 1812,
  'min': -3.6163810435665233,
  'max': 7.112734553641073,
  'mean': 0.03518936967736047,
  'median': -0.39764636440424433,
  'std': 0.9885006118746916,
  'edges': [-3.6163810435665233,
   -0.39764636440424433,
   -0.39764636440424433,
   0.6752651953165153,
   0.6752651953165153,
   7.112734553641073,
   None],
  'edge_names': ['left_outlier',
   'e_-3.98e-1',
   'e_-3.98e-1',
   'e_6.75e-1',
   'e_6.75e-1',
   'e_7.11e0',
   'right_outlier'],
  'aggregated_values': [0.0,
   0.5739514348785872,
   0.0,
   0.3383002207505519,
   0.0,
   0.08774834437086093,
   0.0],
  'aggregation': 'Density',
  'start': '2022-01-01T05:00:00Z',
  'end': '2022-01-02T05:00:00Z'},
 'warning_threshold': 0.0,
 'alert_threshold': 0.1,
 'score': 0.0,
 'scores': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
 'index': None,
 'summarizer_meta': '{"type":"UnivariateContinuous","bin_mode":"Quantile","aggregation":"Density","metric":"PSI","num_bins":5,"bin_weights":null,"provided_edges":null}',
 'status': 'BaselineRun'}
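In each summary, edge_names and aggregated_values have the same length, so the density of each bin can be read by pairing them. The sketch below assumes that pairing, based on the matching lengths in the output above; bin_densities is a hypothetical helper. It returns a list of pairs rather than a dictionary because edge names can repeat, as in window_summary where 'e_-3.98e-1' appears twice.

```python
from typing import List, Tuple


def bin_densities(summary: dict) -> List[Tuple[str, float]]:
    """Pair each edge name in a summary with its aggregated (density) value."""
    names = summary["edge_names"]
    values = summary["aggregated_values"]
    if len(names) != len(values):
        raise ValueError("edge_names and aggregated_values have different lengths")
    return list(zip(names, values))


# Usage (not run here):
# for name, density in bin_densities(response["baseline_summary"]):
#     print(f"{name}: {density:.3f}")
```

With Density aggregation, the values across all bins should sum to approximately 1.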

Get Assay Baseline

Retrieve an assay baseline.

  • Parameters
    • workspace_id - (REQUIRED integer): Numerical id for the workspace the assay is in.
    • pipeline_name - (REQUIRED string): Name of the pipeline the assay is in.
    • start - (OPTIONAL string): DateTime for when the baseline starts.
    • end - (OPTIONAL string): DateTime for when the baseline ends.
    • model_name - (OPTIONAL string): Name of the model.
    • limit - (OPTIONAL integer): Maximum number of baselines to return.
  • Returns
    • Assay Baseline

Example: 3 assay baselines for workspace 4 and pipeline housepricepipe will be retrieved.

# Get Assay Baseline

apiRequest = "/assays/get_baseline"

data = {
    'workspace_id': exampleAssayWorkspaceId,
    'pipeline_name': exampleAssayPipelineName,
    'limit': 3
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response
[{'check_failures': [],
  'elapsed': 138,
  'model_name': 'housepricemodel',
  'model_version': 'test_version',
  'original_data': {'tensor': [[0.6752651953165153,
     0.4999342471069234,
     -0.1508359058521761,
     0.20024994573167013,
     -0.08666382440547035,
     0.009116407905326388,
     -0.002872821251696453,
     -0.9179715198382244,
     -0.305653139057544,
     2.4393894526979074,
     0.29288456205300767,
     -0.3485179782510063,
     1.1121054807107582,
     0.20193559456886756,
     -0.20817781526102327,
     1.0279052268485522,
     -0.0196096612880121]]},
  'outputs': [{'Float': {'data': [13.262725830078123],
     'dim': [1, 1],
     'v': 1}}],
  'pipeline_name': 'housepricepipe',
  'time': 1643673456974},
 {'check_failures': [],
  'elapsed': 136,
  'model_name': 'housepricemodel',
  'model_version': 'test_version',
  'original_data': {'tensor': [[-0.39764636440424433,
     -1.4463372267359147,
     0.044822346031326635,
     -0.4259897870655369,
     -0.08666382440547035,
     -0.009153974747246364,
     -0.2568455220872559,
     0.005746226275241667,
     -0.305653139057544,
     -0.6285378875598833,
     -0.5584151415472702,
     -0.9748223338538442,
     -0.65605032361317,
     -1.5328599554165074,
     -0.20817781526102327,
     0.06504981348446033,
     -0.20382525042318508]]},
  'outputs': [{'Float': {'data': [12.82761001586914], 'dim': [1, 1], 'v': 1}}],
  'pipeline_name': 'housepricepipe',
  'time': 1643673504654},
 {'check_failures': [],
  'elapsed': 93,
  'model_name': 'housepricemodel',
  'model_version': 'test_version',
  'original_data': {'tensor': [[-1.470557924125004,
     -0.4732014898144956,
     1.0989221532266944,
     1.3317512811267456,
     -0.08666382440547035,
     0.006116141374609494,
     -0.21472817109954076,
     -0.9179715198382244,
     -0.305653139057544,
     -0.6285378875598833,
     0.29288456205300767,
     -0.14376463122700162,
     -0.65605032361317,
     1.1203567680905366,
     -0.20817781526102327,
     0.2692918708647222,
     -0.23870674508328787]]},
  'outputs': [{'Float': {'data': [13.03465175628662], 'dim': [1, 1], 'v': 1}}],
  'pipeline_name': 'housepricepipe',
  'time': 1643673552333}]
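Each baseline record holds the inference time, the input rows under original_data['tensor'], and the outputs as a list of typed tensors. The sketch below flattens those records into (time, input rows, output values) tuples. baseline_records is a hypothetical helper, and it assumes each output entry is a single-key dictionary such as {'Float': {'data': [...]}}, as in the sample; other output types have not been checked.

```python
from typing import List, Tuple


def baseline_records(records: List[dict]) -> List[Tuple[int, list, list]]:
    """Extract (time, input rows, output values) from /assays/get_baseline records."""
    rows = []
    for record in records:
        inputs = record["original_data"]["tensor"]
        outputs = []
        for output in record["outputs"]:
            # Each output is keyed by its type name, for example 'Float'.
            for typed_tensor in output.values():
                outputs.extend(typed_tensor["data"])
        rows.append((record["time"], inputs, outputs))
    return rows
```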

Run Assay Interactively

Runs an assay.

  • Parameters
    • id - (REQUIRED int): The numerical identifier for the assay.
    • name - (REQUIRED string): The name of the assay.
    • pipeline_id - (REQUIRED int): The numerical identifier of the pipeline the assay will be placed into.
    • pipeline_name - (REQUIRED string): The name of the pipeline.
    • active - (REQUIRED bool): Indicates whether the assay will be active upon creation or not.
    • status - (REQUIRED string): The status of the assay upon creation.
    • iopath - (REQUIRED string): The iopath of the assay.
    • baseline - (REQUIRED baseline): The baseline for the assay.
      • Fixed - (REQUIRED AssayFixConfiguration): The fixed configuration for the assay.
        • pipeline - (REQUIRED string): The name of the pipeline with the baseline data.
        • model - (REQUIRED string): The name of the model used.
        • start_at - (REQUIRED string): The DateTime of the baseline start date.
        • end_at - (REQUIRED string): The DateTime of the baseline end date.
    • window (REQUIRED AssayWindow): Assay window.
      • pipeline - (REQUIRED string): The name of the pipeline for the assay window.
      • model - (REQUIRED string): The name of the model used for the assay window.
      • width - (REQUIRED string): The width of the assay window.
      • start - (OPTIONAL string): The DateTime of when to start the assay window.
      • interval - (OPTIONAL string): The assay window interval.
    • summarizer - (REQUIRED AssaySummerizer): The summarizer type for the assay, also known as “advanced settings” in the Wallaroo Dashboard UI.
      • type - (REQUIRED string): Type of summarizer.
      • bin_mode - (REQUIRED string): The binning mode. Values can be:
        • Quantile
        • Equal
      • aggregation - (REQUIRED string): Aggregation type.
      • metric - (REQUIRED string): Metric type. Values can be:
        • PSI
        • Maximum Difference of Bins
        • Sum of the Difference of Bins
      • num_bins - (REQUIRED int): The number of bins. Recommended values are between 5 and 14.
      • bin_weights - (OPTIONAL AssayBinWeight): The weights assigned to the assay bins.
      • bin_width - (OPTIONAL AssayBinWidth): The width assigned to the assay bins.
      • provided_edges - (OPTIONAL AssayProvidedEdges): The edges used for the assay bins.
      • add_outlier_edges - (REQUIRED bool): Indicates whether to add outlier edges or not.
    • warning_threshold - (OPTIONAL number): Optional warning threshold.
    • alert_threshold - (REQUIRED number): Alert threshold.
    • run_until - (OPTIONAL string): DateTime of when to end the assay.
    • workspace_id - (REQUIRED integer): The workspace the assay is part of.
    • model_insights_url - (OPTIONAL string): URL for model insights.
  • Returns
    • Assay
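When metric is PSI, each assay result carries a score that compares the binned baseline and window densities. The standard Population Stability Index is the sum over bins of (window - baseline) * ln(window / baseline). The sketch below implements that standard formula. It is not taken from Wallaroo's source; empty bins are clamped to a small epsilon, which is an assumption, and Wallaroo's own empty-bin handling may differ. Applied to the baseline_summary and window_summary aggregated_values in the run_interactive response below, it gives roughly 0.0025, close to that response's score.

```python
import math
from typing import Sequence


def psi(baseline: Sequence[float], window: Sequence[float],
        eps: float = 1e-10) -> float:
    """Population Stability Index between two binned density vectors."""
    if len(baseline) != len(window):
        raise ValueError("baseline and window must have the same number of bins")
    total = 0.0
    for b, w in zip(baseline, window):
        # Clamp empty bins so the log stays finite; bins empty in both add 0.
        b, w = max(b, eps), max(w, eps)
        total += (w - b) * math.log(w / b)
    return total
```

Identical baseline and window densities give a score of 0.0, which matches the score returned by the interactive baseline run above.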

Example: An interactive assay will be run for the assay identified by exampleAssayId and exampleAssayName. Depending on the number of assay results and the data window, this may take some time. The request returns all of the results for this assay at the time it is run; the first result is displayed below, and the total number of responses is displayed after.

# Run interactive assay

apiRequest = "/assays/run_interactive"

data = {
    'id': exampleAssayId,
    'name': exampleAssayName,
    'pipeline_id': exampleAssayPipelineId,
    'pipeline_name': exampleAssayPipelineName,
    'active': True,
    'status': 'active',
    'iopath': "input 0 0",
    'baseline': {
        'Fixed': {
            'pipeline': exampleAssayPipelineName,
            'model': exampleAssayModelName,
            'start_at': '2022-01-01T00:00:00-05:00',
            'end_at': '2022-01-02T00:00:00-05:00'
        }
    },
    'window': {
        'pipeline': exampleAssayPipelineName,
        'model': exampleAssayModelName,
        'width': '24 hours',
        'start': None,
        'interval': None
    },
    'summarizer': {
        'type': 'UnivariateContinuous',
        'bin_mode': 'Quantile',
        'aggregation': 'Density',
        'metric': 'PSI',
        'num_bins': 5,
        'bin_weights': None,
        'bin_width': None,
        'provided_edges': None,
        'add_outlier_edges': True
    },
    'warning_threshold': 0,
    'alert_threshold': 0.1,
    'run_until': None,
    'workspace_id': exampleAssayWorkspaceId
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response[0]
{'assay_id': 3,
 'name': 'example assay',
 'created_at': 1665429281268,
 'elapsed_millis': 178,
 'pipeline_id': 4,
 'pipeline_name': 'housepricepipe',
 'iopath': 'input 0 0',
 'baseline_summary': {'count': 1812,
  'min': -3.6163810435665233,
  'max': 7.112734553641073,
  'mean': 0.03518936967736047,
  'median': -0.39764636440424433,
  'std': 0.9885006118746916,
  'edges': [-3.6163810435665233,
   -0.39764636440424433,
   -0.39764636440424433,
   0.6752651953165153,
   0.6752651953165153,
   7.112734553641073,
   None],
  'edge_names': ['left_outlier',
   'q_20',
   'q_40',
   'q_60',
   'q_80',
   'q_100',
   'right_outlier'],
  'aggregated_values': [0.0,
   0.5739514348785872,
   0.0,
   0.3383002207505519,
   0.0,
   0.08774834437086093,
   0.0],
  'aggregation': 'Density',
  'start': '2022-01-01T05:00:00Z',
  'end': '2022-01-02T05:00:00Z'},
 'window_summary': {'count': 1812,
  'min': -3.6163810435665233,
  'max': 3.8939998744787943,
  'mean': 0.006175756859303479,
  'median': -0.39764636440424433,
  'std': 0.9720429128755866,
  'edges': [-3.6163810435665233,
   -0.39764636440424433,
   -0.39764636440424433,
   0.6752651953165153,
   0.6752651953165153,
   7.112734553641073,
   None],
  'edge_names': ['left_outlier',
   'e_-3.98e-1',
   'e_-3.98e-1',
   'e_6.75e-1',
   'e_6.75e-1',
   'e_7.11e0',
   'right_outlier'],
  'aggregated_values': [0.0,
   0.5883002207505519,
   0.0,
   0.3162251655629139,
   0.0,
   0.09547461368653422,
   0.0],
  'aggregation': 'Density',
  'start': '2022-01-02T05:00:00Z',
  'end': '2022-01-03T05:00:00Z'},
 'warning_threshold': 0.0,
 'alert_threshold': 0.1,
 'score': 0.002495916218595029,
 'scores': [0.0,
  0.0003543090106786176,
  0.0,
  0.0014896074883327124,
  0.0,
  0.0006519997195836994,
  0.0],
 'index': None,
 'summarizer_meta': {'type': 'UnivariateContinuous',
  'bin_mode': 'Quantile',
  'aggregation': 'Density',
  'metric': 'PSI',
  'num_bins': 5,
  'bin_weights': None,
  'provided_edges': None},
 'status': 'Warning'}
print(len(response))
30
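The score and the per-bin scores above can be reproduced from the two aggregated_values lists with the standard Population Stability Index (PSI) formula. Each bin contributes (window - baseline) * ln(window / baseline), and the total is the sum of those contributions. The sketch below is an illustrative recomputation, not Wallaroo's implementation. Treating bins that are empty in both summaries as contributing 0 is an assumption. When only one side of a bin is empty, the sketch raises an error instead of guessing how Wallaroo smooths that case.

```python
import math
from typing import List, Tuple

def psi(baseline: List[float], window: List[float]) -> Tuple[float, List[float]]:
    """Return (total PSI, per-bin contributions) for two density vectors."""
    if len(baseline) != len(window):
        raise ValueError("baseline and window must have the same number of bins")
    scores = []
    for b, w in zip(baseline, window):
        if b == 0.0 and w == 0.0:
            # Bin empty in both summaries (e.g. unused outlier bins): no contribution.
            scores.append(0.0)
        elif b == 0.0 or w == 0.0:
            # ln(w / b) is undefined here; implementations typically smooth with an epsilon.
            raise ValueError("bin is empty in only one summary; PSI term is undefined")
        else:
            scores.append((w - b) * math.log(w / b))
    return sum(scores), scores

# aggregated_values copied from baseline_summary and window_summary above
baseline = [0.0, 0.5739514348785872, 0.0, 0.3383002207505519, 0.0, 0.08774834437086093, 0.0]
window = [0.0, 0.5883002207505519, 0.0, 0.3162251655629139, 0.0, 0.09547461368653422, 0.0]

total, per_bin = psi(baseline, window)
print(total)  # about 0.0024959, the 'score' reported above
```

The per-bin values agree with the scores list in the response. This makes the snippet a quick way to sanity-check a drift result by hand.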

Get Assay Results

Retrieve the results for an assay.

  • Parameters
    • assay_id - (REQUIRED integer): Numerical id for the assay.
    • start - (OPTIONAL string): DateTime for when the baseline starts.
    • end - (OPTIONAL string): DateTime for when the baseline ends.
    • limit - (OPTIONAL integer): Maximum number of results to return.
    • pipeline_id - (OPTIONAL integer): Numerical id of the pipeline the assay is in.
  • Returns
    • Assay Baseline

Example: Results for Assay 3 “example assay” are retrieved for January 2 to January 3. For the sake of time, only the first record is displayed.

# Get Assay Results

apiRequest = "/assays/get_results"

data = {
    'assay_id': exampleAssayId,
    'pipeline_id': exampleAssayPipelineId,
    'start': '2022-01-02T00:00:00-05:00',
    'end': '2022-01-03T00:00:00-05:00'
}

response = get_wallaroo_response(APIURL, apiRequest, TOKEN, data)
response[0]
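Each assay result carries a score alongside warning_threshold and alert_threshold. In the interactive run above, a score of about 0.0025 with thresholds of 0.0 and 0.1 was reported as 'Warning'. The sketch below assumes a simple rule consistent with that example: a score at or above the alert threshold is an alert, and otherwise a score at or above the warning threshold is a warning. It then tallies a list of results. The exact comparison Wallaroo applies is not documented in this section, so treat the rule and the 'Alert' and 'Ok' labels as hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional

def classify(score: float, alert_threshold: float, warning_threshold: Optional[float]) -> str:
    # Assumed rule; only 'Warning' appears in the example output above.
    if score >= alert_threshold:
        return "Alert"
    if warning_threshold is not None and score >= warning_threshold:
        return "Warning"
    return "Ok"

def tally(results: Iterable[Dict[str, Any]]) -> Counter:
    # warning_threshold is optional in the assay definition, so it may be missing or None.
    return Counter(
        classify(r["score"], r["alert_threshold"], r.get("warning_threshold"))
        for r in results
    )

# Fields taken from the interactive assay result shown earlier
example = [{"score": 0.002495916218595029, "warning_threshold": 0.0, "alert_threshold": 0.1}]
print(tally(example))  # Counter({'Warning': 1})
```

If the entries returned by get_results share these fields, tally(response) would give a quick overview of a date range.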

1.2 - Wallaroo MLOps API Reference Guide

2 - Wallaroo SDK Guides

Reference Guide for the most essential Wallaroo SDK Commands

2.1 - Wallaroo SDK Install Guides

How to install the Wallaroo SDK

The following guides demonstrate how to install the Wallaroo SDK in different environments. The Wallaroo SDK is installed by default into a Wallaroo instance for use with the JupyterHub service.

The Wallaroo SDK requires Python 3.8.6 and above and is available through the Wallaroo SDK Page.
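Before installing, a quick check can confirm that the interpreter meets the minimum version stated above. The sketch below uses only the standard library, and the helper name python_supported is hypothetical.

```python
import sys

MINIMUM_PYTHON = (3, 8, 6)  # minimum version stated in this guide

def python_supported(version_info=sys.version_info) -> bool:
    # Compare only (major, minor, micro); sys.version_info also carries releaselevel and serial.
    return tuple(version_info[:3]) >= MINIMUM_PYTHON

if not python_supported():
    print(f"Python {sys.version.split()[0]} is older than 3.8.6; the Wallaroo SDK may not install.")
```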


2.1.1 - Wallaroo SDK AWS Sagemaker Install Guide

How to install the Wallaroo SDK in AWS Sagemaker

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Installing the Wallaroo SDK in AWS Sagemaker

Organizations that develop machine learning models in AWS Sagemaker can deploy them to a Wallaroo instance through the Wallaroo SDK. This guide shows how to install the Wallaroo SDK and make a standard connection to a Wallaroo instance.

Organizations can use Wallaroo SSO for Amazon Web Services to provide AWS users access to the Wallaroo instance.

These instructions are based on the Connect to Wallaroo guides.

This tutorial provides the following:

  • aloha-cnn-lstm.zip: A pre-trained open source model that uses an Aloha CNN LSTM model for classifying Domain names as being either legitimate or being used for nefarious purposes such as malware distribution.
  • Test Data Files:
    • data-1.json: 1 record
    • data-1k.json: 1,000 records
    • data-25k.json: 25,000 records

This example uses a Python virtual environment to provide the required libraries and Python version.

Prerequisites

The following is required for this tutorial:

  • A Wallaroo instance version 2023.1 or later.

  • An AWS Sagemaker domain with a Notebook Instance.

  • Python 3.8.6 or later installed locally.

General Steps

For our example, we will perform the following:

  • Install Wallaroo SDK
    • Set up a Python virtual environment through conda with the libraries that enable the virtual environment for use in a Jupyter Hub environment.
    • Install the Wallaroo SDK.
  • Wallaroo SDK from remote JupyterHub Demonstration (Optional): The following steps are an optional exercise to demonstrate using the Wallaroo SDK from a remote connection. The entire tutorial can be found on the Wallaroo Tutorials repository.
    • Connect to a remote Wallaroo instance.
    • Create a workspace for our work.
    • Upload the Aloha model.
    • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results
    • Run a sample inference through our pipeline by loading a file
    • Retrieve the external deployment URL. This sample Wallaroo instance has been configured to create external inference URLs for pipelines. For more information, see the External Inference URL Guide.
    • Run a sample inference through our pipeline’s external URL and store the results in a file. This assumes that the External Inference URLs have been enabled for the target Wallaroo instance.
    • Undeploy the pipeline and return resources back to the Wallaroo instance’s Kubernetes environment.

Install Wallaroo SDK

Set Up Virtual Python Environment

To set up the Python virtual environment for use of the Wallaroo SDK:

  1. From AWS Sagemaker, select the Notebook instances.

  2. For the list of notebook instances, select Open JupyterLab for the notebook instance to be used.

  3. From the Launcher, select Terminal.

  4. From a terminal shell, create the Python virtual environment with conda. Replace wallaroosdk with the name of the virtual environment as required by your organization. Note that Python 3.8.6 and above is specified as a requirement for Python libraries used with the Wallaroo SDK. The following will install the latest version of Python 3.8.

    conda create -n wallaroosdk python=3.8
    
  5. (Optional) If the shells have not been initialized with conda, use the following to initialize it. The following examples will use the bash shell.

    1. Initialize the bash shell with conda with the command:

      conda init bash
      
    2. Launch the bash shell that has been initialized for conda:

      bash
      
  6. Activate the new environment.

    conda activate wallaroosdk
    
  7. Install the ipykernel library. This allows the JupyterHub notebooks to access the Python virtual environment as a kernel, and is required for the second part of this tutorial.

    conda install ipykernel
    
    1. Install the new virtual environment as a python kernel.

      ipython kernel install --user --name=wallaroosdk
      
  8. Install the Wallaroo SDK. This process may take several minutes while the other required Python libraries are added to the virtual environment.

    pip install wallaroo==2023.1.0
    

Once the conda virtual environment is installed, organizations using the Wallaroo SDK with Jupyter or similar services can either select it as the kernel for a new Jupyter Notebook or set it as the kernel of an existing notebook.

To use a new Notebook:

  1. From the main menu, select File->New->Notebook.
  2. From the Kernel selection dropdown, select the new virtual environment - in this case, wallaroosdk.

To update an existing Notebook to use the new virtual environment as a kernel:

  1. From the main menu, select Kernel->Change Kernel.
  2. Select the new kernel.
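After switching kernels, a quick check in the first notebook cell can confirm that the notebook is running inside the new environment and that the SDK is importable. The sketch below uses only the standard library, and the helper name installed_version is hypothetical. It also assumes the interpreter path contains the environment name (wallaroosdk). That is the usual conda layout, but it is not guaranteed.

```python
import sys
from importlib import metadata
from typing import Optional

def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

# conda environments usually live under .../envs/<name>/, so the interpreter path hints at the active env
print("Interpreter:", sys.executable)
print("In wallaroosdk env:", "wallaroosdk" in sys.executable)
print("wallaroo version:", installed_version("wallaroo"))  # this guide installs 2023.1.0
```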

Sample Wallaroo Connection

With the Wallaroo Python SDK installed, remote commands and inferences can be performed through the following steps.

Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.

This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command, which connects to the Wallaroo instance services.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO ("sso").

The URLs are based on the Wallaroo Prefix and Wallaroo Suffix for the Wallaroo instance. For more information, see the DNS Integration Guide. In the example below, replace “YOUR PREFIX” and “YOUR SUFFIX” with the Wallaroo Prefix and Suffix, respectively.
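The URL pattern used in the connection example that follows can be captured in a small helper. This makes it easier to check the endpoints before calling wallaroo.Client. The helper name wallaroo_endpoints and the sample prefix and suffix are hypothetical. The URL pattern itself is copied from the example code.

```python
from typing import Tuple

def wallaroo_endpoints(prefix: str, suffix: str) -> Tuple[str, str]:
    """Build the API and Keycloak URLs from a Wallaroo prefix and suffix."""
    api_endpoint = f"https://{prefix}.api.{suffix}"
    auth_endpoint = f"https://{prefix}.keycloak.{suffix}"
    return api_endpoint, auth_endpoint

api, auth = wallaroo_endpoints("example", "wallaroo.example.com")
print(api)   # https://example.api.wallaroo.example.com
print(auth)  # https://example.keycloak.wallaroo.example.com
```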

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, Github, etc. To use the broker, select it from the list under the username/password login forms. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Wallaroo Login

Once authenticated, the user is asked to confirm adding the device from which the connection is being established. Once both steps are complete, the connection is granted.

Device Registration

The connection is stored in the variable wl for use in all other Wallaroo calls.

import wallaroo
from wallaroo.object import EntityNotFoundError
wallaroo.__version__
'2022.4.0'
# SSO login through keycloak

wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                auth_type="sso")

Wallaroo Remote SDK Examples

The following examples can be used by an organization to test the Wallaroo SDK from a location remote to their Wallaroo instance. These examples show how to create workspaces, deploy pipelines, and perform inferences through the SDK and API.

Create the Workspace

We will create a workspace to work in called sdkquickworkspace, then set it as the current workspace. We’ll also create our pipeline in advance as sdkquickpipeline.

  • IMPORTANT NOTE: For this example, the Aloha model is stored in the file alohacnnlstm.zip. When using tensor based models, the zip file must match the name of the tensor directory. For example, if the tensor directory is alohacnnlstm, then the .zip file must be named alohacnnlstm.zip.
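Because the archive name has to match the model directory, one way to avoid a mismatch is to build the archive from the directory name itself. The sketch below uses shutil.make_archive from the standard library and keeps the top-level folder inside the zip. The helper name zip_model_dir and the directory layout (an alohacnnlstm/ folder next to the notebook) are assumptions.

```python
import shutil
from pathlib import Path

def zip_model_dir(model_dir: str) -> Path:
    """Zip a TensorFlow model directory into <dirname>.zip next to it."""
    src = Path(model_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    # root_dir=src.parent and base_dir=src.name keep the folder name as the archive's top level
    archive = shutil.make_archive(
        base_name=str(src.parent / src.name),
        format="zip",
        root_dir=str(src.parent),
        base_dir=src.name,
    )
    return Path(archive)

# Example: zip_model_dir("./alohacnnlstm") produces ./alohacnnlstm.zip
```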
workspace_name = 'sdkquickworkspace'
pipeline_name = 'sdkquickpipeline'
model_name = 'sdkquickmodel'
model_file_name = './alohacnnlstm.zip'

def get_workspace(name):
    # Return the workspace with this name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the existing pipeline with this name, or build a new one
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name sdkquickpipeline
created 2022-12-20 15:56:22.974010+00:00
last_updated 2022-12-20 15:56:22.974010+00:00
deployed (none)
tags
versions 9cb1b842-f28b-4a9b-b0c3-eeced6067582
steps

We can verify that the workspace was created and is now the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'sdkquickworkspace', 'id': 618487, 'archived': False, 'created_by': '01a797f9-1357-4506-a4d2-8ab9c4681103', 'created_at': '2022-12-20T15:56:22.088161+00:00', 'models': [], 'pipelines': [{'name': 'sdkquickpipeline', 'create_time': datetime.datetime(2022, 12, 20, 15, 56, 22, 974010, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Models

Now we will upload our model. Note that for this example we are uploading the model from a .zip file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the tensorflow format.

model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

Deploy a Model

Now that we have a model that we want to use, we will create a deployment for it.

We will tell the deployment we are using a tensorflow model, and give the deployment a name and the configuration we want for it.

To do this, we’ll add the model as a step in our pipeline, sdkquickpipeline, so that it can ingest the data, pass the data to our Aloha model, and give us a final output. We then deploy the pipeline so it’s ready to receive data. The deployment process usually takes about 45 seconds.

pipeline.add_model_step(model)
name sdkquickpipeline
created 2022-12-20 15:56:22.974010+00:00
last_updated 2022-12-20 15:56:22.974010+00:00
deployed (none)
tags
versions 9cb1b842-f28b-4a9b-b0c3-eeced6067582
steps
pipeline.deploy()

We can verify that the pipeline is running and list what models are associated with it.

pipeline.status()

Inferences

Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we’ll perform a smoke test to verify the pipeline is up and running properly. We’ll use the infer and infer_from_file commands to load a single encoded URL into the inference engine and print the results back out.

The result should tell us that the tokenized URL is legitimate (0) or fraud (1). This sample data should return close to 0.
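As a minimal sketch of reading that output, a score can be mapped to a label with a cutoff. The label_url helper and the 0.5 threshold are assumptions for illustration. The guide itself only states that 0 means legitimate and 1 means fraud.

```python
def label_url(score: float, threshold: float = 0.5) -> str:
    # Per the guide, scores near 0 indicate a legitimate domain and scores near 1 a fraudulent one.
    return "fraud" if score >= threshold else "legitimate"

print(label_url(0.02))  # legitimate
```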

# Demonstrate via straight infer

import json

# Load the single-record test file; the with block closes the file afterwards
with open('./data-1.json') as file:
    data = json.load(file)

result = pipeline.infer(data)
print(result)
# Demonstrate from infer_from_file
result = pipeline.infer_from_file("./data-1.json")
result[0].data()

Batch Inference

Now that our smoke test is successful, let’s really give it some data. We have two inference files we can use:

  • data-1k.json: Contains 1,000 inferences
  • data-25k.json: Contains 25,000 inferences

We’ll pipe the data-25k.json file through the pipeline deployment URL and place the results in a file named curl_response.txt. We’ll also display the time this takes. Note that larger batches of 50,000 inferences or more can be difficult to view in JupyterHub because of their size.

When retrieving the pipeline inference URL through an external SDK connection, the External Inference URL is returned. This URL only works if the Enable external URL inference endpoints option is enabled for the Wallaroo instance. For more information, see the Wallaroo Model Endpoints Guide.

pipeline.deploy()
name sdkquickpipeline
created 2022-12-20 15:56:22.974010+00:00
last_updated 2022-12-20 16:00:39.475003+00:00
deployed True
tags
versions d5db505b-79c3-4965-b8b8-6d8ccc10130a, 9cb1b842-f28b-4a9b-b0c3-eeced6067582
steps sdkquickmodel
external_url = pipeline._deployment._url()
external_url
'https://YOUR PREFIX.api.YOUR SUFFIX/v1/api/pipelines/infer/sdkquickpipeline-44'

The API connection details can be retrieved through the Wallaroo client mlops() command. This will display the connection URL, bearer token, and other information. The bearer token is available for one hour before it expires.
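Because the bearer token expires, it can be useful to check how long the current token remains valid before a long batch request. A JWT payload is base64url-encoded JSON, and per the JWT specification the exp claim is a time in seconds since the Unix epoch. The sketch below reads that claim without verifying the signature, so it is only suitable for this kind of local check. The helper name seconds_until_expiry is hypothetical.

```python
import base64
import json
import time
from typing import Optional

def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the JWT's 'exp' claim; negative if already expired."""
    payload_b64 = token.split(".")[1]
    # base64url segments drop '=' padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current

# Once token has been retrieved: print(seconds_until_expiry(token))
```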

For this example, the API connection details will be retrieved, then used to submit an inference request through the external inference URL retrieved earlier.

connection = wl.mlops().__dict__
token = connection['token']
token
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJWalFITFhMMThub3BXNWVHM2hMOVJ5MDZ1SFVWMko1dHREUkVxSGtBT2VzIn0.eyJleHAiOjE2NzE1NTIwNzcsImlhdCI6MTY3MTU1MjAxNywiYXV0aF90aW1lIjoxNjcxNTUxNzU5LCJqdGkiOiI3YWYyZGRmYy1mMzQyLTQwNTYtYWQzMS01Y2ZlOWRkNmY0ODUiLCJpc3MiOiJodHRwczovL3NxdWlzaHktd2FsbGFyb28tNjE4Ny5rZXljbG9hay53YWxsYXJvby5kZXYvYXV0aC9yZWFsbXMvbWFzdGVyIiwiYXVkIjpbIm1hc3Rlci1yZWFsbSIsImFjY291bnQiXSwic3ViIjoiMDFhNzk3ZjktMTM1Ny00NTA2LWE0ZDItOGFiOWM0NjgxMTAzIiwidHlwIjoiQmVhcmVyIiwiYXpwIjoic2RrLWNsaWVudCIsInNlc3Npb25fc3RhdGUiOiJiNGQ0YTJmOC1lOGMzLTQ0ZDgtYTc1YS05YmZiMTI3NGNiMzciLCJhY3IiOiIxIiwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbImNyZWF0ZS1yZWFsbSIsImRlZmF1bHQtcm9sZXMtbWFzdGVyIiwib2ZmbGluZV9hY2Nlc3MiLCJhZG1pbiIsInVtYV9hdXRob3JpemF0aW9uIl19LCJyZXNvdXJjZV9hY2Nlc3MiOnsibWFzdGVyLXJlYWxtIjp7InJvbGVzIjpbInZpZXctaWRlbnRpdHktcHJvdmlkZXJzIiwidmlldy1yZWFsbSIsIm1hbmFnZS1pZGVudGl0eS1wcm92aWRlcnMiLCJpbXBlcnNvbmF0aW9uIiwiY3JlYXRlLWNsaWVudCIsIm1hbmFnZS11c2VycyIsInF1ZXJ5LXJlYWxtcyIsInZpZXctYXV0aG9yaXphdGlvbiIsInF1ZXJ5LWNsaWVudHMiLCJxdWVyeS11c2VycyIsIm1hbmFnZS1ldmVudHMiLCJtYW5hZ2UtcmVhbG0iLCJ2aWV3LWV2ZW50cyIsInZpZXctdXNlcnMiLCJ2aWV3LWNsaWVudHMiLCJtYW5hZ2UtYXV0aG9yaXphdGlvbiIsIm1hbmFnZS1jbGllbnRzIiwicXVlcnktZ3JvdXBzIl19LCJhY2NvdW50Ijp7InJvbGVzIjpbIm1hbmFnZS1hY2NvdW50IiwibWFuYWdlLWFjY291bnQtbGlua3MiLCJ2aWV3LXByb2ZpbGUiXX19LCJzY29wZSI6InByb2ZpbGUgZW1haWwiLCJzaWQiOiJiNGQ0YTJmOC1lOGMzLTQ0ZDgtYTc1YS05YmZiMTI3NGNiMzciLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwiaHR0cHM6Ly9oYXN1cmEuaW8vand0L2NsYWltcyI6eyJ4LWhhc3VyYS11c2VyLWlkIjoiMDFhNzk3ZjktMTM1Ny00NTA2LWE0ZDItOGFiOWM0NjgxMTAzIiwieC1oYXN1cmEtZGVmYXVsdC1yb2xlIjoidXNlciIsIngtaGFzdXJhLWFsbG93ZWQtcm9sZXMiOlsidXNlciJdLCJ4LWhhc3VyYS11c2VyLWdyb3VwcyI6Int9In0sInByZWZlcnJlZF91c2VybmFtZSI6ImpvaG4uaGFuc2FyaWNrQHdhbGxhcm9vLmFpIiwiZW1haWwiOiJqb2huLmhhbnNhcmlja0B3YWxsYXJvby5haSJ9.lQupCrqaVlBRO0-Q0DT75hzzmRYQwpO4Dh8P5XzMKDoapsQuOEiuX0uq-E6WjjVN7sKRRwlVnWgP86PQkO3Yx706bdiWKXM6rXRQSv3ZlyuFt0S15MoH40gAJIOSZLi6BtZwSI6RVIdYeEnGmbv9RfBqt9iYBj6E7OYGu-2DlPp2Pai2i61383iVNmaIkStgukKLsFEPAf
yGccxK01OyBF1XcaVrv0j4FHjtrQG2Sjcvqb9hDIYIEpFYwZ5j2qxDwLeyoHPhfpB6aVjjXhEUQYodHRyLsacmBnKpqfkwNHi-ZdxrwvU1wtUU7sf0miCAC8UEdLCTi00uW5ukOar_zw'
!curl -X POST {external_url} -H "Content-Type:application/json" -H "Authorization: Bearer {token}" --data @data-25k.json > curl_response.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.0M  100 10.1M  100 2886k   631k   174k  0:00:16  0:00:16 --:--:-- 2766k
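Where the !curl shell escape is not available, the same request can be sent from Python. The sketch below mirrors the curl command's headers (Content-Type: application/json and Authorization: Bearer) using only the standard library. The helper names build_infer_request and infer_to_file and the 120-second timeout are hypothetical. The sketch assumes external_url and token from the cells above.

```python
import urllib.request
from pathlib import Path

def build_infer_request(url: str, token: str, data_file: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call, with the JSON file as the body."""
    body = Path(data_file).read_bytes()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

def infer_to_file(url: str, token: str, data_file: str, out_file: str, timeout: float = 120.0) -> None:
    # Send the request and write the raw response body to out_file
    request = build_infer_request(url, token, data_file)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        Path(out_file).write_bytes(response.read())

# infer_to_file(external_url, token, "data-25k.json", "curl_response.txt")
```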

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline so we have the Kubernetes resources back for other tasks. Note that if the deployment variable is unchanged, pipeline.deploy() will restart the inference engine in the same configuration as before.

pipeline.undeploy()
name sdkquickpipeline
created 2022-12-20 15:56:22.974010+00:00
last_updated 2022-12-20 16:00:39.475003+00:00
deployed False
tags
versions d5db505b-79c3-4965-b8b8-6d8ccc10130a, 9cb1b842-f28b-4a9b-b0c3-eeced6067582
steps sdkquickmodel

2.1.2 - Wallaroo SDK AzureML Install Guide

How to install the Wallaroo SDK in AzureML

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Installing the Wallaroo SDK into Azure ML Workspace

Organizations that use Azure ML for model training and development can deploy models to Wallaroo through the Wallaroo SDK. The following guide is created to assist users with installing the Wallaroo SDK, setting up authentication through Azure ML, and making a standard connection to a Wallaroo instance through Azure ML Workspace.

These instructions are based on the Wallaroo SSO for Microsoft Azure and the Connect to Wallaroo guides.

This tutorial provides the following:

  • aloha-cnn-lstm.zip: A pre-trained open source model that uses an Aloha CNN LSTM model for classifying Domain names as being either legitimate or being used for nefarious purposes such as malware distribution.
  • Test Data Files:
    • data-1.json: 1 record
    • data-1k.json: 1,000 records
    • data-25k.json: 25,000 records

To use the Wallaroo SDK within Azure ML Workspace, this guide uses a Python virtual environment to provide the required libraries and Python version.

Prerequisites

The following is required for this tutorial:

  • A Wallaroo instance version 2023.1 or later.
  • Python 3.8.6 or later installed locally.
  • Conda: Used for managing python virtual environments. This is automatically included in Azure ML Workspace.
  • An Azure ML workspace with a compute instance configured.

General Steps

For our example, we will perform the following:

  • Wallaroo SDK Install
    • Set up a Python virtual environment through conda with the libraries that enable the virtual environment for use in a Jupyter Hub environment.
    • Install the Wallaroo SDK.
  • Wallaroo SDK from remote JupyterHub Demonstration (Optional): The following steps are an optional exercise to demonstrate using the Wallaroo SDK from a remote connection. The entire tutorial can be found on the Wallaroo Tutorials repository.
    • Connect to a remote Wallaroo instance.
    • Create a workspace for our work.
    • Upload the Aloha model.
    • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results
    • Run a sample inference through our pipeline by loading a file
    • Retrieve the external deployment URL. This sample Wallaroo instance has been configured to create external inference URLs for pipelines. For more information, see the External Inference URL Guide.
    • Run a sample inference through our pipeline’s external URL and store the results in a file. This assumes that the External Inference URLs have been enabled for the target Wallaroo instance.
    • Undeploy the pipeline and return resources back to the Wallaroo instance’s Kubernetes environment.

Install Wallaroo SDK

Set Up Virtual Python Environment

To set up the virtual environment in Azure ML for using the Wallaroo SDK with Azure ML Workspace:

  1. Select Notebooks.

  2. Create a new folder where the Jupyter Notebooks for Wallaroo will be installed.

  3. From this repository, upload sdk-install-guides/azure-ml-sdk-install.zip, or upload the entire folder sdk-install-guides/azure-ml-sdk-install. This tutorial will assume the .zip file was uploaded.

  4. Select Open Terminal. Navigate to the target directory.

  5. Run unzip azure-ml-sdk-install.zip to unzip the directory, then cd into it with cd azure-ml-sdk-install.

  6. Create the Python virtual environment with conda. Replace wallaroosdk with the name of the virtual environment as required by your organization. Note that Python 3.8.6 and above is specified as a requirement for Python libraries used with the Wallaroo SDK. The following will install the latest version of Python 3.8, which as of this time is 3.8.15.

    conda create -n wallaroosdk python=3.8
    
  7. Activate the new environment.

    conda activate wallaroosdk
    
  8. Install the ipykernel library. This allows the JupyterHub notebooks to access the Python virtual environment as a kernel.

    conda install ipykernel
    
  9. Install the new virtual environment as a python kernel.

    ipython kernel install --user --name=wallaroosdk
    
  10. Install the Wallaroo SDK. This process may take several minutes while the other required Python libraries are added to the virtual environment.

    pip install wallaroo==2023.1.0
    

Once the conda virtual environment has been installed, it can either be selected as the kernel for a new Jupyter Notebook or set as the kernel of an existing notebook. If the notebook is already open, close it and reopen it to select the new Wallaroo SDK environment.

To use a new Notebook:

  1. From the left navigation panel, select +->Notebook.
  2. From the Kernel selection dropdown on the upper right side, select the new virtual environment - in this case, wallaroosdk.

To update an existing Notebook to use the new virtual environment as a kernel:

  1. From the main menu, select Kernel->Change Kernel.
  2. Select the new kernel.

Sample Wallaroo Connection

With the Wallaroo Python SDK installed, remote commands and inferences can be performed through the following steps.

Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.

This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command, which connects to the Wallaroo instance services.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO ("sso").

The URLs are based on the Wallaroo Prefix and Wallaroo Suffix for the Wallaroo instance. For more information, see the DNS Integration Guide. In the example below, replace “YOUR PREFIX” and “YOUR SUFFIX” with the Wallaroo Prefix and Suffix, respectively.

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, Github, etc. To use the broker, select it from the list under the username/password login forms. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Wallaroo Login

Once authenticated, the user is asked to confirm adding the device from which the connection is being established. Once both steps are complete, the connection is granted.

Device Registration

The connection is stored in the variable wl for use in all other Wallaroo calls.

import wallaroo
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
# SSO login through keycloak

wallarooPrefix = "YOURPREFIX"
wallarooSuffix = "YOURSUFFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")

Arrow Support

As of the 2023.1 release, Wallaroo provides support for dataframe and Arrow for inference inputs. This tutorial allows users to adjust their experience based on whether they have enabled Arrow support in their Wallaroo instance or not.

If Arrow support has been enabled, set arrowEnabled=True. If it is disabled or you’re not sure, set arrowEnabled=False.

The examples below are shown in an Arrow-enabled environment.

import os
arrowEnabled=True
os.environ["ARROW_ENABLED"]=f"{arrowEnabled}"
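Environment variables are always strings, so the flag set above is stored as the text "True" or "False". If later code needs to read the flag back, the sketch below converts it to a boolean. The helper name arrow_enabled_from_env and the accepted spellings are assumptions, not something the Wallaroo SDK defines.

```python
import os

def arrow_enabled_from_env(default: bool = False) -> bool:
    value = os.environ.get("ARROW_ENABLED")
    if value is None:
        return default
    # f"{arrowEnabled}" writes "True" or "False"; common variants are accepted too
    return value.strip().lower() in {"true", "1", "yes"}

print(arrow_enabled_from_env())
```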

Create the Workspace

We will create a workspace to work in called azuremlsdkworkspace, then set it as the current workspace. We’ll also create our pipeline in advance as azuremlsdkpipeline.

  • IMPORTANT NOTE: For this example, the Aloha model is stored in the file alohacnnlstm.zip. When using tensor based models, the zip file must match the name of the tensor directory. For example, if the tensor directory is alohacnnlstm, then the .zip file must be named alohacnnlstm.zip.
workspace_name = 'azuremlsdkworkspace'
pipeline_name = 'azuremlsdkpipeline'
model_name = 'azuremlsdkmodel'
model_file_name = './alohacnnlstm.zip'

def get_workspace(name):
    # Return the workspace with this name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the existing pipeline with this name, or build a new one
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:51.201925+00:00
deployed (none)
tags
versions 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps

We can verify that the workspace was created and is now the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'gcpsdkworkspace', 'id': 10, 'archived': False, 'created_by': '0bbf2f62-a4f1-4fe5-aad8-ec1cb7485939', 'created_at': '2022-12-06T21:35:50.34358+00:00', 'models': [], 'pipelines': [{'name': 'gcpsdkpipeline', 'create_time': datetime.datetime(2022, 12, 6, 21, 35, 51, 201925, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Models

Now we will upload our model. Note that for this example we are uploading the model from a .zip file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the tensorflow format.

model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

Deploy a Model

Now that we have a model that we want to use, we will create a deployment for it.

We will tell the deployment we are using a tensorflow model, and give the deployment a name and the configuration we want for it.

To do this, we’ll add the model as a step in our pipeline, azuremlsdkpipeline, so that it can ingest the data, pass the data to our Aloha model, and give us a final output. We then deploy the pipeline so it’s ready to receive data. The deployment process usually takes about 45 seconds.

pipeline.add_model_step(model)
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:51.201925+00:00
deployed (none)
tags
versions 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps
pipeline.deploy()
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:55.428652+00:00
deployed True
tags
versions 269179a8-79e4-4c58-b9c3-d05436ad7be3, 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps gcpsdkmodel

We can verify that the pipeline is running and list what models are associated with it.

pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.1.174',
   'name': 'engine-7888f44c8b-r2gpr',
   'status': 'Running',
   'reason': None,
   'details': ['containers with unready status: [engine]',
    'containers with unready status: [engine]'],
   'pipeline_statuses': {'pipelines': [{'id': 'gcpsdkpipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'gcpsdkmodel',
      'version': 'c468d323-257b-4717-bbd8-8539a8746496',
      'sha': '7c89707252ce389980d5348c37885d6d72af4c20cd303422e2de7e66dd7ff184',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.1.173',
   'name': 'engine-lb-c6485cfd5-kqsn6',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
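Because deployment takes time, a notebook can poll pipeline.status() until it reports Running instead of re-running the cell by hand. The following is a minimal sketch that assumes only what the output above shows: pipeline.status() returns a dict with a top-level 'status' key and an 'engines' list whose entries carry 'model_statuses'. The helper names wait_for_running and model_statuses, and the timeout and polling defaults, are hypothetical choices for illustration and are not part of the Wallaroo SDK.

```python
import time

def wait_for_running(pipeline, timeout_s=120.0, poll_s=5.0):
    """Poll pipeline.status() until its top-level 'status' is 'Running'.

    Returns the final status dict, or raises TimeoutError if the pipeline
    does not report 'Running' within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = pipeline.status()
        if status.get("status") == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"pipeline not running after {timeout_s}s: {status.get('status')}"
            )
        time.sleep(poll_s)

def model_statuses(status):
    """Collect (model name, model status) pairs from every engine in a status dict."""
    return [
        (model["name"], model["status"])
        for engine in status.get("engines", [])
        for model in engine.get("model_statuses", {}).get("models", [])
    ]
```

For example, status = wait_for_running(pipeline) could be called right after pipeline.deploy(), followed by model_statuses(status) to list which models each engine is serving.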

Inferences

Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we’ll perform a smoke test to verify the pipeline is up and running properly. We’ll use the infer_from_file command to load a single encoded URL into the inference engine and print the results back out.

The result should tell us whether the tokenized URL is legitimate (0) or fraud (1). This sample data should return a value close to 0.

# Infer from file
if arrowEnabled is True:
    result = pipeline.infer_from_file('./data/data_1.df.json')
    display(result)
else:
    result = pipeline.infer_from_file("./data/data_1.json")
    display(result[0].data())

Batch Inference

Now that our smoke test is successful, let’s really give it some data. We have two inference files we can use:

  • data_1k.json: Contains 1,000 inferences
  • data_25k.json: Contains 25,000 inferences

We’ll pipe the data_25k.json file through the pipeline deployment URL, and place the results in a file named curl_response.txt. We’ll also display the time this takes. Note that larger batches of 50,000 inferences or more can be difficult to view in JupyterHub because of their size.

When retrieving the pipeline inference URL through an external SDK connection, the External Inference URL will be returned. This URL will function provided that the Enable external URL inference endpoints option is enabled. For more information, see the Wallaroo Model Endpoints Guide.

external_url = pipeline._deployment._url()
external_url
'https://YOUR PREFIX.api.example.wallaroo.ai/v1/api/pipelines/infer/gcpsdkpipeline-13'

The API connection details can be retrieved through the Wallaroo client mlops() command. This will display the connection URL, bearer token, and other information. The bearer token is available for one hour before it expires.

For this example, the API connection details will be retrieved, then used to submit an inference request through the external inference URL retrieved earlier.

connection = wl.mlops().__dict__
token = connection['token']
token
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJyb3dHSmdNdnlCODFyRFBzQURxc3RIM0hIbFdZdmdhMnluUmtGXzllSWhjIn0.eyJleHAiOjE2NzAzNjI2NjMsImlhdCI6MTY3MDM2MjYwMywiYXV0aF90aW1lIjoxNjcwMzYyNTQ1LCJqdGkiOiI5NDk5M2Y2Ni0yMjk2LTRiMTItOTYwMi1iOWEyM2UxY2RhZGIiLCJpc3MiOiJodHRwczovL21hZ2ljYWwtYmVhci0zNzgyLmtleWNsb2FrLndhbGxhcm9vLmNvbW11bml0eS9hdXRoL3JlYWxtcy9tYXN0ZXIiLCJhdWQiOlsibWFzdGVyLXJlYWxtIiwiYWNjb3VudCJdLCJzdWIiOiIwYmJmMmY2Mi1hNGYxLTRmZTUtYWFkOC1lYzFjYjc0ODU5MzkiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJzZGstY2xpZW50Iiwic2Vzc2lvbl9zdGF0ZSI6ImQyYjlkMzFjLWU3ZmMtNDI4OS1hOThjLTI2ZTMwMDBiMzVkMiIsImFjciI6IjEiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsiY3JlYXRlLXJlYWxtIiwiZGVmYXVsdC1yb2xlcy1tYXN0ZXIiLCJvZmZsaW5lX2FjY2VzcyIsImFkbWluIiwidW1hX2F1dGhvcml6YXRpb24iXX0sInJlc291cmNlX2FjY2VzcyI6eyJtYXN0ZXItcmVhbG0iOnsicm9sZXMiOlsidmlldy1yZWFsbSIsInZpZXctaWRlbnRpdHktcHJvdmlkZXJzIiwibWFuYWdlLWlkZW50aXR5LXByb3ZpZGVycyIsImltcGVyc29uYXRpb24iLCJjcmVhdGUtY2xpZW50IiwibWFuYWdlLXVzZXJzIiwicXVlcnktcmVhbG1zIiwidmlldy1hdXRob3JpemF0aW9uIiwicXVlcnktY2xpZW50cyIsInF1ZXJ5LXVzZXJzIiwibWFuYWdlLWV2ZW50cyIsIm1hbmFnZS1yZWFsbSIsInZpZXctZXZlbnRzIiwidmlldy11c2VycyIsInZpZXctY2xpZW50cyIsIm1hbmFnZS1hdXRob3JpemF0aW9uIiwibWFuYWdlLWNsaWVudHMiLCJxdWVyeS1ncm91cHMiXX0sImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoicHJvZmlsZSBlbWFpbCIsInNpZCI6ImQyYjlkMzFjLWU3ZmMtNDI4OS1hOThjLTI2ZTMwMDBiMzVkMiIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJodHRwczovL2hhc3VyYS5pby9qd3QvY2xhaW1zIjp7IngtaGFzdXJhLXVzZXItaWQiOiIwYmJmMmY2Mi1hNGYxLTRmZTUtYWFkOC1lYzFjYjc0ODU5MzkiLCJ4LWhhc3VyYS1kZWZhdWx0LXJvbGUiOiJ1c2VyIiwieC1oYXN1cmEtYWxsb3dlZC1yb2xlcyI6WyJ1c2VyIl0sIngtaGFzdXJhLXVzZXItZ3JvdXBzIjoie30ifSwicHJlZmVycmVkX3VzZXJuYW1lIjoiam9obi5oYW5zYXJpY2tAd2FsbGFyb28uYWkiLCJlbWFpbCI6ImpvaG4uaGFuc2FyaWNrQHdhbGxhcm9vLmFpIn0.Gnig3PdpMFGSrQ2J4Tj3Nqbk2UOfBCH4MEw2i6p5pLkQ51F8FM7Dq-VOGoNYAXZn2OXw_bKh0Ae60IqglB0PSFTlksVzb1uSGKOPgcZNkI0fTMK99YW71UctMDk9MYrN09bT2GhGQ7FV-tJNqemYSXB3eMIaTkah6AMUfJIYYvf6J2OqXyNJqc6Hwf0-44FGso_N0WXF6GM-ww72ampVjc10Mad30kYzQX508U9RuZXd3uvOrRQHreOcPPmjso1yDbUx8gqLeov_uq3dg5hUY55v2oVBdtXT60-ZBIQP8uETNetv6529Nm52uwKNT7DdjXk85kbJBK8oV6etyfKRDw'
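Because the bearer token expires, it can help to check how long it remains valid before submitting a large batch. The token above is a JSON Web Token (JWT), whose middle segment is a base64url-encoded JSON payload that normally includes an exp claim (expiry time in Unix seconds). The sketch below decodes that payload using only the Python standard library. The helper names are hypothetical, and the code does not verify the token's signature, so it is only suitable for reading the claims of a token you already trust, such as one returned by wl.mlops().

```python
import base64
import json
import time

def jwt_claims(token):
    """Decode the payload (middle segment) of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

def seconds_until_expiry(token, now=None):
    """Return the seconds remaining before the token's 'exp' claim (negative if expired)."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now
```

For example, if seconds_until_expiry(token) is close to zero or negative, retrieve a fresh token with wl.mlops() before sending the request.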
if arrowEnabled is True:
    dataFile="./data/data_25k.df.json"
    contentType="application/json; format=pandas-records"
else:
    dataFile="./data/data_25k.json"
    contentType="application/json"
!curl -X POST {external_url} -H "Content-Type:{contentType}" -H "Authorization: Bearer {token}" --data @{dataFile} > curl_response.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.0M  100 10.1M  100 2886k  2322k   642k  0:00:04  0:00:04 --:--:-- 2965k
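Where shell escapes such as !curl are not available, the same request can be made from Python. The sketch below uses only the standard library urllib.request module and mirrors the curl command above: a POST to external_url with the Content-Type chosen by arrowEnabled and an Authorization: Bearer header, with the raw response written to curl_response.txt. The function names are hypothetical, and the content types are copied from the example above rather than from a Wallaroo API reference.

```python
import urllib.request

def build_inference_request(url, token, body, arrow_enabled):
    """Build (without sending) a POST request equivalent to the curl command above."""
    # Same content types as the arrowEnabled branches in the example above.
    if arrow_enabled:
        content_type = "application/json; format=pandas-records"
    else:
        content_type = "application/json"
    return urllib.request.Request(
        url,
        data=body,  # raw bytes of the JSON data file
        headers={
            "Content-Type": content_type,
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

def submit_inference(request, out_path="curl_response.txt", timeout=120):
    """Send the request, write the raw response body to out_path, and return its size."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    with open(out_path, "wb") as out:
        out.write(body)
    return len(body)

# Usage, assuming external_url, token, dataFile, and arrowEnabled from the cells above:
# with open(dataFile, "rb") as f:
#     request = build_inference_request(external_url, token, f.read(), arrowEnabled)
# submit_inference(request)
```

Unlike curl without --fail, urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so an expired token surfaces as an exception rather than as an error body saved to the file.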

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline to return its Kubernetes resources for other tasks. Note that if the deployment variable is unchanged, pipeline.deploy() will restart the inference engine in the same configuration as before.

pipeline.undeploy()

2.1.3 - Wallaroo SDK Azure Databricks Install Guide

How to install the Wallaroo SDK in Azure Databricks

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Installing the Wallaroo SDK into Workspace

Organizations that use Azure Databricks for model training and development can deploy models to Wallaroo through the Wallaroo SDK. The following guide is created to assist users with installing the Wallaroo SDK, setting up authentication through Azure Databricks, and making a standard connection to a Wallaroo instance through Azure Databricks Workspace.

These instructions are based on the Wallaroo SSO for Microsoft Azure and the Connect to Wallaroo guides.

This tutorial provides the following:

  • ccfraud.onnx: A pretrained model from the Machine Learning Group’s demonstration on Credit Card Fraud detection.
  • Sample inference test data:
    • ccfraud_high_fraud.json: Test input file that returns a high likelihood of credit card fraud.
    • ccfraud_smoke_test.json: Test input file that returns a low likelihood of credit card fraud.
    • cc_data_1k.json: Sample input file with 1,000 records.
    • cc_data_10k.json: Sample input file with 10,000 records.

To use the Wallaroo SDK within Azure Databricks Workspace, a virtual environment will be used. This provides the necessary libraries and the specific Python version required.

Prerequisites

The following is required for this tutorial:

General Steps

For our example, we will perform the following:

  • Wallaroo SDK Install
    • Install the Wallaroo SDK into the Azure Databricks cluster.
    • Install the Wallaroo Python SDK.
    • Connect to a remote Wallaroo instance. This instance is configured to use the standard Keycloak service.
  • Wallaroo SDK from Azure Databricks Workspace (Optional)
    • The following steps are used to demonstrate using the Wallaroo SDK in an Azure Databricks Workspace environment. The entire tutorial can be found on the Wallaroo Tutorials repository.
      • Create a workspace for our work.
      • Upload the CCFraud model.
      • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results.
      • Run a sample inference through our pipeline by loading a file.
      • Undeploy the pipeline and return resources to the Wallaroo instance’s Kubernetes environment.

Install Wallaroo SDK

Add Wallaroo SDK to Cluster

To install the Wallaroo SDK in an Azure Databricks environment:

  1. From the Azure Databricks dashboard, select Compute, then the cluster to use.
  2. Select Libraries.
  3. Select Install new.
  4. Select PyPI. In the Package field, enter the current version of the Wallaroo SDK. It is recommended to specify the version, which as of this writing is wallaroo==2023.1.0.
  5. Select Install.

Once the Status shows Installed, it will be available in Azure Databricks notebooks and other tools that use the cluster.
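To confirm from a notebook that the cluster library is attached and pinned to the expected version, the installed package metadata can be queried. This minimal sketch uses the standard library importlib.metadata module (available in Python 3.8 and later); the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Usage in a notebook attached to the cluster:
# print(installed_version("wallaroo"))  # compare against the version entered in the Package field
```

A None result suggests the library is not yet attached to the cluster the notebook is running on.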

Add Tutorial Files

The following instructions can be used to upload this tutorial and its files into Databricks. Depending on how your Azure Databricks is configured and your organization’s standards, there are multiple ways of uploading files to your Azure Databricks environment. The following example is used for the tutorial and makes it easy to reference data files from within this Notebook. Adjust based on your requirements.

  • IMPORTANT NOTE: Importing a repo from a Git repository may not convert the included Jupyter Notebooks into the Databricks format. This method
  1. From the Azure Databricks dashboard, select Repos.

  2. Select where to place the repo, then select Add Repo.

  3. Set the following:

    1. Create repo by cloning a Git repository: Uncheck
    2. Repository name: Set any name based on the Databricks standard (no spaces, etc).
    3. Select Create Repo.
  4. Select the new tutorial, then from the repo menu dropdown, select Import.

  5. Select the files to upload. For this example, the following files are uploaded:

    1. ccfraud.onnx: A pretrained model from the Machine Learning Group’s demonstration on Credit Card Fraud detection.
    2. Sample inference test data:
      1. ccfraud_high_fraud.json: Test input file that returns a high likelihood of credit card fraud.
      2. ccfraud_smoke_test.json: Test input file that returns a low likelihood of credit card fraud.
      3. cc_data_1k.json: Sample input file with 1,000 records.
      4. cc_data_10k.json: Sample input file with 10,000 records.
    3. install-wallaroo-sdk-databricks-azure-guide.ipynb: This notebook.
  6. Select Import.

The Jupyter Notebook can be opened from this new Azure Databricks repository, and relative files it references will be accessible with the exceptions listed below.

Zip files added via the method above are automatically decompressed, so they cannot be used as model files, for example for TensorFlow-based models such as the Wallaroo Aloha Demo. Zip files can instead be uploaded using DBFS and used through the following process:

To upload model files to Azure Databricks using DBFS:

  1. From the Azure Databricks dashboard, select Data.

  2. Select Add->Add data.

  3. Select DBFS.

  4. Select Upload File and enter the following:

    1. DBFS Target Directory (Optional): Set the directory where the files will be uploaded.
  5. Select the files to upload. Note that each file will be given a location and can be accessed with /dbfs/PATH. For example, the file alohacnnlstm.zip uploaded to the directory aloha would be referenced with /dbfs/FileStore/tables/aloha/alohacnnlstm.zip.
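Files uploaded through DBFS are commonly shown as dbfs:/... URIs, while Python file APIs on the cluster (and the Wallaroo model upload step in this guide) expect the /dbfs/... path form used in the example above. The sketch below converts a dbfs: URI to that path form and checks that an uploaded archive is still a valid zip file rather than one that was decompressed on import. The helper names are hypothetical, and the conversion assumes the standard /dbfs mount point on the cluster.

```python
import zipfile

def dbfs_uri_to_local_path(uri):
    """Convert a 'dbfs:/...' URI to the '/dbfs/...' path form; other paths pass through unchanged."""
    scheme = "dbfs:"
    if uri.startswith(scheme):
        # Drop the scheme and any leading slashes, then re-root under /dbfs.
        return "/dbfs/" + uri[len(scheme):].lstrip("/")
    return uri

def is_intact_zip(path):
    """Return True if path exists and is a readable zip archive."""
    return zipfile.is_zipfile(path)

# Usage (path taken from the example above):
# model_file_name = dbfs_uri_to_local_path("dbfs:/FileStore/tables/aloha/alohacnnlstm.zip")
# if not is_intact_zip(model_file_name):
#     print("Not a zip archive; re-upload it through DBFS.")
```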

Sample Wallaroo Connection

With the Wallaroo Python SDK installed, remote commands and inferences can be performed through the following steps.

Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.

This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command that connects to the Wallaroo instance services.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO.

The URLs are based on the Wallaroo Prefix and Wallaroo Suffix for the Wallaroo instance. For more information, see the DNS Integration Guide. In the example below, replace “YOUR PREFIX” and “YOUR SUFFIX” with the Wallaroo Prefix and Suffix, respectively.

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions.

Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, Github, etc. To use the broker, select it from the list under the username/password login forms. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Wallaroo Login

Once authenticated, the user will confirm adding the device from which the connection is being established. Once both steps are complete, the connection is granted.

Device Registration

The connection is stored in the variable wl for use in all other Wallaroo calls.

Replace YOUR PREFIX and YOUR SUFFIX with the DNS prefix and suffix for the Wallaroo instance. For more information, see the DNS Integration Guide.

import wallaroo
from wallaroo.object import EntityNotFoundError
# SSO login through keycloak

wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")

Create the Workspace

We will create a workspace to work in called databricksazuresdkworkspace, then set it as the current workspace environment. We’ll also create our pipeline in advance as databricksazuresdkpipeline.

  • IMPORTANT NOTE: For this example, the CCFraud model is stored in the file ccfraud.onnx and is referenced from a relative link. For platforms such as Databricks, the files may need to be referenced with a universal file path. For those, the example file location below may be:

model_file_name = '/dbfs/FileStore/tables/aloha/alohacnnlstm.zip'

Adjust file names and locations based on your requirements.

workspace_name = 'databricksazuresdkworkspace'
pipeline_name = 'databricksazuresdkpipeline'
model_name = 'ccfraudmodel'
model_file_name = './ccfraud.onnx'
def get_workspace(name):
    # Return the existing workspace with this name, or create it if none exists.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the first pipeline with this name, or build a new one if none exists.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name databricksazuresdkpipeline
created 2023-02-07 15:55:40.574745+00:00
last_updated 2023-02-07 15:55:40.574745+00:00
deployed (none)
tags
versions ca6d6ea6-2e45-4795-9253-5c40e8483dc9
steps

We can verify that the workspace was created and set as the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'databricksazuresdkworkspace', 'id': 8, 'archived': False, 'created_by': '3547815c-b48d-4e69-bfbd-fff9d525c5d7', 'created_at': '2023-02-07T15:55:39.548497+00:00', 'models': [], 'pipelines': [{'name': 'databricksazuresdkpipeline', 'create_time': datetime.datetime(2023, 2, 7, 15, 55, 40, 574745, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Models

Now we will upload our model.

IMPORTANT NOTE: If using DBFS, use a file path such as /dbfs/FileStore/shared_uploads/YOURWORKSPACE/file rather than the dbfs: format.

model = wl.upload_model(model_name, model_file_name).configure()
model
{'name': 'ccfraudmodel', 'version': 'ccb488dd-36ed-4aaf-99cf-9a16bd3654db', 'file_name': 'ccfraud.onnx', 'image_path': None, 'last_update_time': datetime.datetime(2023, 2, 7, 16, 1, 0, 303545, tzinfo=tzutc())}

Deploy a Model

Now that we have a model that we want to use, we will create a deployment for it.

To do this, we’ll create our pipeline that can ingest the data, pass the data to our CCFraud model, and give us a final output. We’ll call our pipeline databricksazuresdkpipeline, then deploy it so it’s ready to receive data. The deployment process usually takes about 45 seconds.

pipeline.add_model_step(model)
name databricksazuresdkpipeline
created 2023-02-07 15:55:40.574745+00:00
last_updated 2023-02-07 15:57:24.803281+00:00
deployed True
tags
versions 971f9db7-1b73-4e72-8cdb-cfa2d5a9ddd7, 6c1028c4-3ca7-47b0-b3a6-834d12b57fc9, ca6d6ea6-2e45-4795-9253-5c40e8483dc9
steps ccfraudmodel
pipeline.deploy()
name databricksazuresdkpipeline
created 2023-02-07 15:55:40.574745+00:00
last_updated 2023-02-07 16:04:35.891487+00:00
deployed True
tags
versions 091eed0f-8984-4753-9316-0fbbf68bb398, bb701da9-440b-4ce6-8b92-36446347e85c, 971f9db7-1b73-4e72-8cdb-cfa2d5a9ddd7, 6c1028c4-3ca7-47b0-b3a6-834d12b57fc9, ca6d6ea6-2e45-4795-9253-5c40e8483dc9
steps ccfraudmodel

We can verify that the pipeline is running and list what models are associated with it.

pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.2.34',
   'name': 'engine-754b5c457d-5c4pc',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'databricksazuresdkpipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraudmodel',
      'version': 'ccb488dd-36ed-4aaf-99cf-9a16bd3654db',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.0.29',
   'name': 'engine-lb-74b4969486-mslkt',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

Inferences

Infer 1 row

Now that the pipeline is deployed and our CCFraud model is in place, we’ll perform a smoke test to verify the pipeline is up and running properly. We’ll use the infer_from_file command to load a single transaction and determine whether it is flagged for fraud. If the pipeline is working correctly, it should return a small value, indicating a low likelihood that the transaction is fraudulent.

result = pipeline.infer_from_file("./ccfraud_smoke_test.json")
result[0].data()
[array([[0.00149742]])]
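The smoke test returns a fraud score rather than a yes/no answer, so turning scores into flags requires choosing a cutoff. The sketch below uses 0.5 purely as a hypothetical threshold (this guide does not specify one) and first flattens nested outputs such as [[0.00149742]] into a plain list. Because result[0].data() returns NumPy arrays, call .tolist() on an array before passing it in, as in the usage line. The helper names are hypothetical.

```python
def flatten_scores(outputs):
    """Flatten nested lists/tuples of scores, e.g. [[0.0015], [0.98]], into a list of floats."""
    flat = []
    for item in outputs:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_scores(item))
        else:
            flat.append(float(item))
    return flat

def flag_fraud(scores, threshold=0.5):
    """Return True for each score at or above the (hypothetical) threshold."""
    return [score >= threshold for score in scores]

# Usage with the smoke-test result above:
# scores = flatten_scores(result[0].data()[0].tolist())
# flag_fraud(scores)
```

In practice the threshold would be chosen from the model's validation results rather than fixed at 0.5.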

Batch Inference

Now that our smoke test is successful, let’s really give it some data. We’ll use the cc_data_1k.json file that contains 1,000 inferences to be performed.

result = pipeline.infer_from_file("./cc_data_1k.json")
result
[InferenceResult({'check_failures': [],
  'elapsed': 245003,
  'model_name': 'ccfraudmodel',
  'model_version': 'ccb488dd-36ed-4aaf-99cf-9a16bd3654db',
  'original_data': {'tensor': [[-1.060329750089797,
                                2.354496709462385,
                                -3.563878832646437,
                                5.138734892618555,
                                -1.23084570186641,
                                -0.7687824607744093,
                                -3.588122810891446,
                                1.888083766259287,
                                -3.2789674273886593,
                                -3.956325455353324,
                                4.099343911805088,
                                -5.653917639476211,
                                -0.8775733373342495,
                                -9.131571191990632,
                                -0.6093537872620682,
                                -3.748027677256424,
                                -5.030912501659983,
                                -0.8748149525506821,
                                1.9870535692026476,
                                0.7005485718467245,
                                0.9204422758154284,
                                -0.10414918089758483,
                                0.3229564351284999,
                                -0.7418141656910608,
                                0.03841201586730117,
                                1.099343914614657,
                                1.2603409755785089,
                                -0.14662447391576958,
                                -1.446321243938815],
                               [-1.060329750089797,
                                2.354496709462385,
                                -3.563878832646437,
                                5.138734892618555,
                                -1.23084570186641,
                                -0.7687824607744093,
                                -3.588122810891446,
                                1.888083766259287,
                                -3.2789674273886593,
                                -3.956325455353324,
                                4.099343911805088,
                                -5.653917639476211,
                                -0.8775733373342495,
                                -9.131571191990632,
                                -0.6093537872620682,
                                -3.748027677256424,
                                -5.030912501659983,
                                -0.8748149525506821,
                                1.9870535692026476,
                                0.7005485718467245,
                                0.9204422758154284,
                                -0.10414918089758483,
                                0.3229564351284999,
                                -0.7418141656910608,
                                0.03841201586730117,
                                1.099343914614657,
                                1.2603409755785089,
                                -0.14662447391576958,
                                -1.446321243938815],
                               [-1.060329750089797,
                                2.354496709462385,
                                -3.563878832646437,
                                5.138734892618555,
                                -1.23084570186641,
                                -0.7687824607744093,
                                -3.588122810891446,
                                1.888083766259287,
                                -3.2789674273886593,
                                -3.956325455353324,
                                4.099343911805088,
                                -5.653917639476211,
                                -0.8775733373342495,
                                -9.131571191990632,
                                -0.6093537872620682,
                                -3.748027677256424,
                                -5.030912501659983,
                                -0.8748149525506821,
                                1.9870535692026476,
                                0.7005485718467245,
                                0.9204422758154284,
                                -0.10414918089758483,
                                0.3229564351284999,
                                -0.7418141656910608,
                                0.03841201586730117,
                                1.099343914614657,
                                1.2603409755785089,
                                -0.14662447391576958,
                                -1.446321243938815],
                               [-1.060329750089797,
                                2.354496709462385,
                                -3.563878832646437,
                                5.138734892618555,
                                -1.23084570186641,
                                -0.7687824607744093,
                                -3.588122810891446,
                                1.888083766259287,
                                -3.2789674273886593,
                                -3.956325455353324,
                                4.099343911805088,
                                -5.653917639476211,
                                -0.8775733373342495,
                                -9.131571191990632,
                                -0.6093537872620682,
                                -3.748027677256424,
                                -5.030912501659983,
                                -0.8748149525506821,
                                1.9870535692026476,
                                0.7005485718467245,
                                0.9204422758154284,
                                -0.10414918089758483,
                                0.3229564351284999,
                                -0.7418141656910608,
                                0.03841201586730117,
                                1.099343914614657,
                                1.2603409755785089,
                                -0.14662447391576958,
                                -1.446321243938815],
                               [0.5817662107606553,
                                0.0978815509566172,
                                0.1546819423995403,
                                0.475410194903404,
                                -0.1978862305998003,
                                -0.45043448542395703,
                                0.016654044671806197,
                                -0.025607055099995037,
                                0.09205616023555586,
                                -0.27839171528517387,
                                0.059329944112281194,
                                -0.019658541640589822,
                                -0.4225083156745137,
                                -0.12175388766841427,
                                1.547309489412488,
                                0.23916228635697,
                                0.35539748808055915,
                                -0.7685165300981693,
                                -0.7000849354838512,
                                -0.11900432852127547,
                                -0.3450517133266211,
                                -1.1065114107709193,
                                0.2523411195349452,
                                0.02094418256934876,
                                0.2199267436399366,
                                0.2540689265485751,
                                -0.04502250942505252,
                                0.1086773897916229,
                                0.2547179311087416],
                               [-0.7621273681123774,
                                0.8854701414345362,
                                0.5235808652087769,
                                -0.8139743550578189,
                                0.3793240543966917,
                                0.15606533645358955,
                                0.545129966459155,
                                0.07859272424715734,
                                0.41439685426159006,
                                0.49140523482948895,
                                0.07743910220902032,
                                1.050105025966046,
                                0.9901440216912372,
                                -0.614248313100663,
                                -1.5260740653027238,
                                0.2053324702711796,
                                -1.0185637854071916,
                                0.04909869191405787,
                                0.6964184879033418,
                                0.5948331721915132,
                                -0.3934362921711871,
                                -0.5922492660097428,
                                -0.3953093077108832,
                                -1.331042702500481,
                                0.6287441286760012,
                                0.8665525995997287,
                                0.7974673604471482,
                                1.1174342262023085,
                                -0.6700716550561031],
                               [-0.2836830106617754,
                                0.2281341607542476,
                                1.0358808684971377,
                                1.031141364744695,
                                0.6485053916657638,
                                0.6993338999916012,
                                0.1827667194489511,
                                0.09897462120147606,
                                -0.573448773318372,
                                0.5928927597000144,
                                0.3085637362189933,
                                0.15338699178269907,
                                -0.3628347922840285,
                                -0.28650544988763965,
                                -1.138044653648458,
                                -0.22071176013852775,
                                -0.12060339309501608,
                                -0.23252469358947547,
                                0.8675179232286943,
                                -0.00813230344349814,
                                -0.015330414985472576,
                                0.41692378222119375,
                                -0.42490253139063966,
                                -0.983445197690985,
                                -1.117590357786289,
                                2.107670188520057,
                                -0.33619500725255724,
                                -0.3469573431212065,
                                0.019307669007054214],
                               [1.037963634604398,
                                -0.15298730197183308,
                                -1.0912561861755297,
                                -0.003333982808610693,
                                0.48042818357577816,
                                0.11207084748490805,
                                0.023315770873913674,
                                0.0009213037997834434,
                                0.4021730182105383,
                                0.2120753711962651,
                                -0.14628042225168944,
                                0.44244770274013223,
                                -0.4641602116945049,
                                0.49842564302053766,
                                -0.8230270969280085,
                                0.3168388183929484,
                                -0.905044097738204,
                                0.07103650391675659,
                                1.1111388586922986,
                                -0.2157914053975094,
                                -0.37375912900543384,
                                -1.033007534671374,
                                0.31447209128965764,
                                -0.5109243112374892,
                                -0.16859104983418324,
                                0.5918324405536384,
                                -0.22317928245806465,
                                -0.22871533772536015,
                                -0.0868944761624121],
                               [0.15172836621737265,
                                0.6589966337195882,
                                -0.33237136470392026,
                                0.7285871978728441,
                                0.6430271572675802,
                                -0.036105130607259,
                                0.22015305036081068,
                                -1.4928731939082054,
                                -0.5895806486715522,
                                0.22272511026018857,
                                0.4443729713208923,
                                0.8411555815062762,
                                -0.24129130201177532,
                                0.8986828750053317,
                                -0.9866307095643508,
                                -0.891930176747572,
                                -0.08788759139559761,
                                0.11633324608127409,
                                1.1469566645633804,
                                -0.5417470436307007,
                                2.232136056300802,
                                -0.16792713816415766,
                                -0.8071223667464775,
                                -0.6379226787209245,
                                1.9121889390871136,
                                -0.5565545168737087,
                                0.6528273963811771,
                                0.8163897965987713,
                                -0.22816150171105992],
                               [-0.16831002464168482,
                                0.7070470316726095,
                                0.18752349479594543,
                                -0.3885406952480356,
                                0.8190382136893654,
                                -0.2264929889455448,
                                0.920446915383558,
                                -0.1362740973549585,
                                -0.3336344399134833,
                                -0.31742816858138206,
                                1.190347893355806,
                                0.17742920974706458,
                                -0.5681631428570322,
                                -0.8907063934925815,
                                -0.5603225648833638,
                                0.08978317373468075,
                                0.41875259056737263,
                                0.34062690461012146,
                                0.7358794384123696,
                                0.2162316926274178,
                                -0.4090832914654094,
                                -0.873608946074589,
                                -0.11287065093605424,
                                1.0027861773717552,
                                -0.940491615382638,
                                0.34471446407049355,
                                0.09082338670023896,
                                0.03385948858451272,
                                -1.5295522680268],
                               [0.6066235673660867,
                                0.06318393046103796,
                                -0.08029619730834595,
                                0.6955262344665573,
                                -0.1775255858536255,
                                -0.37571582613170335,
                                -0.10034783809984708,
                                -0.002020697400621504,
                                0.6859442462445478,
                                -0.6582840559236135,
                                -0.9995187665924608,
                                -0.5340094457850662,
                                -1.1303344301902345,
                                -1.4048093394603511,
                                -0.09533161186902651,
                                0.34286507076318934,
                                1.137627771131194,
                                0.42483092016552,
                                0.23163849625535257,
                                -0.11453707463184153,
                                -0.30158635696358,
                                -0.6731341245200443,
                                -0.2723217481414279,
                                -0.392522783076639,
                                1.1115261431276475,
                                0.9205381913240704,
                                -0.028059000408212655,
                                0.13116439016892018,
                                0.2152022580020345],
                               [0.6022605285983497,
                                0.03354188522587924,
                                0.07384927695250888,
                                0.18511785364463623,
                                -0.305485553894443,
                                -0.7940218336809065,
                                0.16549419059256967,
                                -0.13036002461367513,
                                -0.18841586940040084,
                                0.06659757810555761,
                                1.4810974231280167,
                                0.6472122044773744,
                                -0.6703196483832992,
                                0.7565686747307261,
                                0.2134058731218033,
                                0.15054757512303818,
                                -0.4312378588876496,
                                -0.01829519245300039,
                                0.2851995511280944,
                                -0.10765090263665966,
                                0.006824282636551462,
                                -0.10765890483072864,
                                -0.0788026490786185,
                                0.9475328124756416,
                                0.8413388083261754,
                                1.1769860739049118,
                                -0.20262122059889132,
                                -0.0006311264993808188,
                                0.18515595494858325],
                               [-1.2004162236340663,
                                -0.02934247149289781,
                                0.6002673902810236,
                                -1.0581165763998934,
                                0.8618826503029525,
                                0.9173564431626324,
                                0.07531515110044265,
                                0.22061892248030848,
                                1.218873509137122,
                                -0.3886523829726902,
                                -0.6095125829994053,
                                0.19650432666838064,
                                -0.2661495951765694,
                                -0.6379133677491714,
                                0.48339834201800247,
                                -0.4985531206523148,
                                -0.30642432885045834,
                                -1.452449679301684,
                                -3.114069963143443,
                                -1.0750208205893026,
                                0.33412238420877444,
                                1.5687760942001978,
                                -0.520167136432032,
                                -0.5317761577207334,
                                -0.383294946943516,
                                -0.9846864506812528,
                                -2.8976275684335313,
                                -0.5073512289684565,
                                -0.3252693380620513],
                               [-2.842735703124214,
                                2.8260142810969406,
                                -1.595334491992825,
                                -0.2991672885943705,
                                -1.5495220405376615,
                                1.6401772163256094,
                                -3.282195184902111,
                                0.4863028450594385,
                                0.35768012762513235,
                                0.32223721627031443,
                                0.2710846268609854,
                                2.0025589607976957,
                                -1.2151492104208754,
                                2.2835338743639055,
                                -0.4148465662141878,
                                -0.6786230740923882,
                                2.6106103197644366,
                                -1.6468850705007771,
                                -1.608012375610504,
                                0.99954742225582,
                                -0.4665201752903192,
                                0.8316050291541919,
                                1.2855678736315532,
                                -2.3561879047775687,
                                0.21079384022245212,
                                1.1256706306463826,
                                1.1538189945359587,
                                1.0332061029880848,
                                -1.4715524921303922],
                               [0.37025168630329713,
                                -0.559565196966135,
                                0.6757255982084903,
                                0.6920586122163292,
                                -1.097595083688384,
                                -0.3864326068450856,
                                -0.24648264110773657,
                                -0.030465499323007045,
                                0.6888634287660991,
                                -0.25688240219544295,
                                -0.31262126741537455,
                                0.4177154048654652,
                                0.08658272648758861,
                                -0.23586536095779972,
                                0.7111355283012393,
                                0.2937196011516156,
                                -0.21506657714816485,
                                -0.03799982400637199,
                                -0.5299304856390635,
                                0.4921897195724529,
                                0.3506791137140131,
                                0.4934112481922948,
                                -0.36453056821705304,
                                1.3046786490119904,
                                0.38837918082008666,
                                1.0901335639291738,
                                -0.0981875890062536,
                                0.22211438851412624,
                                1.2586007559041816],
                               [1.0740600994534184,
                                -0.5004444122961789,
                                -0.6665104458726077,
                                -0.513795450608903,
                                -0.22180251404665496,
                                0.17340111491593263,
                                -0.6004987142090168,
                                0.010577019331547527,
                                -0.4816997216919937,
                                0.8861570874243399,
                                0.18529769780712504,
                                0.8741913982235725,
                                1.1965272846208048,
                                -0.10993396869909432,
                                -0.6057668522157109,
                                -1.1254346438176983,
                                -0.7830480707969095,
                                1.9148497747436344,
                                -0.3837603969088797,
                                -0.6387752587185815,
                                -0.4853295482654116,
                                -0.5961116065703739,
                                0.4123371083341403,
                                0.17603697938449023,
                                -0.5173803145442223,
                                1.1181808796610917,
                                -0.0934318754864336,
                                -0.1756922307137465,
                                -0.2551430327198796],
                               [-0.3389501953944846,
                                0.4600633972547716,
                                1.5422134202684443,
                                0.026738992407616496,
                                0.11589317308681447,
                                0.5045446890411369,
                                0.05163626851385762,
                                0.26452863620

*** WARNING: max output size exceeded, skipping output. ***

                         0.0008339285850524902,
                                  0.0006878077983856201,
                                  0.001112222671508789,
                                  0.0005952417850494385,
                                  0.0003427863121032715,
                                  0.0006614029407501221,
                                  0.001322627067565918,
                                  0.0005146563053131104,
                                  4.824995994567871e-05,
                                  0.00046452879905700684,
                                  0.0003368556499481201,
                                  0.0012190043926239014,
                                  0.00046455860137939453,
                                  0.0009738504886627197,
                                  0.00035002827644348145,
                                  0.00039589405059814453,
                                  0.000307619571685791,
                                  0.0005711615085601807,
                                  0.0005376338958740234,
                                  0.0001920461654663086,
                                  0.0009895861148834229,
                                  0.0007052123546600342,
                                  0.0005137920379638672,
                                  0.00035962462425231934,
                                  0.0007860660552978516,
                                  0.000491708517074585,
                                  7.635354995727539e-05,
                                  0.00026789307594299316,
                                  0.0019146502017974854,
                                  0.0006752610206604004,
                                  0.0008069276809692383,
                                  0.0004373788833618164,
                                  0.0007348060607910156,
                                  0.00010257959365844727,
                                  0.0003650486469268799,
                                  0.001430898904800415,
                                  0.0011163949966430664,
                                  0.0005064606666564941,
                                  0.0006780624389648438,
                                  0.0007084012031555176,
                                  0.0005066394805908203,
                                  0.0005592107772827148,
                                  0.0007954835891723633,
                                  0.000926285982131958,
                                  0.0006126761436462402,
                                  0.0003502964973449707,
                                  0.000958859920501709,
                                  0.0002881288528442383,
                                  0.00016897916793823242,
                                  0.0006831586360931396,
                                  0.0003865659236907959,
                                  0.00016203522682189941,
                                  0.0008713304996490479,
                                  0.0004932284355163574,
                                  0.0004909336566925049,
                                  0.00022536516189575195,
                                  0.0009913146495819092,
                                  0.0002721548080444336,
                                  8.744001388549805e-05,
                                  0.0006993114948272705,
                                  0.0010588765144348145,
                                  0.0009733438491821289,
                                  0.0006800591945648193,
                                  0.0002625584602355957,
                                  0.0006255805492401123,
                                  0.00024187564849853516,
                                  0.0002522468566894531,
                                  0.0008753836154937744,
                                  0.0002613067626953125,
                                  0.0005331039428710938,
                                  0.0002490878105163574,
                                  0.0001704394817352295,
                                  0.00031509995460510254,
                                  0.0015914440155029297,
                                  0.00025537610054016113,
                                  8.07344913482666e-05,
                                  0.0008647739887237549,
                                  0.0004987716674804688,
                                  0.001710742712020874,
                                  0.0013418197631835938,
                                  0.00037536025047302246,
                                  0.0003878176212310791,
                                  0.0005452334880828857,
                                  0.0007519721984863281,
                                  0.0008081197738647461,
                                  0.000502467155456543,
                                  0.0003039240837097168,
                                  0.0005827546119689941,
                                  0.0006529092788696289,
                                  0.0010212063789367676,
                                  0.00034746527671813965,
                                  0.0008154213428497314,
                                  0.00038063526153564453,
                                  0.0005306899547576904,
                                  0.00025406479835510254,
                                  0.00018146634101867676,
                                  0.0013905465602874756,
                                  0.0006494820117950439,
                                  0.0006037354469299316,
                                  0.0014120042324066162,
                                  0.00041112303733825684,
                                  0.00040650367736816406,
                                  0.0005333423614501953,
                                  0.0007215738296508789,
                                  0.0001367330551147461,
                                  0.0003502070903778076,
                                  0.0009997785091400146,
                                  0.0008716285228729248,
                                  0.0005594789981842041,
                                  0.000410228967666626,
                                  0.0001429915428161621,
                                  0.0003579556941986084,
                                  0.0011880695819854736,
                                  0.0003827214241027832,
                                  0.0012142062187194824,
                                  0.0005961358547210693,
                                  0.000471651554107666,
                                  0.0006967782974243164,
                                  0.00037926435470581055,
                                  0.0003273487091064453,
                                  0.0016745328903198242,
                                  0.0003102719783782959,
                                  0.0010521411895751953,
                                  3.841519355773926e-05,
                                  0.0004825592041015625,
                                  0.0009035468101501465,
                                  0.0009154081344604492,
                                  0.0009016096591949463,
                                  0.0011216700077056885,
                                  0.0002802610397338867,
                                  0.0007374584674835205,
                                  0.0005075931549072266,
                                  0.0006051957607269287,
                                  0.0005790889263153076,
                                  0.00032085180282592773,
                                  0.00042501091957092285,
                                  0.0007457137107849121,
                                  0.0006720125675201416,
                                  0.0003052949905395508,
                                  0.0006992816925048828,
                                  0.0003927946090698242,
                                  0.00024440884590148926,
                                  0.0001997053623199463,
                                  0.0002860724925994873,
                                  0.000585019588470459,
                                  0.00021448731422424316,
                                  0.000881195068359375,
                                  0.0004405081272125244,
                                  0.0008642077445983887,
                                  0.0005924403667449951,
                                  0.0007340312004089355,
                                  0.0004509389400482178,
                                  0.0008679628372192383,
                                  0.00037926435470581055,
                                  0.0008240938186645508,
                                  0.0007452666759490967,
                                  0.00033849477767944336,
                                  0.0011382997035980225,
                                  0.0003623068332672119,
                                  0.0002282559871673584,
                                  0.0005411803722381592,
                                  0.001323312520980835,
                                  0.0009799599647521973,
                                  0.0008512735366821289,
                                  0.0007756352424621582,
                                  0.0003809928894042969,
                                  0.00017562508583068848,
                                  0.0005088448524475098,
                                  0.00014969706535339355,
                                  9.685754776000977e-05,
                                  0.0016102492809295654,
                                  0.0003826320171356201,
                                  0.0013871490955352783,
                                  0.00020483136177062988,
                                  0.0011193156242370605,
                                  0.0008026957511901855,
                                  0.00047454237937927246,
                                  0.0005080103874206543,
                                  0.0012269020080566406,
                                  0.00022527575492858887,
                                  0.00020378828048706055,
                                  0.0004162788391113281,
                                  0.0008330047130584717,
                                  2.4050474166870117e-05,
                                  0.0006586611270904541,
                                  0.000383526086807251,
                                  0.00040608644485473633,
                                  0.00040709972381591797,
                                  0.00020489096641540527,
                                  0.0006171464920043945,
                                  0.0012582242488861084,
                                  0.0004496574401855469,
                                  0.0005507469177246094,
                                  0.0008178949356079102,
                                  0.001517951488494873,
                                  0.00017982721328735352,
                                  0.000568687915802002,
                                  0.001766800880432129,
                                  0.0002658367156982422,
                                  0.000822216272354126,
                                  0.0004229545593261719,
                                  0.00025528669357299805,
                                  0.0004892349243164062,
                                  0.000771939754486084,
                                  0.0010519325733184814,
                                  0.0010221898555755615,
                                  9.08970832824707e-05,
                                  0.0008391737937927246,
                                  0.00022780895233154297,
                                  0.0007468760013580322,
                                  0.0007697641849517822,
                                  0.0019667446613311768,
                                  0.0012534558773040771,
                                  0.0001010596752166748,
                                  0.0005205869674682617,
                                  0.0002041459083557129,
                                  0.0006001889705657959,
                                  0.0009807944297790527,
                                  0.000767141580581665,
                                  0.00038120150566101074,
                                  0.0002471506595611572,
                                  0.00038233399391174316,
                                  0.00037872791290283203,
                                  0.0007638931274414062,
                                  0.00029391050338745117,
                                  0.0008871853351593018,
                                  0.0004890561103820801,
                                  0.0015825629234313965,
                                  0.0005756914615631104,
                                  0.0003350973129272461,
                                  0.00026857852935791016,
                                  0.0010086894035339355,
                                  0.00048220157623291016,
                                  0.00024381279945373535,
                                  6.434321403503418e-05,
                                  2.0682811737060547e-05,
                                  0.0003471970558166504,
                                  0.00022557377815246582,
                                  0.0002627372741699219,
                                  0.0003419220447540283,
                                  0.000281602144241333,
                                  0.0012967884540557861,
                                  0.0011523962020874023,
                                  0.0004177987575531006,
                                  0.0010204315185546875,
                                  0.0010258853435516357,
                                  0.0011347532272338867,
                                  0.00038436055183410645,
                                  0.0009618997573852539,
                                  0.00035199522972106934,
                                  0.000282973051071167,
                                  0.00024309754371643066,
                                  0.0001265406608581543,
                                  6.946921348571777e-05,
                                  0.00015616416931152344,
                                  0.0014993548393249512,
                                  0.0006575882434844971,
                                  0.0003606081008911133,
                                  0.0023556947708129883,
                                  0.0007058978080749512,
                                  0.0014238357543945312,
                                  0.0007699429988861084,
                                  0.0008679032325744629,
                                  0.00018492341041564941,
                                  0.0007839500904083252,
                                  0.0009354352951049805,
                                  0.00027167797088623047,
                                  0.0009218454360961914,
                                  0.00035691261291503906,
                                  0.0005003809928894043,
                                  0.0004172325134277344,
                                  0.0011021196842193604,
                                  0.0010276734828948975,
                                  0.0006104707717895508,
                                  0.00045561790466308594,
                                  0.0006892085075378418,
                                  0.0004885494709014893,
                                  0.0004724562168121338,
                                  0.001522064208984375,
                                  0.0005326271057128906,
                                  0.00010651350021362305,
                                  0.0002598762512207031,
                                  0.0013784170150756836,
                                  0.0004596710205078125,
                                  0.0003192126750946045,
                                  0.0009370148181915283,
                                  0.0006310641765594482,
                                  0.0005830228328704834,
                                  0.00036329030990600586,
                                  0.0009173750877380371,
                                  0.0006718039512634277,
                                  3.796815872192383e-05,
                                  0.00077781081199646,
                                  0.00033274292945861816,
                                  0.0001729726791381836,
                                  0.0008949339389801025,
                                  0.00026357173919677734,
                                  0.000757366418838501,
                                  0.0007928907871246338,
                                  0.0012267529964447021,
                                  0.0013829469680786133,
                                  0.0005187392234802246,
                                  0.0003561079502105713,
                                  0.000646054744720459,
                                  0.001015990972518921,
                                  0.0015155971050262451,
                                  0.0002993941307067871,
                                  0.00013318657875061035,
                                  0.0008256733417510986,
                                  0.0005404055118560791,
                                  0.0003667771816253662,
                                  0.0005891323089599609,
                                  0.0007394552230834961,
                                  0.0010330379009246826,
                                  0.0007327795028686523,
                                  0.0001760423183441162,
                                  0.0001805126667022705,
                                  0.0011722445487976074,
                                  0.00023120641708374023,
                                  0.00046622753143310547,
                                  0.0005017220973968506,
                                  0.00037470459938049316,
                                  0.0007470846176147461,
                                  0.00034102797508239746,
                                  0.0018736720085144043,
                                  0.0007473528385162354,
                                  0.0008576810359954834,
                                  0.0012683570384979248,
                                  0.0005511641502380371,
                                  0.0008003413677215576,
                                  0.0002823770046234131,
                                  0.0006742775440216064,
                                  0.0006029307842254639,
                                  0.00045618414878845215,
                                  0.00017344951629638672,
                                  0.0012264251708984375,
                                  0.001613914966583252,
                                  0.0009235143661499023,
                                  0.00029850006103515625,
                                  0.0003133118152618408,
                                  0.00010135769844055176,
                                  0.0004534423351287842,
                                  0.00031444430351257324,
                                  0.0007798373699188232,
                                  0.00038126111030578613,
                                  0.00026619434356689453,
                                  0.000617682933807373,
                                  0.0006511211395263672,
                                  0.0008475780487060547,
                                  1.519918441772461e-06,
                                  0.0002251267433166504,
                                  0.0002655982971191406,
                                  0.0005814731121063232,
                                  0.001587003469467163,
                                  0.00012886524200439453,
                                  0.000906139612197876,
                                  0.0006060898303985596,
                                  0.0004534125328063965,
                                  0.0005573630332946777,
                                  0.001411139965057373,
                                  0.0007226467132568359,
                                  0.0007477104663848877,
                                  0.0007035136222839355,
                                  0.00022074580192565918,
                                  0.0014317333698272705,
                                  0.0018418431282043457,
                                  0.00010865926742553711,
                                  0.0008140206336975098,
                                  0.0005422532558441162,
                                  0.00045371055603027344,
                                  0.0006635785102844238,
                                  0.0006209909915924072,
                                  0.0005052685737609863,
                                  0.0005816519260406494,
                                  0.9873101711273193,
                                  0.0006915628910064697,
                                  0.0007537007331848145,
                                  0.00029602646827697754,
                                  0.00020524859428405762,
                                  0.0011404454708099365,
                                  0.0007368624210357666,
                                  0.0002035200595855713,
                                  0.00048407912254333496,
                                  0.00041028857231140137,
                                  3.3676624298095703e-06,
                                  0.0004755854606628418,
                                  0.000834733247756958,
                                  0.0003497898578643799,
                                  0.0012320280075073242,
                                  0.0005603432655334473,
                                  0.0003822147846221924,
                                  0.0009741783142089844,
                                  0.0003153085708618164,
                                  0.0008485913276672363,
                                  0.0035923421382904053,
                                  0.00045371055603027344,
                                  0.0012863576412200928,
                                  0.000866323709487915,
                                  7.393956184387207e-05,
                                  0.0012035071849822998,
                                  0.00018787384033203125,
                                  0.00031045079231262207,
                                  0.0004418790340423584,
                                  0.0001100003719329834,
                                  0.0006164610385894775,
                                  2.7120113372802734e-06,
                                  0.0007382631301879883,
                                  0.00021120905876159668,
                                  0.00043717026710510254,
                                  0.0018209218978881836,
                                  0.00035813450813293457,
                                  0.00024771690368652344,
                                  0.0005538463592529297,
                                  0.0003204941749572754,
                                  0.0013484358787536621,
                                  0.0010192394256591797,
                                  0.0020678043365478516,
                                  0.00020268559455871582,
                                  0.00033402442932128906,
                                  0.00022429227828979492,
                                  0.00023245811462402344,
                                  0.00013360381126403809,
                                  0.0005823671817779541,
                                  0.0003317594528198242,
                                  0.0003043711185455322,
                                  0.0013128221035003662,
                                  0.0008148550987243652,
                                  0.0005481243133544922,
                                  0.0001258552074432373,
                                  0.00011596083641052246,
                                  0.0002785325050354004,
                                  0.00110703706741333,
                                  0.0008533000946044922,
                                  0.001249849796295166],
                         'dim': [1001, 1],
                         'dtype': 'Float',
                         'v': 1}}],
  'pipeline_name': 'databricksazuresdkpipeline',
  'shadow_data': {},
  'time': 1675785912268})]

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline to return its Kubernetes resources for other tasks. Note that if the deployment variable is unchanged, pipeline.deploy() will restart the inference engine in the same configuration as before.

pipeline.undeploy()
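If an inference raises an exception partway through a test, a notebook can end up leaving the pipeline deployed. The sketch below wraps deploy() and undeploy() in a context manager so the resources are returned even on failure. `deployed` is a hypothetical helper, not part of the Wallaroo SDK; it only assumes an object with the deploy() and undeploy() methods used in this guide.

```python
from contextlib import contextmanager


@contextmanager
def deployed(pipeline):
    """Deploy a pipeline for the duration of a `with` block, then undeploy it.

    `pipeline` is any object with deploy() and undeploy() methods, such as
    the pipeline objects used in this guide.
    """
    pipeline.deploy()
    try:
        yield pipeline
    finally:
        # Runs even if code inside the block raises an exception,
        # so the Kubernetes resources are always handed back.
        pipeline.undeploy()
```

Usage would look like `with deployed(pipeline) as p: result = p.infer_from_file(...)`, with the file path taken from the inference examples in the guide.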

2.1.4 - Wallaroo SDK Google Vertex Install Guide

How to install the Wallaroo SDK in Google Vertex

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Installing the Wallaroo SDK into Google Vertex Workbench

Organizations that use Google Vertex for model training and development can deploy models to Wallaroo through the Wallaroo SDK. This guide shows how to install the Wallaroo SDK, set up authentication through Google Cloud Platform (GCP), and make a standard connection to a Wallaroo instance through Google Workbench.

These instructions are based on the Wallaroo SSO for Google Cloud Platform and the Connect to Wallaroo guides.

This tutorial provides the following:

  • aloha-cnn-lstm.zip: A pre-trained open source model that uses an Aloha CNN LSTM model for classifying domain names as either legitimate or used for nefarious purposes such as malware distribution.
  • Test Data Files:
    • data-1.json: 1 record
    • data-1k.json: 1,000 records
    • data-25k.json: 25,000 records

To use the Wallaroo SDK within Google Workbench, we will create a virtual environment that provides the required libraries and the specific Python version.

Prerequisites

The following is required for this tutorial:

  • A Wallaroo instance version 2023.1 or later.
  • Python 3.8.6 or later installed locally.
  • Conda: Used for managing Python virtual environments.
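Before creating the environment, the Python version requirement can be checked from any Python prompt. This is a minimal sketch using only the standard library; `python_meets_minimum` is a hypothetical helper name, and the 3.8.6 floor is taken from the prerequisites above.

```python
import sys

# Minimum Python version listed in the prerequisites.
MIN_PYTHON = (3, 8, 6)


def python_meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if not python_meets_minimum():
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current} found; 3.8.6 or later is required for the Wallaroo SDK.")
```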

General Steps

For our example, we will perform the following:

  • Wallaroo SDK Install
    • Set up a Python virtual environment through conda with the libraries that enable the virtual environment for use in a JupyterHub environment.
    • Install the Wallaroo SDK.
  • Wallaroo SDK from remote JupyterHub Demonstration (Optional): The following steps are an optional exercise to demonstrate using the Wallaroo SDK from a remote connection. The entire tutorial can be found on the Wallaroo Tutorials repository.
    • Connect to a remote Wallaroo instance.
    • Create a workspace for our work.
    • Upload the Aloha model.
    • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results
    • Run a sample inference through our pipeline by loading a file
    • Retrieve the external deployment URL. This sample Wallaroo instance has been configured to create external inference URLs for pipelines. For more information, see the External Inference URL Guide.
    • Run a sample inference through our pipeline’s external URL and store the results in a file. This assumes that the External Inference URLs have been enabled for the target Wallaroo instance.
    • Undeploy the pipeline and return resources back to the Wallaroo instance’s Kubernetes environment.

Install Wallaroo SDK

Set Up Virtual Python Environment

To set up the virtual environment in Google Workbench for use with the Wallaroo SDK:

  1. Start a separate terminal by selecting File->New->Terminal.

  2. Create the Python virtual environment with conda. Replace wallaroosdk with the name of the virtual environment as required by your organization. Note that the Python libraries used with the Wallaroo SDK require Python 3.8.6 or later. The following command installs the latest version of Python 3.8, which at the time of writing is 3.8.15.

    conda create -n wallaroosdk python=3.8
    
  3. Activate the new environment.

    conda activate wallaroosdk
    
  4. Install the ipykernel library. This allows the JupyterHub notebooks to access the Python virtual environment as a kernel.

    conda install ipykernel
    
  5. Install the new virtual environment as a Python kernel.

    ipython kernel install --user --name=wallaroosdk
    
  6. Install the Wallaroo SDK. This process may take several minutes while the other required Python libraries are added to the virtual environment.

    pip install wallaroo==2023.1.0
    

Once the conda virtual environment has been installed, it can either be selected as the kernel for a new Jupyter notebook, or an existing notebook's kernel can be changed to it.

To use a new Notebook:

  1. From the main menu, select File->New->Notebook.
  2. From the Kernel selection dropdown, select the new virtual environment - in this case, wallaroosdk.

To update an existing Notebook to use the new virtual environment as a kernel:

  1. From the main menu, select Kernel->Change Kernel.
  2. Select the new kernel.
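To confirm that a notebook is really running in the new environment, and that the SDK was installed into it, the following sketch can be run in a notebook cell. It assumes conda's default layout, where a named environment lives in a directory called envs/<name>, so the last component of sys.prefix is the environment name. `kernel_env_matches` and `installed_version` are hypothetical helper names; the distribution name wallaroo comes from the pip install step above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def kernel_env_matches(env_name, prefix=sys.prefix):
    """True if the running interpreter lives in a conda env called env_name.

    Assumes conda's default layout of .../envs/<env_name>.
    """
    return Path(prefix).name == env_name


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("Kernel uses wallaroosdk env:", kernel_env_matches("wallaroosdk"))
print("wallaroo SDK version:", installed_version("wallaroo"))
```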

Sample Wallaroo Connection

With the Wallaroo Python SDK installed, remote commands and inferences can be performed through the following steps.

Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.

This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command, which connects to the Wallaroo instance services.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO.

The URLs are based on the Wallaroo Prefix and Wallaroo Suffix for the Wallaroo instance. For more information, see the DNS Integration Guide. In the example below, replace “YOURPREFIX” and “YOURSUFFIX” with the Wallaroo Prefix and Suffix, respectively.
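The endpoint pattern can be made explicit with a small helper. This sketch reproduces the same f-strings that this guide passes to wallaroo.Client (https://{prefix}.api.{suffix} and https://{prefix}.keycloak.{suffix}) and rejects empty or space-containing values, since a placeholder left unedited would produce an invalid hostname. `wallaroo_endpoints` is a hypothetical helper, not an SDK function.

```python
def wallaroo_endpoints(prefix, suffix):
    """Build the API and Keycloak URLs from a Wallaroo prefix and suffix.

    Mirrors the URL pattern used with wallaroo.Client in this guide.
    """
    prefix = prefix.strip()
    suffix = suffix.strip().lstrip(".")
    if not prefix or not suffix:
        raise ValueError("both the Wallaroo prefix and suffix are required")
    if " " in prefix or " " in suffix:
        raise ValueError("prefix and suffix must not contain spaces")
    return {
        "api_endpoint": f"https://{prefix}.api.{suffix}",
        "auth_endpoint": f"https://{prefix}.keycloak.{suffix}",
    }


endpoints = wallaroo_endpoints("YOURPREFIX", "YOURSUFFIX")
# The result could then be unpacked into the client call shown below:
# wl = wallaroo.Client(**endpoints, auth_type="sso")
```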

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, GitHub, etc. To use a broker, select it from the list below the username/password login form. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Wallaroo Login

Once authenticated, the user is prompted to confirm the device from which the connection is being established. After both steps are complete, the connection is granted.

Device Registration

The connection is stored in the variable wl for use in all other Wallaroo calls.

import wallaroo
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
# SSO login through keycloak

wallarooPrefix = "YOURPREFIX"
wallarooSuffix = "YOURSUFFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")

Arrow Support

As of the 2023.1 release, Wallaroo supports both DataFrame and Apache Arrow formats for inference inputs. This tutorial lets users adjust their experience based on whether Arrow support is enabled in their Wallaroo instance.

If Arrow support has been enabled, set arrowEnabled=True. If it is disabled or you are not sure, set arrowEnabled=False.

The examples below are shown in an Arrow-enabled environment.

import os
arrowEnabled=True
os.environ["ARROW_ENABLED"]=f"{arrowEnabled}"

Create the Workspace

We will create a workspace to work in called gcpsdkworkspace, then set it as the current workspace. We'll also create our pipeline in advance as gcpsdkpipeline.

  • IMPORTANT NOTE: For this example, the Aloha model is stored in the file alohacnnlstm.zip. When using tensor-based models, the .zip file name must match the name of the tensor directory it contains. For example, if the tensor directory is alohacnnlstm, then the .zip file must be named alohacnnlstm.zip.

workspace_name = 'gcpsdkworkspace'
pipeline_name = 'gcpsdkpipeline'
model_name = 'gcpsdkmodel'
model_file_name = './alohacnnlstm.zip'

def get_workspace(name):
    # Return the workspace with this name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the pipeline with this name, building it if it does not exist
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:51.201925+00:00
deployed (none)
tags
versions 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps

We can verify that the workspace was created and is the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'gcpsdkworkspace', 'id': 10, 'archived': False, 'created_by': '0bbf2f62-a4f1-4fe5-aad8-ec1cb7485939', 'created_at': '2022-12-06T21:35:50.34358+00:00', 'models': [], 'pipelines': [{'name': 'gcpsdkpipeline', 'create_time': datetime.datetime(2022, 12, 6, 21, 35, 51, 201925, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Models

Now we will upload our model. Note that for this example we are uploading the model from a .zip file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the TensorFlow format.

model = wl.upload_model(model_name, model_file_name).configure("tensorflow")
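Because a mismatch between the .zip name and the tensor directory inside it is easy to miss, the naming requirement from the note above can be checked locally before running upload_model. This sketch uses only the standard library zipfile module and assumes the model directory sits at the top level of the archive; it ignores the __MACOSX folder that macOS Finder adds. `zip_has_model_dir` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def zip_has_model_dir(zip_path):
    """Check that a model .zip has a top-level directory named after the file.

    For example, alohacnnlstm.zip should contain entries under alohacnnlstm/.
    """
    expected = Path(zip_path).stem
    with zipfile.ZipFile(zip_path) as archive:
        top_level = {
            name.split("/", 1)[0]
            for name in archive.namelist()
            if name and not name.startswith("__MACOSX/")
        }
    return expected in top_level
```

For example, `zip_has_model_dir(model_file_name)` should return True for the Aloha model file used here before it is uploaded.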

Deploy a Model

Now that we have a model that we want to use, we will create a deployment for it.

We will tell the deployment that we are using a TensorFlow model, and give it the deployment name and the configuration we want.

To do this, we'll use our pipeline to ingest the data, pass the data to our Aloha model, and give us a final output. We'll add the model as a step to our pipeline gcpsdkpipeline, then deploy it so it's ready to receive data. The deployment process usually takes about 45 seconds.

pipeline.add_model_step(model)
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:51.201925+00:00
deployed (none)
tags
versions 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps
pipeline.deploy()
name gcpsdkpipeline
created 2022-12-06 21:35:51.201925+00:00
last_updated 2022-12-06 21:35:55.428652+00:00
deployed True
tags
versions 269179a8-79e4-4c58-b9c3-d05436ad7be3, 90045b0b-1978-48bb-9f37-05c0c5d8bf22
steps gcpsdkmodel

We can verify that the pipeline is running and list what models are associated with it.

pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.1.174',
   'name': 'engine-7888f44c8b-r2gpr',
   'status': 'Running',
   'reason': None,
   'details': ['containers with unready status: [engine]',
    'containers with unready status: [engine]'],
   'pipeline_statuses': {'pipelines': [{'id': 'gcpsdkpipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'gcpsdkmodel',
      'version': 'c468d323-257b-4717-bbd8-8539a8746496',
      'sha': '7c89707252ce389980d5348c37885d6d72af4c20cd303422e2de7e66dd7ff184',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.1.173',
   'name': 'engine-lb-c6485cfd5-kqsn6',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
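Reading the status dictionary by eye works for one pipeline, but a check can also be scripted. The sketch below treats a pipeline as ready when the top-level status, every engine, and every model report Running, based on the dictionary shape printed above. It deliberately ignores the per-engine 'details' list, which in the output above still mentions unready containers even though everything reports Running. `pipeline_is_ready` is a hypothetical helper.

```python
def pipeline_is_ready(status):
    """Return True when the pipeline, every engine, and every model report Running.

    `status` is a dictionary shaped like the output of pipeline.status().
    """
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        models = engine.get("model_statuses", {}).get("models", [])
        if any(model.get("status") != "Running" for model in models):
            return False
    return True
```

With a live connection, this would be called as `pipeline_is_ready(pipeline.status())`.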

Inferences

Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we’ll perform a smoke test to verify the pipeline is up and running properly. We’ll use the infer_from_file command to load a single encoded URL into the inference engine and print the results back out.

The result should tell us whether the tokenized URL is legitimate (0) or fraudulent (1). This sample data should return a value close to 0.

# Infer from file
if arrowEnabled is True:
    result = pipeline.infer_from_file('./data/data_1.df.json')
    display(result)
else:
    result = pipeline.infer_from_file("./data/data_1.json")
    display(result[0].data())
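Since the model returns a score rather than a label, turning it into "legitimate" or "fraud" requires choosing a cutoff. The sketch below uses 0.5 purely as an illustrative assumption; it is not a Wallaroo or model default, and the right threshold depends on the organization's tolerance for false positives. How the score is pulled out of `result` depends on whether the Arrow or non-Arrow output format is in use. `label_url` is a hypothetical helper.

```python
def label_url(score, threshold=0.5):
    """Map a model score to a label: near 0 is legitimate, near 1 is fraud.

    The 0.5 default threshold is an illustrative assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [0, 1]")
    return "fraud" if score >= threshold else "legitimate"
```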

Batch Inference

Now that our smoke test is successful, let’s really give it some data. We have two inference files we can use:

  • data-1k.json: Contains 1,000 inferences
  • data-25k.json: Contains 25,000 inferences

We’ll pipe the data-25k.json file through the pipeline deployment URL and place the results in a file named curl_response.txt. We’ll also display the time this takes. Note that larger batches of 50,000 inferences or more can be difficult to view in JupyterHub because of their size.
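One way to keep responses a manageable size is to split a large input file into smaller batches and submit each one separately. The sketch below assumes the input file is a JSON array of records, as in the pandas records layout used for the Arrow-enabled data files; it raises an error for any other top-level shape, such as the non-Arrow files. `split_records_file` is a hypothetical helper, and the batch size is up to the user.

```python
import json
from pathlib import Path


def split_records_file(path, batch_size, out_dir):
    """Split a JSON file holding a list of records into smaller files.

    Assumes the file's top level is a JSON array. Returns the paths of the
    files written, in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = json.loads(Path(path).read_text())
    if not isinstance(records, list):
        raise ValueError(f"{path} does not contain a JSON array")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    written = []
    for start in range(0, len(records), batch_size):
        part = out_dir / f"{stem}_part{start // batch_size:04d}.json"
        part.write_text(json.dumps(records[start:start + batch_size]))
        written.append(part)
    return written
```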

When the pipeline inference URL is retrieved through an external SDK connection, the External Inference URL is returned. This URL works only if the Enable external URL inference endpoints setting is enabled for the Wallaroo instance. For more information, see the Wallaroo Model Endpoints Guide.

external_url = pipeline._deployment._url()
external_url
'https://YOUR PREFIX.api.example.wallaroo.ai/v1/api/pipelines/infer/gcpsdkpipeline-13'

The API connection details can be retrieved through the Wallaroo client mlops() command. This will display the connection URL, bearer token, and other information. The bearer token is available for one hour before it expires.

For this example, the API connection details will be retrieved, then used to submit an inference request through the external inference URL retrieved earlier.

connection = wl.mlops().__dict__
token = connection['token']
token
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJyb3dHSmdNdnlCODFyRFBzQURxc3RIM0hIbFdZdmdhMnluUmtGXzllSWhjIn0.eyJleHAiOjE2NzAzNjI2NjMsImlhdCI6MTY3MDM2MjYwMywiYXV0aF90aW1lIjoxNjcwMzYyNTQ1LCJqdGkiOiI5NDk5M2Y2Ni0yMjk2LTRiMTItOTYwMi1iOWEyM2UxY2RhZGIiLCJpc3MiOiJodHRwczovL21hZ2ljYWwtYmVhci0zNzgyLmtleWNsb2FrLndhbGxhcm9vLmNvbW11bml0eS9hdXRoL3JlYWxtcy9tYXN0ZXIiLCJhdWQiOlsibWFzdGVyLXJlYWxtIiwiYWNjb3VudCJdLCJzdWIiOiIwYmJmMmY2Mi1hNGYxLTRmZTUtYWFkOC1lYzFjYjc0ODU5MzkiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJzZGstY2xpZW50Iiwic2Vzc2lvbl9zdGF0ZSI6ImQyYjlkMzFjLWU3ZmMtNDI4OS1hOThjLTI2ZTMwMDBiMzVkMiIsImFjciI6IjEiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsiY3JlYXRlLXJlYWxtIiwiZGVmYXVsdC1yb2xlcy1tYXN0ZXIiLCJvZmZsaW5lX2FjY2VzcyIsImFkbWluIiwidW1hX2F1dGhvcml6YXRpb24iXX0sInJlc291cmNlX2FjY2VzcyI6eyJtYXN0ZXItcmVhbG0iOnsicm9sZXMiOlsidmlldy1yZWFsbSIsInZpZXctaWRlbnRpdHktcHJvdmlkZXJzIiwibWFuYWdlLWlkZW50aXR5LXByb3ZpZGVycyIsImltcGVyc29uYXRpb24iLCJjcmVhdGUtY2xpZW50IiwibWFuYWdlLXVzZXJzIiwicXVlcnktcmVhbG1zIiwidmlldy1hdXRob3JpemF0aW9uIiwicXVlcnktY2xpZW50cyIsInF1ZXJ5LXVzZXJzIiwibWFuYWdlLWV2ZW50cyIsIm1hbmFnZS1yZWFsbSIsInZpZXctZXZlbnRzIiwidmlldy11c2VycyIsInZpZXctY2xpZW50cyIsIm1hbmFnZS1hdXRob3JpemF0aW9uIiwibWFuYWdlLWNsaWVudHMiLCJxdWVyeS1ncm91cHMiXX0sImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoicHJvZmlsZSBlbWFpbCIsInNpZCI6ImQyYjlkMzFjLWU3ZmMtNDI4OS1hOThjLTI2ZTMwMDBiMzVkMiIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJodHRwczovL2hhc3VyYS5pby9qd3QvY2xhaW1zIjp7IngtaGFzdXJhLXVzZXItaWQiOiIwYmJmMmY2Mi1hNGYxLTRmZTUtYWFkOC1lYzFjYjc0ODU5MzkiLCJ4LWhhc3VyYS1kZWZhdWx0LXJvbGUiOiJ1c2VyIiwieC1oYXN1cmEtYWxsb3dlZC1yb2xlcyI6WyJ1c2VyIl0sIngtaGFzdXJhLXVzZXItZ3JvdXBzIjoie30ifSwicHJlZmVycmVkX3VzZXJuYW1lIjoiam9obi5oYW5zYXJpY2tAd2FsbGFyb28uYWkiLCJlbWFpbCI6ImpvaG4uaGFuc2FyaWNrQHdhbGxhcm9vLmFpIn0.Gnig3PdpMFGSrQ2J4Tj3Nqbk2UOfBCH4MEw2i6p5pLkQ51F8FM7Dq-VOGoNYAXZn2OXw_bKh0Ae60IqglB0PSFTlksVzb1uSGKOPgcZNkI0fTMK99YW71UctMDk9MYrN09bT2GhGQ7FV-tJNqemYSXB3eMIaTkah6AMUfJIYYvf6J2OqXyNJqc6Hwf0-44FGso_N0WX
F6GM-ww72ampVjc10Mad30kYzQX508U9RuZXd3uvOrRQHreOcPPmjso1yDbUx8gqLeov_uq3dg5hUY55v2oVBdtXT60-ZBIQP8uETNetv6529Nm52uwKNT7DdjXk85kbJBK8oV6etyfKRDw'
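Because the bearer token expires, a long-running notebook may need to check it before each request. The token is a JSON Web Token, so its expiry time is the standard `exp` claim in the payload segment. The sketch below decodes that segment locally for an expiry check only; it does not verify the signature. `jwt_seconds_remaining` is a hypothetical helper.

```python
import base64
import json
import time


def jwt_seconds_remaining(token, now=None):
    """Return the seconds left before a JWT's `exp` claim (negative if expired).

    Decodes the payload without verifying the signature; use only as a
    local hint for when to fetch a fresh token.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return payload["exp"] - now
```

For example, if `jwt_seconds_remaining(token)` drops below a minute, the token can be refreshed with `wl.mlops().__dict__['token']` as shown above.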
if arrowEnabled is True:
    dataFile="./data/data_25k.df.json"
    contentType="application/json; format=pandas-records"
else:
    dataFile="./data/data_25k.json"
    contentType="application/json"
!curl -X POST {external_url} -H "Content-Type:{contentType}" -H "Authorization: Bearer {token}" --data @{dataFile} > curl_response.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.0M  100 10.1M  100 2886k  2322k   642k  0:00:04  0:00:04 --:--:-- 2965k
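Where shell escapes such as `!curl` are not available, the same request can be built with Python's standard urllib.request module. This sketch mirrors the curl command above: a POST with the Content-Type and Authorization: Bearer headers and the data file as the body. One difference: curl's `--data @file` strips newlines from the file, while this sends the bytes unchanged (like curl's `--data-binary`), which does not change the meaning of JSON input. `build_inference_request` and `send_inference_request` are hypothetical helpers, and the 300-second timeout is an arbitrary choice.

```python
import urllib.request
from pathlib import Path


def build_inference_request(url, token, data_file, content_type):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = Path(data_file).read_bytes()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Bearer {token}",
        },
    )


def send_inference_request(request, out_path="curl_response.txt", timeout=300):
    """Send the request and write the raw response body to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        Path(out_path).write_bytes(response.read())
```

With the variables defined above, this would be called as `send_inference_request(build_inference_request(external_url, token, dataFile, contentType))`.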

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline to return its Kubernetes resources for other tasks. Note that if the deployment variable is unchanged, pipeline.deploy() will restart the inference engine in the same configuration as before.

pipeline.undeploy()

2.1.5 - Wallaroo SDK Standard Install Guide

How to install the Wallaroo SDK in typical environment

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Installing the Wallaroo SDK

Organizations that develop machine learning models can deploy them from their local systems to a Wallaroo instance through the Wallaroo SDK. This guide shows how to install the Wallaroo SDK and make a standard connection to a Wallaroo instance.

These instructions are based on the Connect to Wallaroo guides.

This tutorial provides the following:

  • aloha-cnn-lstm.zip: A pre-trained open source model that uses an Aloha CNN LSTM model for classifying domain names as either legitimate or used for nefarious purposes such as malware distribution.
  • Test Data Files:
    • data-1.json: 1 record
    • data-1k.json: 1,000 records
    • data-25k.json: 25,000 records

For this example, we will use a Python virtual environment that provides the required libraries and the specific Python version.

Prerequisites

The following is required for this tutorial:

  • A Wallaroo instance version 2023.1 or later.
  • Python 3.8.6 or later installed locally.
  • Conda: Used for managing Python virtual environments.

General Steps

For our example, we will perform the following:

  • Wallaroo SDK Install
    • Set up a Python virtual environment through conda with the libraries that enable the virtual environment for use in a JupyterHub environment.
    • Install the Wallaroo SDK.
  • Wallaroo SDK from remote JupyterHub Demonstration (Optional): The following steps are an optional exercise to demonstrate using the Wallaroo SDK from a remote connection. The entire tutorial can be found on the Wallaroo Tutorials repository.
    • Connect to a remote Wallaroo instance.
    • Create a workspace for our work.
    • Upload the Aloha model.
    • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results
    • Run a sample inference through our pipeline by loading a file
    • Retrieve the external deployment URL. This sample Wallaroo instance has been configured to create external inference URLs for pipelines. For more information, see the External Inference URL Guide.
    • Run a sample inference through our pipeline’s external URL and store the results in a file. This assumes that the External Inference URLs have been enabled for the target Wallaroo instance.
    • Undeploy the pipeline and return resources back to the Wallaroo instance’s Kubernetes environment.

Install Wallaroo SDK

Set Up Virtual Python Environment

To set up the Python virtual environment for use of the Wallaroo SDK:

  1. From a terminal shell, create the Python virtual environment with conda. Replace wallaroosdk with the name of the virtual environment as required by your organization. Note that the Python libraries used with the Wallaroo SDK require Python 3.8.6 or later. The following command installs the latest version of Python 3.8.

    conda create -n wallaroosdk python=3.8
    
  2. Activate the new environment.

    conda activate wallaroosdk
    
  3. (Optional) For organizations that want to use the Wallaroo SDK from within Jupyter and similar environments:

    1. Install the ipykernel library. This allows the JupyterHub notebooks to access the Python virtual environment as a kernel, and is required for the second part of this tutorial.

      conda install ipykernel
      
    2. Install the new virtual environment as a Python kernel.

      ipython kernel install --user --name=wallaroosdk
      
  4. Install the Wallaroo SDK. This process may take several minutes while the other required Python libraries are added to the virtual environment.

    pip install wallaroo==2023.1.0
    

For organizations using the Wallaroo SDK with Jupyter or similar services: once the conda virtual environment has been installed, it can either be selected as the kernel for a new Jupyter notebook, or an existing notebook's kernel can be changed to it.

To use a new Notebook:

  1. From the main menu, select File->New->Notebook.
  2. From the Kernel selection dropdown, select the new virtual environment - in this case, wallaroosdk.

To update an existing Notebook to use the new virtual environment as a kernel:

  1. From the main menu, select Kernel->Change Kernel.
  2. Select the new kernel.

Sample Wallaroo Connection

With the Wallaroo Python SDK installed, remote commands and inferences can be performed through the following steps.

Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.

This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command, which connects to the Wallaroo instance services.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO.

The URLs are based on the Wallaroo Prefix and Wallaroo Suffix for the Wallaroo instance. For more information, see the DNS Integration Guide. In the example below, replace “YOUR PREFIX” and “YOUR SUFFIX” with the Wallaroo Prefix and Suffix, respectively.

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, GitHub, etc. To use a broker, select it from the list below the username/password login form. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Wallaroo Login

Once authenticated, the user is prompted to confirm the device from which the connection is being established. After both steps are complete, the connection is granted.

Device Registration

The connection is stored in the variable wl for use in all other Wallaroo calls.

import wallaroo
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
wallaroo.__version__
'2023.1.0rc1'
# Login through Wallaroo JupyterHub Service

wl = wallaroo.Client()

# SSO login through keycloak

# wallarooPrefix = "YOUR PREFIX"
# wallarooSuffix = "YOUR SUFFIX"

# wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
#                 auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
#                 auth_type="sso")

Arrow Support

As of the 2023.1 release, Wallaroo supports both DataFrame and Apache Arrow formats for inference inputs. This tutorial lets users adjust their experience based on whether Arrow support is enabled in their Wallaroo instance.

If Arrow support has been enabled, set arrowEnabled=True. If it is disabled or you are not sure, set arrowEnabled=False.

The examples below are shown in an Arrow-enabled environment.

import os
# Uncomment the line below to set ARROW_ENABLED to True in the OS environment. Otherwise, leave it as is.
# os.environ["ARROW_ENABLED"]="True"

if "ARROW_ENABLED" not in os.environ or os.environ["ARROW_ENABLED"].casefold() == "False".casefold():
    arrowEnabled = False
else:
    arrowEnabled = True
print(arrowEnabled)
True
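The check above treats any value other than "false" as enabling Arrow, so a typo such as ARROW_ENABLED=ture would silently turn it on. A stricter variant is sketched below: a missing variable still means False, common true/false spellings are accepted, and anything else raises an error. `arrow_enabled_from_env` and the accepted spellings are assumptions for illustration, not Wallaroo settings.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off", ""}


def arrow_enabled_from_env(environ=os.environ):
    """Interpret ARROW_ENABLED, treating a missing variable as False.

    Unrecognized values raise ValueError instead of enabling Arrow.
    """
    raw = environ.get("ARROW_ENABLED", "")
    value = raw.strip().casefold()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"Unrecognized ARROW_ENABLED value: {raw!r}")
```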

Wallaroo Remote SDK Examples

The following examples can be used by an organization to test using the Wallaroo SDK from a remote location from their Wallaroo instance. These examples show how to create workspaces, deploy pipelines, and perform inferences through the SDK and API.

Create the Workspace

We will create a workspace to work in called sdkquickworkspace, then set it as the current workspace. We'll also create our pipeline in advance as sdkquickpipeline.

  • IMPORTANT NOTE: For this example, the Aloha model is stored in the file alohacnnlstm.zip. When using tensor-based models, the .zip file name must match the name of the tensor directory it contains. For example, if the tensor directory is alohacnnlstm, then the .zip file must be named alohacnnlstm.zip.

workspace_name = 'sdkquickworkspace'
pipeline_name = 'sdkquickpipeline'
model_name = 'sdkquickmodel'
model_file_name = './alohacnnlstm.zip'

def get_workspace(name):
    # Return the workspace with this name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the pipeline with this name, building it if it does not exist
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name sdkquickpipeline
created 2023-02-22 21:25:42.312061+00:00
last_updated 2023-02-22 21:27:26.092464+00:00
deployed False
tags
versions 6cd41955-6456-4401-84c6-72eb0b6b550b, f00085b6-a7d6-4297-92b7-9f52139e18ab, 109e38ce-908e-4821-ae14-a210e81f4def
steps sdkquickmodel

We can verify that the workspace was created and is the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'sdkquickworkspace', 'id': 66, 'archived': False, 'created_by': '138bd7e6-4dc8-4dc1-a760-c9e721ef3c37', 'created_at': '2023-02-22T21:25:41.546584+00:00', 'models': [{'name': 'sdkquickmodel', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 2, 22, 21, 25, 44, 989355, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 2, 22, 21, 25, 44, 989355, tzinfo=tzutc())}], 'pipelines': [{'name': 'sdkquickpipeline', 'create_time': datetime.datetime(2023, 2, 22, 21, 25, 42, 312061, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Models

Now we will upload our model. Note that for this example we are uploading the model from a .zip file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the TensorFlow format.

model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

Deploy a Model

Now that we have uploaded the model, we will deploy it. The model was already configured as a TensorFlow model when it was uploaded.

To do this, we’ll add the model as a step in our pipeline sdkquickpipeline, which ingests the data, passes it to the Aloha model, and returns the final output. Then we’ll deploy the pipeline so it’s ready to receive data. The deployment process usually takes about 45 seconds.

pipeline.add_model_step(model)
name sdkquickpipeline
created 2023-02-22 21:25:42.312061+00:00
last_updated 2023-02-22 21:27:26.092464+00:00
deployed False
tags
versions 6cd41955-6456-4401-84c6-72eb0b6b550b, f00085b6-a7d6-4297-92b7-9f52139e18ab, 109e38ce-908e-4821-ae14-a210e81f4def
steps sdkquickmodel
pipeline
name sdkquickpipeline
created 2023-02-22 21:25:42.312061+00:00
last_updated 2023-02-22 21:27:26.092464+00:00
deployed False
tags
versions 6cd41955-6456-4401-84c6-72eb0b6b550b, f00085b6-a7d6-4297-92b7-9f52139e18ab, 109e38ce-908e-4821-ae14-a210e81f4def
steps sdkquickmodel
pipeline.deploy()
name sdkquickpipeline
created 2023-02-22 21:25:42.312061+00:00
last_updated 2023-02-22 21:47:51.453099+00:00
deployed True
tags
versions 19c972e4-41fa-42ea-be84-604e54b4b5ba, 37acbb17-8a8c-4169-8286-662e9cba3245, 6cd41955-6456-4401-84c6-72eb0b6b550b, f00085b6-a7d6-4297-92b7-9f52139e18ab, 109e38ce-908e-4821-ae14-a210e81f4def
steps sdkquickmodel

We can verify that the pipeline is running and list what models are associated with it.

pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.48.0.209',
   'name': 'engine-69478bcd58-kjwxn',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'sdkquickpipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'sdkquickmodel',
      'version': 'e476e115-ade4-40bf-acbb-4074270e01c6',
      'sha': 'd71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.48.0.208',
   'name': 'engine-lb-74b4969486-t9dmq',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
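Because deployment takes some time, a script may want to wait until the pipeline reports that it is ready before sending inferences. The sketch below checks a dictionary shaped like the pipeline.status() output above. It assumes status() returns a plain dict with these keys; the helper names, the 90-second timeout, and the 5-second polling interval are hypothetical choices.

```python
import time
from typing import Callable


def pipeline_is_ready(status: dict) -> bool:
    """True when the pipeline, every engine, and every model report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        models = engine.get("model_statuses", {}).get("models", [])
        if any(model.get("status") != "Running" for model in models):
            return False
    return True


def wait_until_ready(get_status: Callable[[], dict],
                     timeout: float = 90.0, interval: float = 5.0) -> bool:
    """Poll get_status() until pipeline_is_ready() holds or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if pipeline_is_ready(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the bound method, as in wait_until_ready(pipeline.status), lets the helper fetch a fresh status on every poll.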

Inferences

Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we’ll perform a smoke test to verify the pipeline is up and running properly. We’ll use the infer and infer_from_file commands to submit a single encoded URL to the inference engine and display the results.

The result should tell us whether the tokenized URL is legitimate (0) or fraud (1). This sample data should return a value close to 1, as in the output below.

## Demonstrate via straight infer
import json
import pandas as pd

if arrowEnabled is True:
    data = pd.read_json("./data/data_1.df.json")
    result = pipeline.infer(data)
else:
    with open("./data/data_1.json") as f:
        data = json.load(f)
    result = pipeline.infer(data)
display(result)
time in.text_input out.main check_failures
0 2023-03-03 22:42:20.661 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 16, 32, 23, 29, 32, 30, 19, 26, 17] [0.997564] 0
# Infer from file
if arrowEnabled is True:
    result = pipeline.infer_from_file('./data/data_1.df.json')
    display(result)
else:
    result = pipeline.infer_from_file("./data/data_1.json")
    display(result[0].data())
time in.text_input out.main check_failures
0 2023-03-03 22:42:20.661 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 16, 32, 23, 29, 32, 30, 19, 26, 17] [0.997564] 0

Batch Inference

Now that our smoke test is successful, let’s really give it some data. We have two inference files we can use:

  • data_1k.json: Contains 1,000 inferences
  • data_25k.json: Contains 25,000 inferences

We’ll pipe the data_25k.json file (or data_25k.df.json when Arrow is enabled) through the pipeline deployment URL, and place the results in a file named curl_response.txt. We’ll also display the time this takes. Note that batches of 50,000 inferences or more can be difficult to view in JupyterHub because of their size.
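If a batch is too large to submit or review comfortably, the records file can be split into smaller files and each part sent separately. This is a minimal sketch, assuming the pandas-records file (data_25k.df.json) holds a top-level JSON list of row objects; the helper names and the _partNNN file naming are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator, List


def iter_batches(records: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def split_records_file(path: str, batch_size: int, out_dir: str) -> List[Path]:
    """Split a JSON list of records into numbered batch files in out_dir."""
    with open(path) as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of records")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, batch in enumerate(iter_batches(records, batch_size)):
        target = out / f"{Path(path).stem}_part{i:03d}.json"
        with open(target, "w") as f:
            json.dump(batch, f)
        written.append(target)
    return written
```

Each resulting file can then be submitted with the same curl command by pointing dataFile at it.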

When retrieving the pipeline inference URL through an external SDK connection, the External Inference URL will be returned. This URL will function provided that the Enable external URL inference endpoints setting is enabled. For more information, see the Wallaroo Model Endpoints Guide.

external_url = pipeline._deployment._url()
external_url
'https://sparkly-apple-3026.api.wallaroo.community/v1/api/pipelines/infer/sdkquickpipeline-47'

The API connection details can be retrieved through the Wallaroo client mlops() command. This will display the connection URL, bearer token, and other information. The bearer token is available for one hour before it expires.

For this example, the API connection details will be retrieved, then used to submit an inference request through the external inference URL retrieved earlier.
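Because the bearer token expires, it can be useful to check how long it remains valid before reusing it. The following sketch reads the exp claim from the token's payload using only the standard library. It assumes the token is a standard JWT (header.payload.signature), it does not verify the signature, and the helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; add it back before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds left before the 'exp' claim; negative once the token has expired."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] - current
```

Calling seconds_until_expiry(token) on the token retrieved below returns a negative number once it has expired, at which point the connection details would need to be retrieved again.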

connection = wl.mlops().__dict__
token = connection['token']
token
'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJjTGZaYmhVQWl0a210Z0VLV0l1NnczTWlXYmUzWjc3cHdqVjJ2QWM2WUdZIn0.eyJleHAiOjE2NzcxMDI1NDQsImlhdCI6MTY3NzEwMjQ4NCwiYXV0aF90aW1lIjoxNjc3MTAxMTM3LCJqdGkiOiI4ZWRmOTE3MS1kMzg0LTQ5MGItYjUyZi01MmI2NWViMjNjZTUiLCJpc3MiOiJodHRwczovL3NwYXJrbHktYXBwbGUtMzAyNi5rZXljbG9hay53YWxsYXJvby5jb21tdW5pdHkvYXV0aC9yZWFsbXMvbWFzdGVyIiwiYXVkIjpbIm1hc3Rlci1yZWFsbSIsImFjY291bnQiXSwic3ViIjoiMTM4YmQ3ZTYtNGRjOC00ZGMxLWE3NjAtYzllNzIxZWYzYzM3IiwidHlwIjoiQmVhcmVyIiwiYXpwIjoic2RrLWNsaWVudCIsInNlc3Npb25fc3RhdGUiOiI1MGY3NWI3MC1kOWFmLTRmYTYtODVkZC1lMzU0MmVmYjE2N2IiLCJhY3IiOiIwIiwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbImNyZWF0ZS1yZWFsbSIsImRlZmF1bHQtcm9sZXMtbWFzdGVyIiwib2ZmbGluZV9hY2Nlc3MiLCJhZG1pbiIsInVtYV9hdXRob3JpemF0aW9uIl19LCJyZXNvdXJjZV9hY2Nlc3MiOnsibWFzdGVyLXJlYWxtIjp7InJvbGVzIjpbInZpZXctcmVhbG0iLCJ2aWV3LWlkZW50aXR5LXByb3ZpZGVycyIsIm1hbmFnZS1pZGVudGl0eS1wcm92aWRlcnMiLCJpbXBlcnNvbmF0aW9uIiwiY3JlYXRlLWNsaWVudCIsIm1hbmFnZS11c2VycyIsInF1ZXJ5LXJlYWxtcyIsInZpZXctYXV0aG9yaXphdGlvbiIsInF1ZXJ5LWNsaWVudHMiLCJxdWVyeS11c2VycyIsIm1hbmFnZS1ldmVudHMiLCJtYW5hZ2UtcmVhbG0iLCJ2aWV3LWV2ZW50cyIsInZpZXctdXNlcnMiLCJ2aWV3LWNsaWVudHMiLCJtYW5hZ2UtYXV0aG9yaXphdGlvbiIsIm1hbmFnZS1jbGllbnRzIiwicXVlcnktZ3JvdXBzIl19LCJhY2NvdW50Ijp7InJvbGVzIjpbIm1hbmFnZS1hY2NvdW50IiwibWFuYWdlLWFjY291bnQtbGlua3MiLCJ2aWV3LXByb2ZpbGUiXX19LCJzY29wZSI6ImVtYWlsIHByb2ZpbGUiLCJzaWQiOiI1MGY3NWI3MC1kOWFmLTRmYTYtODVkZC1lMzU0MmVmYjE2N2IiLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwiaHR0cHM6Ly9oYXN1cmEuaW8vand0L2NsYWltcyI6eyJ4LWhhc3VyYS11c2VyLWlkIjoiMTM4YmQ3ZTYtNGRjOC00ZGMxLWE3NjAtYzllNzIxZWYzYzM3IiwieC1oYXN1cmEtZGVmYXVsdC1yb2xlIjoidXNlciIsIngtaGFzdXJhLWFsbG93ZWQtcm9sZXMiOlsidXNlciJdLCJ4LWhhc3VyYS11c2VyLWdyb3VwcyI6Int9In0sInByZWZlcnJlZF91c2VybmFtZSI6ImpvaG4uaGFuc2FyaWNrQHdhbGxhcm9vLmFpIiwiZW1haWwiOiJqb2huLmhhbnNhcmlja0B3YWxsYXJvby5haSJ9.ErXJsmadl3w3YleT29dZoo6TNoC8QOxMHVWHTyV9FVsUhIegpQlwTmVjHITtPu5aoX9EXW-lFOtz3-gduozU31sxwtPQG4DbWy62akxA2H11EepXg70AqzMZvFxeSp5blPI6p6miNxPTGsjLds6vlgZwd48IgDOg5RBSgS6uWCvMaj4AsvsUFuSxKuNkf_WDK10J_x
8dq3osBGAytCUTwiF0ybPObVDBtg9UySv-PHQyUsJXLZl7DLsBOJ9fPy-Te4AHsKBjOg2UYL6KGzduQ_BIwDMb8obG2ILc2s2_rjnfZ_4sImp3A7DppohwjRNl18-caJhOSnVrYU0en_LDHw'
if arrowEnabled is True:
    dataFile="./data/data_25k.df.json"
    contentType="application/json; format=pandas-records"
else:
    dataFile="./data/data_25k.json"
    contentType="application/json"
!curl -X POST {external_url} -H "Content-Type:{contentType}" -H "Authorization: Bearer {token}" --data @{dataFile} > curl_response.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34.3M  100 16.3M  100 18.0M  1115k  1229k  0:00:15  0:00:15 --:--:-- 4153k
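The same request can be sent from Python with only the standard library, which avoids shelling out to curl. This is a sketch, assuming the external_url, token, dataFile, and contentType variables defined above; build_infer_request and send_and_save are hypothetical helper names, and the 120-second timeout is an arbitrary choice.

```python
import urllib.request


def build_infer_request(url: str, token: str, data_file: str,
                        content_type: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    with open(data_file, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
        method="POST",
    )


def send_and_save(request: urllib.request.Request, out_path: str,
                  timeout: float = 120.0) -> int:
    """Send the request, write the raw response body to out_path, return the HTTP status."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read()
        status = resp.status
    with open(out_path, "wb") as f:
        f.write(body)
    return status
```

For example, send_and_save(build_infer_request(external_url, token, dataFile, contentType), "curl_response.txt") writes the response to the same file as the curl command.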

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline to return its Kubernetes resources for other tasks. Note that if the deployment variable is unchanged, pipeline.deploy() will restart the inference engine in the same configuration as before.

pipeline.undeploy()
name sdkquickpipeline
created 2023-02-22 21:25:42.312061+00:00
last_updated 2023-02-22 21:46:29.910170+00:00
deployed False
tags
versions 37acbb17-8a8c-4169-8286-662e9cba3245, 6cd41955-6456-4401-84c6-72eb0b6b550b, f00085b6-a7d6-4297-92b7-9f52139e18ab, 109e38ce-908e-4821-ae14-a210e81f4def
steps sdkquickmodel

2.2 - Wallaroo SDK Essentials Guide

Reference Guide for the most essential Wallaroo SDK Commands

The following commands are the most essential when working with Wallaroo.

Supported Model Versions and Libraries

The following ML Model versions and Python libraries are supported by Wallaroo. When using the Wallaroo autoconversion library or working with a local version of the Wallaroo SDK, use the following versions for maximum compatibility.

Library Supported Version
Python 3.8.6 and above
onnx 1.12.0
tensorflow 2.9.1
keras 2.9.0
pytorch Latest stable version. When converting from PyTorch to onnx, verify that the onnx version matches the version above.
sk-learn aka scikit-learn 1.1.2
statsmodels 0.13.2
XGBoost 1.6.2
MLFlow 1.30.0
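To spot mismatches before uploading or converting models, the installed packages can be compared with the table above. This is a minimal sketch using importlib.metadata; the helper version_mismatches is hypothetical, it compares version strings exactly, and it assumes the PyPI distribution names shown (for example scikit-learn for sk-learn). PyTorch is left out because the table gives no exact pin for it.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Exact pins from the table above, keyed by PyPI distribution name.
SUPPORTED = {
    "onnx": "1.12.0",
    "tensorflow": "2.9.1",
    "keras": "2.9.0",
    "scikit-learn": "1.1.2",
    "statsmodels": "0.13.2",
    "xgboost": "1.6.2",
    "mlflow": "1.30.0",
}

# The table gives a minimum Python version rather than an exact one.
python_supported = sys.version_info >= (3, 8, 6)


def version_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to the
    installed version, or to None when the package is not installed."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in required.items():
        try:
            installed: Optional[str] = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

Running version_mismatches(SUPPORTED) returns an empty dict when every pinned package matches; libraries you do not use will simply show up as None.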

Supported Data Types

The following data types are supported for transporting data to and from Wallaroo in the following runtimes:

  • ONNX
  • TensorFlow
  • MLFlow

Float Types

Runtime BFloat16* Float16 Float32 Float64
ONNX X X
TensorFlow X X X
MLFlow X X X
  • * (Brain Float 16, represented internally as a f32)

Int Types

Runtime Int8 Int16 Int32 Int64
ONNX X X X X
TensorFlow X X X X
MLFlow X X X X

Uint Types

Runtime Uint8 Uint16 Uint32 Uint64
ONNX X X X X
TensorFlow X X X X
MLFlow X X X X

Other Types

Runtime Boolean Utf8 (String) Complex 64 Complex 128 FixedSizeList*
ONNX X
TensorFlow X X X
MLFlow X X X
  • * Fixed sized lists of any of the previously supported data types.

2.2.1 - Wallaroo SDK Essentials Guide: Client Connection

How to connect to a Wallaroo instance through the Wallaroo SDK

Connect to Wallaroo

Users connect to a Wallaroo instance with the Wallaroo Client class. This connection can be made from within the Wallaroo instance, or from outside the Wallaroo instance via the Wallaroo SDK.

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Depending on the configuration of the Wallaroo instance, the user will either be presented with a login request to the Wallaroo instance or be authenticated through a broker such as Google, Github, etc. To use the broker, select it from the list under the username/password login forms. For more information on Wallaroo authentication configurations, see the Wallaroo Authentication Configuration Guides.

Once authenticated, the user confirms adding the device they are connecting from. Once both steps are complete, the connection is granted.

Connect from Within the Wallaroo Instance

Users who connect from within their Wallaroo instance are authenticated with the Wallaroo Client() method.

The first step in using Wallaroo is creating a connection. To connect to your Wallaroo environment:

  1. Import the wallaroo library:

    import wallaroo
    
  2. Open a connection to the Wallaroo environment with the wallaroo.Client() command and save it to a variable.

    In this example, the Wallaroo connection is saved to the variable wl.

    wl = wallaroo.Client()
    
  3. A verification URL will be displayed. Enter it into your browser and grant access to the SDK client.

  4. Once this is complete, you will be able to continue with your Wallaroo commands.

Connect from Outside the Wallaroo Instance

Users who have installed the Wallaroo SDK in an external location, such as their own JupyterHub service, Google Workbench, or other services, can connect via Single Sign-On (SSO). This is accomplished using the wallaroo.Client(api_endpoint, auth_endpoint, auth_type) command, which connects to the Wallaroo instance services. For more information on the DNS names of Wallaroo services, see the DNS Integration Guide.

Before performing this step, verify that SSO is enabled for the specific service. For more information, see the Wallaroo Authentication Configuration Guide.

The Client method takes the following parameters:

  • api_endpoint (String): The URL to the Wallaroo instance API service.
  • auth_endpoint (String): The URL to the Wallaroo instance Keycloak service.
  • auth_type (String): The authorization type. In this case, SSO (passed as "sso").

Once run, the wallaroo.Client command provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. This connection is stored into a variable that can be referenced later.

In this example, a connection will be made to the Wallaroo instance shadowy-unicorn-5555.wallaroo.ai through SSO authentication.

import wallaroo
from wallaroo.object import EntityNotFoundError

# SSO login through keycloak

wl = wallaroo.Client(api_endpoint="https://shadowy-unicorn-5555.api.wallaroo.ai", 
                    auth_endpoint="https://shadowy-unicorn-5555.keycloak.wallaroo.ai", 
                    auth_type="sso")
Please log into the following URL in a web browser:

    https://shadowy-unicorn-5555.keycloak.wallaroo.example.com/auth/realms/master/device?user_code=LGZP-FIQX

Login successful!
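When switching between instances, both endpoint URLs can be built from the instance prefix and domain instead of being typed by hand. This sketch assumes the instance follows the same naming pattern as the example above (https://{prefix}.api.{domain} and https://{prefix}.keycloak.{domain}); check the DNS Integration Guide for your instance's actual names. wallaroo_endpoints is a hypothetical helper.

```python
from typing import Tuple


def wallaroo_endpoints(prefix: str, domain: str) -> Tuple[str, str]:
    """Return (api_endpoint, auth_endpoint) following the example's naming pattern."""
    prefix, domain = prefix.strip("."), domain.strip(".")
    if not prefix or not domain:
        raise ValueError("prefix and domain must be non-empty")
    api_endpoint = f"https://{prefix}.api.{domain}"
    auth_endpoint = f"https://{prefix}.keycloak.{domain}"
    return api_endpoint, auth_endpoint
```

The results can then be passed as wallaroo.Client(api_endpoint=api, auth_endpoint=auth, auth_type="sso") after unpacking api, auth = wallaroo_endpoints("shadowy-unicorn-5555", "wallaroo.ai").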

2.2.2 - Wallaroo SDK Essentials Guide: Workspace Management

How to create and use Wallaroo Workspaces through the Wallaroo SDK

Workspace Management

Workspaces are used to segment groups of models into separate environments. This allows different users to either manage or have access to each workspace, controlling the models and pipelines assigned to the workspace.

Create a Workspace

Workspaces can be created either through the Wallaroo Dashboard or through the Wallaroo SDK.

  • IMPORTANT NOTICE

    Workspace names are not required to be unique. You can have 50 workspaces all named my-amazing-workspace, which can cause confusion in determining which workspace to use.

    It is recommended that organizations agree on a naming convention and select the workspace to use rather than creating a new one each time.
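Because duplicate names are allowed, code that looks up a workspace by name can refuse to guess when the name is ambiguous. This is a minimal sketch that relies only on the name() method used by the get_workspace helper in the quick start; find_single_workspace is a hypothetical helper.

```python
from typing import Iterable, Protocol


class Named(Protocol):
    def name(self) -> str: ...


def find_single_workspace(workspaces: Iterable[Named], name: str) -> Named:
    """Return the one workspace called `name`; raise if there are zero or several."""
    matches = [ws for ws in workspaces if ws.name() == name]
    if not matches:
        raise LookupError(f"no workspace named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} workspaces are named {name!r}; pick one explicitly")
    return matches[0]
```

Used as find_single_workspace(wl.list_workspaces(), "aloha-workspace"), it would raise with the list_workspaces() output shown in this section, since two workspaces share that name.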

To create a workspace, use the create_workspace("{WORKSPACE NAME}") command through an established Wallaroo connection and store the workspace settings into a new variable. Once the new workspace is created, the user who created the workspace is assigned as its owner. The following template is an example:

{New Workspace Variable} = {Wallaroo Connection}.create_workspace("{New Workspace Name}")

For example, if the connection is stored in the variable wl and the new workspace will be named imdb-workspace, then the command to store it in the new_workspace variable would be:

new_workspace = wl.create_workspace("imdb-workspace")

List Workspaces

The command list_workspaces() displays the workspaces that are part of the current Wallaroo connection. The following details are returned as an array:

Parameter Type Description
Name String The name of the workspace. Note that workspace names are not unique.
Created At DateTime The date and time the workspace was created.
Users Array[Users] A list of all users assigned to this workspace.
Models Integer The number of models uploaded to the workspace.
Pipelines Integer The number of pipelines in the workspace.

For example, for the Wallaroo connection wl the following workspaces are returned:

wl.list_workspaces()
Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

Get Current Workspace

The command get_current_workspace() displays the current workspace used for the Wallaroo connection. The following information is returned by default:

Parameter Type Description
name String The name of the current workspace.
id Integer The ID of the current workspace.
archived Bool Whether the workspace is archived or not.
created_by String The identifier code for the user that created the workspace.
created_at DateTime The timestamp for when the workspace was created.
models Array[Models] The models that are uploaded to this workspace.
pipelines Array[Pipelines] The pipelines created for the workspace.

For example, the following will display the current workspace for the wl connection that contains a single pipeline and multiple models:

wl.get_current_workspace()
{'name': 'imdb-workspace', 'id': 6, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T17:09:23.960406+00:00', 'models': [{'name': 'embedder-o', 'version': '6dbe5524-7bc3-4ff3-8ca8-d454b2cbd0e4', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 17, 34, 18, 321105, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '6eb7f824-3d77-417f-9169-6a301d20d842', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 17, 34, 18, 783485, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 17, 34, 19, 318819, tzinfo=tzutc()), 'definition': '[]'}]}

Set the Current Workspace

The current workspace for a Wallaroo connection is set with set_current_workspace, which returns the workspace details as a JSON object:

{Wallaroo Connection}.set_current_workspace({Workspace Object})

Set Current Workspace from a New Workspace

The following example creates the workspace imdb-workspace through the Wallaroo connection stored in the variable wl, then sets it as the current workspace:

new_workspace = wl.create_workspace("imdb-workspace")
wl.set_current_workspace(new_workspace)
{'name': 'imdb-workspace', 'id': 7, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T17:43:09.405038+00:00', 'models': [], 'pipelines': []}

Set the Current Workspace from an Existing Workspace

To set the current workspace from an established workspace, the easiest method is to call list_workspaces() and then pass the desired element of the returned array to set_current_workspace. For example, from the following list_workspaces() output, the third element (index 2), demandcurve-workspace, can be assigned as the current workspace:

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

wl.set_current_workspace(wl.list_workspaces()[2])

{'name': 'demandcurve-workspace', 'id': 3, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:21:32.732178+00:00', 'models': [{'name': 'demandcurve', 'version': '4f5193fc-9c18-4851-8489-42e61d095588', 'file_name': 'demand_curve_v1.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 32, 822812, tzinfo=tzutc())}, {'name': 'preprocess', 'version': '159b9e99-edb6-4c5e-8336-63bc6000623e', 'file_name': 'preprocess.py', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 32, 984117, tzinfo=tzutc())}, {'name': 'postprocess', 'version': '77ee154c-d64c-49dd-985a-96f4c2931b6e', 'file_name': 'postprocess.py', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 33, 119037, tzinfo=tzutc())}], 'pipelines': [{'name': 'demand-curve-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 21, 33, 264321, tzinfo=tzutc()), 'definition': '[]'}]}

Add a User to a Workspace

Users are added to the workspace via their email address through the wallaroo.workspace.Workspace.add_user({email address}) command. The email address must be assigned to a current user in the Wallaroo platform before they can be assigned to the workspace.

For example, the following workspace imdb-workspace has the user steve@ex.co. We will add the user john@ex.co to this workspace:

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

current_workspace = wl.list_workspaces()[3]

current_workspace.add_user("john@ex.co")

{'name': 'imdb-workspace', 'id': 4, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:23:08.742676+00:00', 'models': [{'name': 'embedder-o', 'version': '23a33c3d-68e6-4bdb-a8bc-32ea846908ee', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 8, 833716, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '2c298aa9-be9d-482d-8188-e3564bdbab43', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 9, 49881, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 23, 28, 518946, tzinfo=tzutc()), 'definition': '[]'}]}

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co', 'john@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

Remove a User from a Workspace

Removing a user from a workspace is performed through the wallaroo.workspace.Workspace.remove_user({email address}) command, where the {email address} matches a user in the workspace.

In the following example, the user john@ex.co is removed from the workspace imdb-workspace.

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co', 'john@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

current_workspace = wl.list_workspaces()[3]

current_workspace.remove_user("john@ex.co")

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

Add a Workspace Owner

To update the owner of a workspace, or promote an existing user of a workspace to its owner, use the wallaroo.workspace.Workspace.add_owner({email address}) command. The email address must be assigned to a current user in the Wallaroo platform before they can be assigned as the owner of the workspace.

The following example shows assigning the user john@ex.co as an owner to the workspace imdb-workspace:

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

current_workspace = wl.list_workspaces()[3]

current_workspace.add_owner("john@ex.co")

{'name': 'imdb-workspace', 'id': 4, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:23:08.742676+00:00', 'models': [{'name': 'embedder-o', 'version': '23a33c3d-68e6-4bdb-a8bc-32ea846908ee', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 8, 833716, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '2c298aa9-be9d-482d-8188-e3564bdbab43', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 9, 49881, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 23, 28, 518946, tzinfo=tzutc()), 'definition': '[]'}]}

wl.list_workspaces()

Name Created At Users Models Pipelines
aloha-workspace 2022-03-29 20:15:38 ['steve@ex.co'] 1 1
ccfraud-workspace 2022-03-29 20:20:55 ['steve@ex.co'] 1 1
demandcurve-workspace 2022-03-29 20:21:32 ['steve@ex.co'] 3 1
imdb-workspace 2022-03-29 20:23:08 ['steve@ex.co', 'john@ex.co'] 2 1
aloha-workspace 2022-03-29 20:33:54 ['steve@ex.co'] 1 1
imdb-workspace 2022-03-30 17:09:23 ['steve@ex.co'] 2 1
imdb-workspace 2022-03-30 17:43:09 ['steve@ex.co'] 0 0

2.2.3 - Wallaroo SDK Essentials Guide: Model Management

How to create and manage Wallaroo Models through the Wallaroo SDK

Model Management

Upload Models to a Workspace

Models are uploaded to the current workspace through the Wallaroo Client upload_model("{Model Name}", "{Model Path}").configure(options) method. In most cases, the options field can be left blank. For more details, see the full SDK guide.

Models can either be uploaded in the Open Neural Network Exchange (ONNX) format, or be auto-converted and uploaded using the Wallaroo convert_model(path, source_type, conversion_arguments) method. For more information, see the tutorial series ONNX Conversion Tutorials.

Wallaroo can directly import Open Neural Network Exchange models into the Wallaroo engine. Other ML Models can be imported with the Auto-Convert Models methods.

The following models are supported natively by Wallaroo:

Wallaroo Version ONNX Version ONNX IR Version ONNX OPset Version ONNX ML Opset Version Tensorflow Version
2022.4 (December 2022) 1.12.1 8 17 3 2.9.1
After April 2022 until release 2022.4 (December 2022) 1.10.* 7 15 2 2.4
Before April 2022 1.6.* 7 13 2 2.4

For the most recent release of Wallaroo September 2022, the following native runtimes are supported:

Using different versions or settings outside of these specifications may result in inference issues and other unexpected behavior.

The following example shows how to upload two models to the imdb-workspace workspace:

wl.get_current_workspace()

{'name': 'imdb-workspace', 'id': 8, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T21:13:21.87287+00:00', 'models': [], 'pipelines': []}

embedder = wl.upload_model('embedder-o', './embedder.onnx').configure()
smodel = wl.upload_model('smodel-o', './sentiment_model.onnx').configure()

{'name': 'imdb-workspace', 'id': 9, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T21:14:37.733171+00:00', 'models': [{'name': 'embedder-o', 'version': '28ecb706-473e-4f24-9eae-bfa71b897108', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 21, 14, 37, 815243, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '5d2782e1-fb88-430f-b6eb-c0a0eb46beb9', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 21, 14, 38, 77973, tzinfo=tzutc())}], 'pipelines': []}

Auto-Convert Models

Machine Learning (ML) models can be converted and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

The three input parameters are:

  • path (String): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. Wallaroo auto-conversion currently supports the following source types and their associated ModelConversionSource values:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertKerasArguments: Used for converting keras type models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • input_type: A TensorFlow dtype called in the format ModelConversionInputType.{type}. See ModelConversionInputTypes for more details.
      • dimensions: Corresponds to the keras xtrain in the format [{Number of Rows/None}, {Number of Columns 1}, {Number of Columns 2}...]. For a standard 1-dimensional array with 100 columns, this would typically be [None, 100].
    • wallaroo.ModelConversion.ConvertSKLearnArguments: Used for sklearn models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A TensorFlow dtype called in the format ModelConversionInputType.{type}. See ModelConversionInputTypes for more details.
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A TensorFlow dtype called in the format ModelConversionInputType.{type}. See ModelConversionInputTypes for more details.
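The keras dimensions value can be derived from the training array's shape rather than written out by hand. This is a minimal sketch, assuming x_train is a NumPy-style array whose first axis is the row count; keras_dimensions is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def keras_dimensions(x_train_shape: Sequence[int]) -> Tuple[Optional[int], ...]:
    """Turn a training-array shape into a `dimensions` value.

    The row count is replaced with None so any number of rows is accepted;
    the remaining (column) dimensions are kept unchanged.
    """
    if len(x_train_shape) < 2:
        raise ValueError("expected a shape of at least (rows, columns)")
    return (None, *x_train_shape[1:])
```

For a training array with 100 columns, keras_dimensions(x_train.shape) gives (None, 100), the same value used in the keras example in this section.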

Once uploaded, converted models are displayed in the Wallaroo Models Dashboard as {unique-file-id}-converted.onnx.

ModelConversionInputTypes

The following data types are supported with the ModelConversionInputType parameter:

Parameter Data Type
Float16 float16
Float32 float32
Float64 float64
Int16 int16
Int32 int32
Int64 int64
UInt8 uint8
UInt16 uint16
UInt32 uint32
UInt64 uint64
Boolean bool
Double double

sk-learn Example

The following example converts and uploads a Linear Regression sklearn model lm.pickle and stores it in the variable converted_model:

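import wallaroo
from wallaroo.ModelConversion import ConvertSKLearnArguments, ModelConversionSource, ModelConversionInputType

wl = wallaroo.Client()

workspace_name = "testconversion"
# get_or_create_workspace is a notebook helper (not shown) that returns the named workspace, creating it if needed
_ = wl.set_current_workspace(get_or_create_workspace(workspace_name))

model_conversion_args = ConvertSKLearnArguments(
    name="lm-test",
    comment="test linear regression",
    number_of_columns=NF,  # NF: the number of columns the model was trained on
    input_type=ModelConversionInputType.Double
)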

model_conversion_type = ModelConversionSource.SKLEARN

# convert the model and store it in the variable `converted_model`:

converted_model = wl.convert_model('lm.pickle', model_conversion_type, model_conversion_args)

keras Example

The following example shows converting a keras model with 100 columns and uploading it to a Wallaroo instance:

model_name = "simple-sentiment-model"
model_columns = 100

model_conversion_args = ConvertKerasArguments(
    name=model_name,
    comment="simple keras model",
    input_type=ModelConversionInputType.Float32,
    dimensions=(None, model_columns)
)
model_conversion_type = ModelConversionSource.KERAS

model_wl = wl.convert_model('simple_sentiment_model.zip', model_conversion_type, model_conversion_args)
model_wl
{'name': 'simple-sentiment-model', 'version': 'c76870f8-e16b-4534-bb17-e18a3e3806d5', 'file_name': '14d9ab8d-47f4-4557-82a7-6b26cb67ab05-converted.onnx', 'last_update_time': datetime.datetime(2022, 7, 7, 16, 41, 22, 528430, tzinfo=tzutc())}

2.2.4 - Wallaroo SDK Essentials Guide: User Management

How to create and manage Wallaroo Users through the Wallaroo SDK

Managing Workspace Users

Users are managed via their email address, and can be assigned to a workspace as either the owner or a user.

List All Users

list_users() returns an array of all users registered in the connected Wallaroo platform with the following values:

Parameter Type Description
id String The unique identifier for the user.
email String The unique email identifier for the user.
username String The unique username of the user.

For example, listing all users in the Wallaroo connection returns the following:

wl.list_users()
[User({"id": "7b8b4f7d-de27-420f-9cd0-892546cb0f82", "email": "test@test.com", "username": "admin"),
 User({"id": "45e6b641-fe57-4fb2-83d2-2c2bd201efe8", "email": "steve@ex.co", "username": "steve")]

Get User by Email

The get_user_by_email({email}) command finds the user whose email address matches the submitted {email} field. If no email address matches, the return value is null (None in Python).

For example, the user steve with the email address steve@ex.co returns the following:

wl.get_user_by_email("steve@ex.co")

User({"id": "45e6b641-fe57-4fb2-83d2-2c2bd201efe8", "email": "steve@ex.co", "username": "steve")

Activate and Deactivate Users

To remove a user’s access to the Wallaroo instance, use the Wallaroo Client deactivate_user("{User Email Address}") method, replacing {User Email Address} with the email address of the user to deactivate.

To activate a user, use the Wallaroo Client activate_user("{User Email Address}") method, replacing {User Email Address} with the email address of the user to activate.

This feature affects Wallaroo Community’s license count. Wallaroo Community allows a total of 5 users per Wallaroo Community instance. Deactivated users do not count toward this total. This lets organizations add users and then activate or deactivate them as needed to stay within the licensed user count.

Wallaroo Enterprise has no limits on the number of users who can be added or active in a Wallaroo instance.
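The license arithmetic described above can be sketched in plain Python. Only active users count toward the Community limit of 5. UserRecord and can_activate are hypothetical names and are not part of the Wallaroo SDK. The real limit is enforced by the Wallaroo instance itself.

```python
from dataclasses import dataclass
from typing import List

COMMUNITY_USER_LIMIT = 5  # Wallaroo Community: at most 5 active users


@dataclass
class UserRecord:
    email: str
    active: bool


def can_activate(users: List[UserRecord], email: str,
                 limit: int = COMMUNITY_USER_LIMIT) -> bool:
    """True if activating `email` keeps the number of active users within `limit`.

    Deactivated users are not counted; the user being activated is counted once.
    """
    active_others = sum(1 for u in users if u.active and u.email != email)
    return active_others + 1 <= limit


team = [UserRecord(f"user{i}@ex.co", active=True) for i in range(4)]
team.append(UserRecord("testuser@wallaroo.ai", active=False))
print(can_activate(team, "testuser@wallaroo.ai"))  # True: 4 active + 1 = 5

team[-1].active = True
team.append(UserRecord("new@ex.co", active=False))
print(can_activate(team, "new@ex.co"))  # False: 5 active + 1 = 6
```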

In this example, the user testuser@wallaroo.ai will be deactivated then reactivated.

wl.list_users()

[User({"id": "0528f34c-2725-489f-b97b-da0cde02cbd9", "email": "testuser@wallaroo.ai", "username": "testuser@wallaroo.ai"),
 User({"id": "3927b9d3-c279-442c-a3ac-78ba1d2b14d8", "email": "john.hummel+signuptest@wallaroo.ai", "username": "john.hummel+signuptest@wallaroo.ai")]

wl.deactivate_user("testuser@wallaroo.ai")

wl.activate_user("testuser@wallaroo.ai")

2.2.5 - Wallaroo SDK Essentials Guide: Pipeline Management

How to create and manage Wallaroo Pipelines through the Wallaroo SDK

Pipeline Management

Pipelines are the method of submitting data and processing that data through models. Each pipeline can have one or more steps, and each step passes its output to the next step as input. Data can be submitted to a pipeline as a file or through the pipeline’s URL.
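The step-to-step data flow can be pictured with a toy stand-in. ToyPipeline is hypothetical and is not the wallaroo Pipeline class. It only models one behavior: each step's output becomes the next step's input, in the order the steps were added.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]


class ToyPipeline:
    """Illustrative stand-in for a pipeline: an ordered list of steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Step] = []

    def add_model_step(self, step: Step) -> "ToyPipeline":
        # Append to the end, so steps run in the order they were added.
        self.steps.append(step)
        return self  # returning self allows chained calls

    def infer(self, data: Any) -> Any:
        # Each step receives the previous step's output as its input.
        for step in self.steps:
            data = step(data)
        return data


# Hypothetical "models": double every value, then sum the results.
pipeline = (ToyPipeline("toy-pipeline")
            .add_model_step(lambda xs: [2 * x for x in xs])
            .add_model_step(sum))
print(pipeline.infer([1, 2, 3]))  # 12
```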

A pipeline’s metrics can be viewed through the Wallaroo Dashboard Pipeline Details and Metrics page.

Create a Pipeline

New pipelines are created in the current workspace.

To create a new pipeline, use the Wallaroo Client build_pipeline("{Pipeline Name}") command.

The following example creates a new pipeline imdb-pipeline through a Wallaroo Client connection wl:

imdb_pipeline = wl.build_pipeline("imdb-pipeline")

imdb_pipeline.status()
{'status': 'Pipeline imdb-pipeline is not deployed'}

List All Pipelines

The Wallaroo Client method list_pipelines() lists all pipelines in a Wallaroo Instance.

The following example lists all pipelines in the wl Wallaroo Client connection:

wl.list_pipelines()

[{'name': 'ccfraud-pipeline', 'create_time': datetime.datetime(2022, 4, 12, 17, 55, 41, 944976, tzinfo=tzutc()), 'definition': '[]'}]

Select an Existing Pipeline

Rather than creating a new pipeline each time, an existing pipeline can be selected by using the list_pipelines() command and assigning one of the array members to a variable.

The following example sets the pipeline ccfraud-pipeline to the variable current_pipeline:

wl.list_pipelines()

[{'name': 'ccfraud-pipeline', 'create_time': datetime.datetime(2022, 4, 12, 17, 55, 41, 944976, tzinfo=tzutc()), 'definition': '[]'}]

current_pipeline = wl.list_pipelines()[0]

current_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.244.5.4',
   'name': 'engine-7fcc7df596-hvlxb',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'ccfraud-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraud-model',
      'version': '4624e8a8-1414-4408-8b40-e03da4b5cb68',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.1.24',
   'name': 'engine-lb-85846c64f8-mtq9p',
   'status': 'Running',
   'reason': None}]}

Add a Step to a Pipeline

Pipeline steps can be added after a pipeline has been created or while it is being created. A pipeline step specifies the model that performs inferences on the data submitted to it. Each added step is appended to the pipeline’s models array.

A pipeline step is added through the pipeline add_model_step({Model}) command.

In the following example, two models uploaded to the workspace are added as pipeline steps:

imdb_pipeline.add_model_step(embedder)
imdb_pipeline.add_model_step(smodel)

imdb_pipeline.status()

{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}

Replace a Pipeline Step

The model specified in a pipeline step can be replaced with the pipeline method replace_with_model_step(index, model).

The following parameters are used for replacing a pipeline step:

Parameter Default Value Purpose
index null The pipeline step to be replaced. Pipeline steps follow array numbering, where the first step is 0, etc.
model null The new model to be used in the pipeline step.
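Step indexes are zero-based, as with a Python list. The sketch below treats a pipeline's steps as a list of model names. with_step_replaced and with_step_removed are hypothetical helpers, not SDK methods. Rejecting out-of-range and negative indexes is an assumption of this sketch, because the SDK's exact behavior for those cases is not described here.

```python
from typing import List


def with_step_replaced(steps: List[str], index: int, model: str) -> List[str]:
    """Copy of `steps` with the step at zero-based `index` replaced by `model`."""
    if not 0 <= index < len(steps):
        raise IndexError(f"no pipeline step at index {index}")
    updated = list(steps)
    updated[index] = model
    return updated


def with_step_removed(steps: List[str], index: int) -> List[str]:
    """Copy of `steps` without the step at zero-based `index`."""
    if not 0 <= index < len(steps):
        raise IndexError(f"no pipeline step at index {index}")
    return steps[:index] + steps[index + 1:]


print(with_step_replaced(["ccfraudoriginal"], 0, "ccfraudreplacement"))
# ['ccfraudreplacement']
print(with_step_removed(["embedder-o", "smodel-o"], 1))
# ['embedder-o']
```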

In the following example, a deployed pipeline will have the initial model step replaced with a new one. A status of the pipeline will be displayed after deployment and after the pipeline swap to show the model has been replaced from ccfraudoriginal to ccfraudreplacement, each with their own versions.

pipeline.deploy()

pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.2.145',
   'name': 'engine-75bfd7dc9d-7p9qk',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'hotswappipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraudoriginal',
      'version': '3a03dc94-716e-46bb-84c8-91bc99ceb2c3',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.2.144',
   'name': 'engine-lb-55dcdff64c-vf74s',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

pipeline.replace_with_model_step(0, replacement_model).deploy()

pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.2.153',
   'name': 'engine-96486c95d-zfchr',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'hotswappipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraudreplacement',
      'version': '714efd19-5c83-42a8-aece-24b4ba530925',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.2.154',
   'name': 'engine-lb-55dcdff64c-9np9k',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

Pre and Post Processing Steps

Pipeline steps are not limited to models. They can also be preprocessing and postprocessing steps. For example, the Demand Curve Tutorial adds both a preprocessing step and a postprocessing step to its pipeline. The preprocessing step uses the following code:

import numpy
import pandas

import json

# add interaction terms for the model
def actual_preprocess(pdata):
    pd = pdata.copy()
    # convert boolean cust_known to 0/1
    pd.cust_known = numpy.where(pd.cust_known, 1, 0)
    # interact UnitPrice and cust_known
    pd['UnitPriceXcust_known'] = pd.UnitPrice * pd.cust_known
    return pd.loc[:, ['UnitPrice', 'cust_known', 'UnitPriceXcust_known']]

# If the data is a JSON string, call this wrapper instead.
# Expected input: a JSON object with the fields 'colnames' and 'query'
def wallaroo_json(data):
    obj = json.loads(data)
    pdata = pandas.DataFrame(obj['query'],
                             columns=obj['colnames'])
    pprocessed = actual_preprocess(pdata)

    # return a dictionary with the fields the model expects
    return {
        'tensor_fields': ['model_input'],
        'model_input': pprocessed.to_numpy().tolist()
    }

It is added as a Python module by uploading it as a model:

# load the preprocess module
module_pre = wl.upload_model("preprocess", "./preprocess.py").configure('python')

And then added to the pipeline as a step:

# now make a pipeline
demandcurve_pipeline = (wl.build_pipeline("demand-curve-pipeline")
                        .add_model_step(module_pre)
                        .add_model_step(demand_curve_model)
                        .add_model_step(module_post))
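To check by hand which tensor the preprocessing step passes to the demand curve model, the same transformation can be written without pandas or numpy. This is an illustrative, dependency-free sketch and not the tutorial's code. preprocess_rows is a hypothetical name. It assumes the input JSON has the 'colnames' and 'query' keys that wallaroo_json reads.

```python
import json
from typing import Dict, List


def preprocess_rows(payload: str) -> Dict[str, object]:
    """Dependency-free version of the preprocessing step, for hand checks."""
    obj = json.loads(payload)
    cols = obj["colnames"]
    tensor: List[List[float]] = []
    for row in obj["query"]:
        record = dict(zip(cols, row))
        known = 1 if record["cust_known"] else 0  # boolean cust_known -> 0/1
        price = record["UnitPrice"]
        # UnitPrice, cust_known, and their interaction term, as floats
        tensor.append([float(price), float(known), float(price * known)])
    return {"tensor_fields": ["model_input"], "model_input": tensor}


payload = json.dumps({"colnames": ["UnitPrice", "cust_known"],
                      "query": [[2.5, True], [4.0, False]]})
print(preprocess_rows(payload)["model_input"])
# [[2.5, 1.0, 2.5], [4.0, 0.0, 0.0]]
```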

Remove a Pipeline Step

To remove a step from the pipeline, use the Pipeline remove_step(index) command, where the index is the array index for the pipeline’s steps.

In the following example the pipeline imdb_pipeline will have the step with the model smodel-o removed.

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.remove_step(1)
{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}]"}

Manage Pipeline Deployment Configuration

Pipelines can be deployed to allocate more or fewer resources through the DeploymentConfig object that sets the pipeline’s autoscaling parameters.

Autoscaling allows the user to define how many engines a pipeline starts with, the minimum number of engines a pipeline can use, and the maximum number of engines a pipeline can scale up to. As the user’s workload increases and decreases, the pipeline scales up and down based on the average CPU utilization across its engines.

This is performed through the wallaroo.DeploymentConfigBuilder method that returns a wallaroo.deployment_config.DeploymentConfig object. The DeploymentConfig is then applied to a Wallaroo pipeline when it is deployed.

The following parameters are used for auto-scaling:

Parameter Default Value Purpose
replica_count 1 Sets the initial amount of engines for the pipeline.
replica_autoscale_min_max 1 Sets the minimum number of engines and the maximum amount of engines. The maximum parameter must be set by the user.
autoscale_cpu_utilization 50 An integer representing the average CPU utilization. The default value is 50, which represents an average of 50% CPU utilization for the engines in a pipeline.

The DeploymentConfig is then created with the builder’s build() method.

In the following example, a DeploymentConfig is created and saved to the variable ccfraudDeployConfig. It sets the minimum number of engines to 2, the maximum to 5, and a target average CPU utilization of 60%. The configuration is then applied to the deployment of the pipeline ccfraudPipeline by passing it as the deployment_config parameter.

ccfraudDeployConfig = (wallaroo.DeploymentConfigBuilder()
    .replica_count(1)
    .replica_autoscale_min_max(minimum=2, maximum=5)
    .autoscale_cpu_utilization(60)
    .build())

ccfraudPipeline.deploy(deployment_config=ccfraudDeployConfig)
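To get a feel for how the autoscaling parameters interact, the sketch below assumes that CPU-based scaling behaves like the standard Kubernetes Horizontal Pod Autoscaler. This guide only states that scaling follows average CPU utilization, so treat that as an assumption. The rule is desired = ceil(current × average CPU / target CPU), clamped to the minimum and maximum. The real autoscaler also ignores small deviations within a tolerance band, which this sketch omits. desired_engines is a hypothetical name.

```python
import math


def desired_engines(current: int, avg_cpu: float, target_cpu: float,
                    minimum: int, maximum: int) -> int:
    """HPA-style engine count: ceil(current * avg_cpu / target_cpu), clamped.

    Assumes Kubernetes HPA semantics; the tolerance band is omitted.
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("need at least one engine and a positive CPU target")
    desired = math.ceil(current * avg_cpu / target_cpu)
    return max(minimum, min(maximum, desired))


# Using the example settings: minimum 2, maximum 5, target 60% CPU
print(desired_engines(2, 90, 60, 2, 5))   # 3: busy engines trigger a scale-up
print(desired_engines(4, 20, 60, 2, 5))   # 2: idle engines scale down to the minimum
print(desired_engines(5, 100, 60, 2, 5))  # 5: capped at the maximum
```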

Deploy a Pipeline

When a pipeline step is added or removed, the pipeline must be deployed with the pipeline deploy() method. This allocates resources to the pipeline from the Kubernetes environment and makes the pipeline available for inference requests. This process typically takes 45 seconds.

Once complete, the pipeline status() command will show 'status':'Running'.

Pipeline deployments can be modified to enable autoscaling, which lets pipelines allocate more or fewer resources based on need. To do so, pass a DeploymentConfig through the pipeline’s optional deployment_config parameter. If this optional parameter is not passed, the deployment uses default values. For more information, see Manage Pipeline Deployment Configuration.

In the following example, the pipeline imdb-pipeline that contains two steps will be deployed with default deployment configuration:

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.deploy()
Waiting for deployment - this will take up to 45s ...... ok

imdb_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.65',
   'name': 'engine-778b65459-f9mt5',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'imdb-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'embedder-o',
      'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d',
      'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4',
      'status': 'Running'},
     {'name': 'smodel-o',
      'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19',
      'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.66',
   'name': 'engine-lb-85846c64f8-ggg2t',
   'status': 'Running',
   'reason': None}]}

Troubleshooting Pipeline Deployment

If you deploy more pipelines than your environment can handle, or if you deploy more pipelines than your license allows, you may see an error like the following:


LimitError: You have reached a license limit in your Wallaroo instance. In order to add additional resources, you can remove some of your existing resources. If you have any questions contact us at community@wallaroo.ai: MAX_PIPELINES_LIMIT_EXCEEDED

Undeploy any unnecessary pipelines, either through the SDK or through the Wallaroo Pipeline Dashboard, then try deploying the pipeline again.

Undeploy a Pipeline

When a pipeline is not currently needed, it can be undeployed and its resources returned to the Kubernetes environment. To undeploy a pipeline, use the pipeline undeploy() command.

In this example, the aloha_pipeline will be undeployed:

aloha_pipeline.undeploy()

{'name': 'aloha-test-demo', 'create_time': datetime.datetime(2022, 3, 29, 20, 34, 3, 960957, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'aloha-2', 'version': 'a8e8abdc-c22f-416c-a13c-5fe162357430', 'sha': 'fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520'}]}}]"}

Get Pipeline Status

The pipeline status() command shows the current status, models, and other information on a pipeline.

The following example shows the pipeline imdb_pipeline status before and after it is deployed:

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.deploy()
Waiting for deployment - this will take up to 45s ...... ok

imdb_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.65',
   'name': 'engine-778b65459-f9mt5',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'imdb-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'embedder-o',
      'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d',
      'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4',
      'status': 'Running'},
     {'name': 'smodel-o',
      'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19',
      'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.66',
   'name': 'engine-lb-85846c64f8-ggg2t',
   'status': 'Running',
   'reason': None}]}

Get Pipeline Logs

Pipelines have their own set of logs that can be retrieved and analyzed as needed with the pipeline.logs(limit=100) command. This command takes the following parameters:

Parameter Type Description
limit Int Limits how many log entries to display. Defaults to 100.

Typically this only shows the last 3 commands in a Python notebook for spacing purposes.

In this example, the last 50 logs for the pipeline ccfraud_pipeline are requested. Only one is shown for brevity.

ccfraud_pipeline.logs(limit=50)
       
Timestamp Output Input Anomalies
2022-23-Aug 16:44:56 [array([[0.00149742]])] [[1.0678324729342086, 0.21778102664937624, -1.7115145261843976, 0.6822857209662413, 1.0138553066742804, -0.43350000129006655, 0.7395859436561657, -0.28828395953577357, -0.44726268795990787, 0.5146124987725894, 0.3791316964287545, 0.5190619748123175, -0.4904593221655364, 1.1656456468728569, -0.9776307444180006, -0.6322198962519854, -0.6891477694494687, 0.17833178574255615, 0.1397992467197424, -0.35542206494183326, 0.4394217876939808, 1.4588397511627804, -0.3886829614721505, 0.4353492889350186, 1.7420053483337177, -0.4434654615252943, -0.15157478906219238, -0.26684517248765616, -1.454961775612449]] 0

Anomaly Testing

Anomaly detection allows organizations to set validation parameters. A validation is added to a pipeline to test data based on a specific expression. If the expression is returned as False, this is detected as an anomaly and added to the InferenceResult object’s check_failures array and the pipeline logs.

Anomaly detection consists of the following steps:

  • Set a validation: Add a validation to a pipeline that, when returned False, adds an entry to the InferenceResult object’s check_failures attribute with the expression that caused the failure.
  • Display anomalies: Anomalies detected through a Pipeline’s validation attribute are displayed either through the InferenceResult object’s check_failures attribute, or through the pipeline’s logs.

Set A Validation

Validations are added to a pipeline through the wallaroo.pipeline add_validation method. The following parameters are required:

Parameter Type Description
name String (Required) The name of the validation
validation wallaroo.checks.Expression (Required) The validation expression. When the expression evaluates to False, the result is added to the InferenceResult object’s check_failures attribute.

Validation expressions are wallaroo.checks.Expression objects, typically written as a comparison against a model’s outputs. For example, if the model housing_model is part of the pipeline steps, a validation expression may be housing_model.outputs[0][0] < 100.0. If the output of the housing_model inference is less than 100, the validation is True and no action is taken. For values of 100 or more, the validation is False, and the anomaly is added to the InferenceResult object’s check_failures attribute.
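The True/False rule can be mirrored in plain Python. In this sketch each validation is a (name, predicate) pair, and every predicate that returns False produces a failure entry. check_failures here is a hypothetical function, not the SDK's InferenceResult attribute. The real SDK builds wallaroo.checks.Expression objects and records the expression text rather than calling a Python lambda.

```python
from typing import Callable, Dict, List, Tuple

Outputs = List[List[float]]
Check = Tuple[str, Callable[[Outputs], bool]]


def check_failures(outputs: Outputs, checks: List[Check]) -> List[Dict[str, str]]:
    """Return one entry per validation whose expression evaluated to False."""
    failures = []
    for name, expression in checks:
        if not expression(outputs):
            failures.append({"name": name})
    return failures


# Mirrors housing_model.outputs[0][0] < 100.0
checks = [("price too high", lambda out: out[0][0] < 100.0)]
print(check_failures([[350.46990966796875]], checks))  # [{'name': 'price too high'}]
print(check_failures([[42.0]], checks))                # []
```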

Multiple validations can be added to a pipeline to detect different kinds of anomalies.

In the following example, a validation is added to the pipeline that expects housing prices below 100 (representing $100 million) and triggers an anomaly for values at or above that level. When an inference triggers a validation failure, the results are displayed in the InferenceResult object’s check_failures attribute.

p = wl.build_pipeline('anomaly-housing-pipeline')
p = p.add_model_step(housing_model)
p = p.add_validation('price too high', housing_model.outputs[0][0] < 100.0)
pipeline = p.deploy()

test_input = {"dense_16_input":[[0.02675675, 0.0, 0.02677953, 0.0, 0.0010046, 0.00951931, 0.14795322, 0.0027145,  2, 0.98536841, 0.02988655, 0.04031725, 0.04298041]]}
response_trigger = pipeline.infer(test_input)
print("\n")
print(response_trigger)

[InferenceResult({'check_failures': [{'False': {'expr': 'anomaly-housing.outputs[0][0] < 100'}}],
 'elapsed': 15110549,
 'model_name': 'anomaly-housing',
 'model_version': 'c3cf1577-6666-48d3-b85c-5d4a6e6567ea',
 'original_data': {'dense_16_input': [[0.02675675,
                                       0.0,
                                       0.02677953,
                                       0.0,
                                       0.0010046,
                                       0.00951931,
                                       0.14795322,
                                       0.0027145,
                                       2,
                                       0.98536841,
                                       0.02988655,
                                       0.04031725,
                                       0.04298041]]},
 'outputs': [{'Float': {'data': [350.46990966796875], 'dim': [1, 1], 'v': 1}}],
 'pipeline_name': 'anomaly-housing-model',
 'time': 1651257043312})]

Display Anomalies

Anomalies detected through a Pipeline’s validation attribute are displayed either through the InferenceResult object’s check_failures attribute, or through the pipeline’s logs.

To display an anomaly through the InferenceResult object, display the check_failures attribute.

In the following example, an InferenceResult where the validation failed displays the failure in the check_failures attribute:

test_input = {"dense_16_input":[[0.02675675, 0.0, 0.02677953, 0.0, 0.0010046, 0.00951931, 0.14795322, 0.0027145,  2, 0.98536841, 0.02988655, 0.04031725, 0.04298041]]}
response_trigger = pipeline.infer(test_input)
print("\n")
print(response_trigger)

[InferenceResult({'check_failures': [{'False': {'expr': 'anomaly-housing-model.outputs[0][0] < '
                                       '100'}}],
 'elapsed': 12196540,
 'model_name': 'anomaly-housing-model',
 'model_version': 'a3b1c29f-c827-4aad-817d-485de464d59b',
 'original_data': {'dense_16_input': [[0.02675675,
                                       0.0,
                                       0.02677953,
                                       0.0,
                                       0.0010046,
                                       0.00951931,
                                       0.14795322,
                                       0.0027145,
                                       2,
                                       0.98536841,
                                       0.02988655,
                                       0.04031725,
                                       0.04298041]]},
 'outputs': [{'Float': {'data': [350.46990966796875], 'dim': [1, 1], 'v': 1}}],
 'pipeline_name': 'anomaly-housing-pipeline',
 'shadow_data': {},
 'time': 1667416852255})]

The other method is to call pipeline.logs() with the parameter valid=False, which isolates the logs where the validation returned False.

In this example, a set of logs where the validation returned as False will be displayed:

pipeline.logs(valid=False)
Timestamp Output Input Anomalies
2022-02-Nov 19:20:52 [array([[350.46990967]])] [[0.02675675, 0.0, 0.02677953, 0.0, 0.0010046, 0.00951931, 0.14795322, 0.0027145, 2, 0.98536841, 0.02988655, 0.04031725, 0.04298041]] 1

A/B Testing

A/B testing is a method for comparing competing ML models on performance, accuracy, or other useful benchmarks. The competing models are added to the same pipeline step in the following roles:

  • Control or Champion model: The model currently used for inferences.
  • Challenger model(s): The model or set of models compared to the champion model.

A/B testing splits a portion of the inference requests between the champion model and the one or more challengers through the add_random_split method. This method splits the inferences submitted to the model through a randomly weighted step.

Each model receives inputs that are approximately proportional to the weight it is assigned. For example, with two models having weights 1 and 1, each will receive roughly equal amounts of inference inputs. If the weights were changed to 1 and 2, the models would receive roughly 33% and 66% respectively instead.

When choosing the model to use, a random number between 0.0 and 1.0 is generated. The weighted inputs are mapped to that range, and the random input is then used to select the model to use. For example, for the two-models equal-weight case, a random key of 0.4 would route to the first model, 0.6 would route to the second.
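The selection rule described above can be sketched as follows. The weights are normalized onto the range 0.0 to 1.0, each model owns a slice proportional to its weight, and the random key picks the slice it falls in. pick_model is a hypothetical function that illustrates the rule. It is not Wallaroo's implementation.

```python
import random
from typing import Optional, Sequence, Tuple


def pick_model(weighted: Sequence[Tuple[float, str]],
               key: Optional[float] = None) -> str:
    """Select a model from (weight, model) pairs using a key in [0.0, 1.0).

    Each model owns a slice of [0.0, 1.0) proportional to its weight.
    When no key is given, a random number is drawn.
    """
    total = sum(w for w, _ in weighted)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if key is None:
        key = random.random()
    upper = 0.0
    for weight, model in weighted:
        upper += weight / total
        if key < upper:
            return model
    return weighted[-1][1]  # guards against floating-point rounding near 1.0


print(pick_model([(1, "first"), (1, "second")], 0.4))  # first
print(pick_model([(1, "first"), (1, "second")], 0.6))  # second
```

With weights of 2 and 1, the champion owns the slice from 0.0 up to about 0.667, so it receives roughly two thirds of the inferences.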

Add Random Split

A random split step can be added to a pipeline through the add_random_split method.

The following parameters are used when adding a random split step to a pipeline:

Parameter Type Description
champion_weight Float (Required) The weight for the champion model.
champion_model Wallaroo.Model (Required) The uploaded champion model.
challenger_weight Float (Required) The weight of the challenger model.
challenger_model Wallaroo.Model (Required) The uploaded challenger model.
hash_key String (Optional) A key used instead of a random number for model selection. This must be between 0.0 and 1.0.

Note that multiple challenger models with different weights can be added as the random split step.

add_random_split([(champion_weight, champion_model), (challenger_weight, challenger_model),  (challenger_weight2, challenger_model2),...], hash_key)

In this example, a pipeline is built with a 2:1 weighted ratio between the champion and a single challenger model.

experiment_pipeline = (wl.build_pipeline("randomsplitpipeline-demo")
                       .add_random_split([(2, control), (1, challenger)]))
experiment_pipeline.deploy()

The results of a series of single inferences show the random weighted split between the two models in action:

# run five single inferences through the random split step
results = [experiment_pipeline.infer_from_file("data/data-1.json") for _ in range(5)]

for result in results:
    print(result[0].model())
    print(result[0].data())

('aloha-control', 'ff81f634-8fb4-4a62-b873-93b02eb86ab4')
[array([[0.00151959]]), array([[0.98291481]]), array([[0.01209957]]), array([[4.75912966e-05]]), array([[2.02893716e-05]]), array([[0.00031977]]), array([[0.01102928]]), array([[0.99756402]]), array([[0.01034162]]), array([[0.00803896]]), array([[0.01615506]]), array([[0.00623623]]), array([[0.00099858]]), array([[1.79337805e-26]]), array([[1.38899512e-27]])]

('aloha-control', 'ff81f634-8fb4-4a62-b873-93b02eb86ab4')
[array([[0.00151959]]), array([[0.98291481]]), array([[0.01209957]]), array([[4.75912966e-05]]), array([[2.02893716e-05]]), array([[0.00031977]]), array([[0.01102928]]), array([[0.99756402]]), array([[0.01034162]]), array([[0.00803896]]), array([[0.01615506]]), array([[0.00623623]]), array([[0.00099858]]), array([[1.79337805e-26]]), array([[1.38899512e-27]])]

('aloha-challenger', '87fdfe08-170e-4231-a0b9-543728d6fc57')
[array([[0.00151959]]), array([[0.98291481]]), array([[0.01209957]]), array([[4.75912966e-05]]), array([[2.02893716e-05]]), array([[0.00031977]]), array([[0.01102928]]), array([[0.99756402]]), array([[0.01034162]]), array([[0.00803896]]), array([[0.01615506]]), array([[0.00623623]]), array([[0.00099858]]), array([[1.79337805e-26]]), array([[1.38899512e-27]])]

('aloha-challenger', '87fdfe08-170e-4231-a0b9-543728d6fc57')
[array([[0.00151959]]), array([[0.98291481]]), array([[0.01209957]]), array([[4.75912966e-05]]), array([[2.02893716e-05]]), array([[0.00031977]]), array([[0.01102928]]), array([[0.99756402]]), array([[0.01034162]]), array([[0.00803896]]), array([[0.01615506]]), array([[0.00623623]]), array([[0.00099858]]), array([[1.79337805e-26]]), array([[1.38899512e-27]])]

('aloha-challenger', '87fdfe08-170e-4231-a0b9-543728d6fc57')
[array([[0.00151959]]), array([[0.98291481]]), array([[0.01209957]]), array([[4.75912966e-05]]), array([[2.02893716e-05]]), array([[0.00031977]]), array([[0.01102928]]), array([[0.99756402]]), array([[0.01034162]]), array([[0.00803896]]), array([[0.01615506]]), array([[0.00623623]]), array([[0.00099858]]), array([[1.79337805e-26]]), array([[1.38899512e-27]])]

Replace With Random Split

If a pipeline already has steps, as detailed in Add a Step to a Pipeline, a step can be replaced with a random split through the replace_with_random_split method.

The following parameters are used when adding a random split step to a pipeline:

Parameter Type Description
index Integer (Required) The index of the pipeline step being replaced.
champion_weight Float (Required) The weight for the champion model.
champion_model Wallaroo.Model (Required) The uploaded champion model.
challenger_weight Float (Required) The weight for the challenger model.
challenger_model Wallaroo.Model (Required) The uploaded challenger model.
hash_key String (Optional) A key used instead of a random number for model selection. This must be between 0.0 and 1.0.

Note that one or more challenger models can be added for the random split step:

replace_with_random_split(index, [(champion_weight, champion_model), (challenger_weight, challenger_model), (challenger_weight2, challenger_model2), ...], hash_key)
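
The role of the weights and the optional hash_key can be pictured with a generic weighted-selection sketch. This is not Wallaroo's implementation: it assumes the weights are relative (a champion weight of 2 and a challenger weight of 1 send about two thirds of requests to the champion), and the hypothetical pick_model helper maps a string key to a number in [0, 1) with SHA-256, so the same key always selects the same model.

```python
import hashlib
import random

def pick_model(weighted_models, key=None):
    """Choose a model name from a list of (weight, name) pairs.

    Hypothetical helper for illustration; weights are treated as relative.
    Without a key, a random number decides; with a key, a hash of the key
    decides, so repeated requests with the same key pick the same model.
    """
    total = sum(weight for weight, _ in weighted_models)
    if key is None:
        fraction = random.random()
    else:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        fraction = int.from_bytes(digest[:8], "big") / 2**64
    point = fraction * total
    cumulative = 0.0
    for weight, name in weighted_models:
        cumulative += weight
        if point < cumulative:
            return name
    return weighted_models[-1][1]  # guard against floating-point round-off


split = [(2, "aloha-control"), (1, "aloha-challenger")]
print(pick_model(split))                 # random choice
print(pick_model(split, key="user-42"))  # same result every time for this key
```

A deterministic key is useful when the same caller should consistently see the same model during a test.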

A/B Testing Logs

Arrow Enabled A/B Testing Logs

In Arrow enabled environments, the Pipeline.logs method shows which model handled each inference in the out._model_split column.

logs = experiment_pipeline.logs(limit=5)
display(logs.loc[:,['time', 'out._model_split', 'out.main']])
time out._model_split out.main
0 2023-03-03 19:08:35.653 [{"name":"aloha-control","version":"89389786-0c17-4214-938c-aa22dd28359f","sha":"fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520"}] [0.9999754]
1 2023-03-03 19:08:35.702 [{"name":"aloha-challenger","version":"3acd3835-be72-42c4-bcae-84368f416998","sha":"223d26869d24976942f53ccb40b432e8b7c39f9ffcf1f719f3929d7595bceaf3"}] [0.9999727]
2 2023-03-03 19:08:35.753 [{"name":"aloha-challenger","version":"3acd3835-be72-42c4-bcae-84368f416998","sha":"223d26869d24976942f53ccb40b432e8b7c39f9ffcf1f719f3929d7595bceaf3"}] [0.6606688]
3 2023-03-03 19:08:35.799 [{"name":"aloha-control","version":"89389786-0c17-4214-938c-aa22dd28359f","sha":"fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520"}] [0.9998954]
4 2023-03-03 19:08:35.846 [{"name":"aloha-control","version":"89389786-0c17-4214-938c-aa22dd28359f","sha":"fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520"}] [0.99999803]
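
To see how traffic was actually split, the out._model_split values can be tallied per model. This sketch uses a hypothetical count_models helper and accepts each cell either as a JSON string or as an already-parsed list of dicts, since the exact cell type returned by Pipeline.logs is an assumption here; the sample rows are the five above, trimmed to the name field.

```python
import json
from collections import Counter

def count_models(split_values):
    """Count how many inferences each model served, from out._model_split values."""
    counts = Counter()
    for value in split_values:
        # Accept either a JSON string or an already-parsed list of dicts.
        entries = json.loads(value) if isinstance(value, str) else value
        for entry in entries:
            counts[entry["name"]] += 1
    return counts


# The five log rows above, trimmed to the "name" field.
rows = [
    '[{"name":"aloha-control"}]',
    '[{"name":"aloha-challenger"}]',
    '[{"name":"aloha-challenger"}]',
    '[{"name":"aloha-control"}]',
    '[{"name":"aloha-control"}]',
]
print(count_models(rows))
```

With the logs DataFrame, the same helper could be called as count_models(logs["out._model_split"]).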

Pipeline Shadow Deployments

Wallaroo provides a method of testing the same data against two different models or sets of models at the same time through shadow deployments, otherwise known as parallel deployments or A/B tests. This allows data to be submitted to a pipeline with inferences running on several different sets of models. Typically this compares a model that is known to provide accurate results, the champion, against a model or set of models being tested, the challenger(s), to see whether they provide more accurate or faster responses depending on the criteria. Multiple challengers can be tested against a single champion to determine which is “better” based on the organization’s criteria.

As described in the Wallaroo blog post The What, Why, and How of Model A/B Testing:

In data science, A/B tests can also be used to choose between two models in production, by measuring which model performs better in the real world. In this formulation, the control is often an existing model that is currently in production, sometimes called the champion. The treatment is a new model being considered to replace the old one. This new model is sometimes called the challenger…. Keep in mind that in machine learning, the terms experiments and trials also often refer to the process of finding a training configuration that works best for the problem at hand (this is sometimes called hyperparameter optimization).

When a shadow deployment is created, only the inference from the champion is returned in the InferenceResult object’s data element, while the results from the shadow deployed models are stored in the InferenceResult object’s shadow_data element.

Create Shadow Deployment

Create a parallel or shadow deployment for a pipeline with the pipeline.add_shadow_deploy(champion, challengers[]) method, where the champion is a Wallaroo Model object, and challengers[] is one or more Wallaroo Model objects.

Each inference request sent to the pipeline is sent to all of the models. The pipeline returns the champion’s prediction. The challengers’ predictions are not part of the standard output, but are stored in the shadow_data attribute and in the logs for later comparison.
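
The fan-out pattern can be pictured in plain Python. This sketch only illustrates the idea: the models are ordinary callables, and shadow_infer and shadow_log are hypothetical names, not part of the Wallaroo SDK.

```python
def shadow_infer(champion, challengers, data, shadow_log):
    """Run data through the champion and every challenger; return only the champion's result."""
    result = champion(data)
    # Challenger outputs are recorded for later comparison, not returned to the caller.
    shadow_log.append({name: model(data) for name, model in challengers.items()})
    return result


log = []
answer = shadow_infer(
    lambda xs: sum(xs),                          # stand-in champion
    {"double": lambda xs: 2 * sum(xs),           # stand-in challengers
     "negate": lambda xs: -sum(xs)},
    [1, 2],
    log,
)
print(answer, log)
```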

In this example, a shadow deployment is created with the champion versus two challenger models.

champion = wl.upload_model(champion_model_name, champion_model_file).configure()
model2 = wl.upload_model(shadow_model_01_name, shadow_model_01_file).configure()
model3 = wl.upload_model(shadow_model_02_name, shadow_model_02_file).configure()
   
pipeline.add_shadow_deploy(champion, [model2, model3])
pipeline.deploy()
   
name cc-shadow
created 2022-08-04 20:06:55.102203+00:00
last_updated 2022-08-04 20:37:28.785947+00:00
deployed True
tags
steps ccfraud-lstm

Arrow Enabled Shadow Deploy Outputs

For Arrow enabled Wallaroo instances, the model outputs are listed by column, and output columns begin with the term out. For the champion model, out is followed by the name of the model’s output, in this example out.dense_1, while the shadow deployed models use the format out_{model name}.variable, where {model name} is the name of the shadow deployed model.

sample_data_file = './smoke_test.df.json'
response = pipeline.infer_from_file(sample_data_file)
time in.tensor out.dense_1 check_failures out_ccfraudrf.variable out_ccfraudxgb.variable
0 2023-03-03 17:35:28.859 [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] [0.0014974177] 0 [1.0] [0.0005066991]
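
Because the champion and shadow outputs follow different naming patterns, they can be separated by column name. The hypothetical split_output_columns helper below is a sketch that assumes only the two patterns described above (out.{output} for the champion, out_{model name}.{output} for shadow models); with a pandas DataFrame such as response, the returned lists can be used directly, for example response[champion_cols].

```python
import re

# out_{model name}.{output}, e.g. out_ccfraudrf.variable
SHADOW_COLUMN = re.compile(r"^out_(?P<model>[^.]+)\.(?P<output>.+)$")

def split_output_columns(columns):
    """Return (champion output columns, {shadow model: its output columns})."""
    champion = [c for c in columns if c.startswith("out.")]
    shadows = {}
    for c in columns:
        match = SHADOW_COLUMN.match(c)
        if match:
            shadows.setdefault(match.group("model"), []).append(c)
    return champion, shadows


columns = ["time", "in.tensor", "out.dense_1", "check_failures",
           "out_ccfraudrf.variable", "out_ccfraudxgb.variable"]
champion_cols, shadow_cols = split_output_columns(columns)
print(champion_cols, shadow_cols)
```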

Arrow Disabled Shadow Deploy Outputs

For Arrow disabled environments, the output is from the Wallaroo InferenceResult object.

Running an inference in a pipeline that has shadow deployments enabled returns the champion model’s results in the InferenceResult object’s data element, while the challengers’ results are returned in the InferenceResult object’s shadow_data element:

pipeline.infer_from_file(sample_data_file)

[InferenceResult({'check_failures': [],
  'elapsed': 125102,
  'model_name': 'ccfraud-lstm',
  'model_version': '6b650c9c-e22f-4c50-97b2-7fce07f18607',
  'original_data': {'tensor': [[1.0678324729342086,
                                0.21778102664937624,
                                -1.7115145261843976,
                                0.6822857209662413,
                                1.0138553066742804,
                                -0.43350000129006655,
                                0.7395859436561657,
                                -0.28828395953577357,
                                -0.44726268795990787,
                                0.5146124987725894,
                                0.3791316964287545,
                                0.5190619748123175,
                                -0.4904593221655364,
                                1.1656456468728569,
                                -0.9776307444180006,
                                -0.6322198962519854,
                                -0.6891477694494687,
                                0.17833178574255615,
                                0.1397992467197424,
                                -0.35542206494183326,
                                0.4394217876939808,
                                1.4588397511627804,
                                -0.3886829614721505,
                                0.4353492889350186,
                                1.7420053483337177,
                                -0.4434654615252943,
                                -0.15157478906219238,
                                -0.26684517248765616,
                                -1.454961775612449]]},
  'outputs': [{'Float': {'data': [0.001497417688369751],
                         'dim': [1, 1],
                         'v': 1}}],
  'pipeline_name': 'cc-shadow',
  'shadow_data': {'ccfraud-rf': [{'Float': {'data': [1.0],
                                            'dim': [1, 1],
                                            'v': 1}}],
                  'ccfraud-xgb': [{'Float': {'data': [0.0005066990852355957],
                                             'dim': [1, 1],
                                             'v': 1}}]},
  'time': 1659645473965})]
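
To compare the challengers’ scores, the values can be pulled out of the shadow_data dictionary. This sketch copies the dictionary printed above and assumes each output is a single-key mapping from a type name (here 'Float') to a dict holding a data list; how to obtain that raw dictionary from an InferenceResult object is not covered here, and flatten_outputs and shadow_scores are hypothetical helpers.

```python
# shadow_data as printed in the InferenceResult output above.
shadow_data = {
    "ccfraud-rf": [{"Float": {"data": [1.0], "dim": [1, 1], "v": 1}}],
    "ccfraud-xgb": [{"Float": {"data": [0.0005066990852355957], "dim": [1, 1], "v": 1}}],
}

def flatten_outputs(outputs):
    """Collect the data values from a list of typed outputs like {'Float': {'data': [...]}}."""
    values = []
    for output in outputs:
        for tensor in output.values():  # the key is the type name, e.g. 'Float'
            values.extend(tensor["data"])
    return values

def shadow_scores(shadow):
    """Map each shadow model name to its flattened output values."""
    return {name: flatten_outputs(outputs) for name, outputs in shadow.items()}


print(shadow_scores(shadow_data))
```

The same flatten_outputs helper also works on the champion’s outputs list shown above.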

Retrieve Shadow Deployment Logs

Arrow Enabled Shadow Deploy Logs

For Arrow enabled Wallaroo instances, the shadow deploy results are included in the output of the Pipeline.logs() method, and output columns begin with the term out. For the champion model, out is followed by the name of the model’s output, in this example out.dense_1, while the shadow deployed models use the format out_{model name}.variable, where {model name} is the name of the shadow deployed model.

logs = pipeline.logs()
display(logs)
time in.tensor out.dense_1 check_failures out_ccfraudrf.variable out_ccfraudxgb.variable
0 2023-03-03 17:35:28.859 [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] [0.0014974177] 0 [1.0] [0.0005066991]
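
One way to use these log columns is to check how often each shadow model agrees with the champion. In this sketch, the 0.5 decision threshold and the agreement_rate helper are hypothetical choices for illustration; the scores are copied from the logged row above, where each cell holds a one-element list.

```python
def agreement_rate(champion_scores, challenger_scores, threshold=0.5):
    """Fraction of inferences where champion and challenger fall on the same side of the threshold."""
    pairs = list(zip(champion_scores, challenger_scores))
    same = sum((a >= threshold) == (b >= threshold) for a, b in pairs)
    return same / len(pairs)


# Scores from the single logged row above.
champion = [0.0014974177]
challengers = {"ccfraudrf": [1.0], "ccfraudxgb": [0.0005066991]}
rates = {name: agreement_rate(champion, scores) for name, scores in challengers.items()}
print(rates)
```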

Arrow Disabled Shadow Deploy Logs

Inferences run against a pipeline that has shadow deployments (also known as parallel deployments) are not visible in the pipeline logs. To view the results of the challenger models in a shadow deployment, use the Pipeline.logs_shadow_deploy() method. The results are grouped by input, allowing the performance of multiple models to be evaluated against the same data.

In this example, the Shadow Deployment logs are retrieved after an inference.

logs = pipeline.logs_shadow_deploy()
logs
   
Input [[1.0678324729342086, 0.21778102664937624, -1.7115145261843976, 0.6822857209662413, 1.0138553066742804, -0.43350000129006655, 0.7395859436561657, -0.28828395953577357, -0.44726268795990787, 0.5146124987725894, 0.3791316964287545, 0.5190619748123175, -0.4904593221655364, 1.1656456468728569, -0.9776307444180006, -0.6322198962519854, -0.6891477694494687, 0.17833178574255615, 0.1397992467197424, -0.35542206494183326, 0.4394217876939808, 1.4588397511627804, -0.3886829614721505, 0.4353492889350186, 1.7420053483337177, -0.4434654615252943, -0.15157478906219238, -0.26684517248765616, -1.454961775612449]]
           
Model Type Model Name Output Timestamp Model Version Elapsed
Primary ccfraud-lstm [array([[0.00149742]])] 2022-08-04T20:37:53.965000 6b650c9c-e22f-4c50-97b2-7fce07f18607 125102
Challenger ccfraud-rf [{'Float': {'v': 1, 'dim': [1, 1], 'data': [1.0]}}]
Challenger ccfraud-xgb [{'Float': {'v': 1, 'dim': [1, 1], 'data': [0.0005066990852355957]}}]

Get Pipeline URL Endpoint

The Pipeline URL Endpoint, or Pipeline Deploy URL, is used to submit data to a pipeline for inference. It is retrieved through the pipeline’s _deployment._url() method.

In this example, the pipeline URL endpoint for the pipeline ccfraud_pipeline will be displayed:

ccfraud_pipeline._deployment._url()

'http://engine-lb.ccfraud-pipeline-1:29502/pipelines/ccfraud-pipeline'

2.2.6 - Wallaroo SDK Essentials Guide: Tag Management

How to create and manage Wallaroo Tags through the Wallaroo SDK

Wallaroo SDK Tag Management

Tags are applied to either model versions or pipelines. This allows organizations to track different versions of models, and search for what pipelines have been used for specific purposes such as testing versus production use.

Create Tag

Tags are created with the Wallaroo client command create_tag(String tagname). This creates the tag and makes it available for use.

The tag will be saved to the variable currentTag to be used in the rest of these examples.

# Now we create our tag
currentTag = wl.create_tag("My Great Tag")

List Tags

Tags are listed with the Wallaroo client command list_tags(), which shows all tags and what models and pipelines they have been assigned to.

# List all tags

wl.list_tags()
id tag models pipelines
1 My Great Tag [('tagtestmodel', ['70169e97-fb7e-4922-82ba-4f5d37e75253'])] []

Wallaroo Pipeline Tag Management

Tags are used with pipelines to track different pipelines that are built or deployed with different features or functions.

Add Tag to Pipeline

Tags are added to a pipeline through the Wallaroo Tag add_to_pipeline(pipeline_id) method, where pipeline_id is the pipeline’s integer id.

For this example, we will add currentTag to tagtest_pipeline, then verify it has been added through the list_tags and list_pipelines commands.

# add this tag to the pipeline
currentTag.add_to_pipeline(tagtest_pipeline.id())
{'pipeline_pk_id': 1, 'tag_pk_id': 1}

Search Pipelines by Tag

Pipelines can be searched through the Wallaroo Client search_pipelines(search_term) method, where search_term is a string value for tags assigned to the pipelines.

In this example, the text “My Great Tag” that corresponds to currentTag will be searched for and displayed.

wl.search_pipelines('My Great Tag')
name version creation_time last_updated_time deployed tags steps
tagtestpipeline 5a4ff3c7-1a2d-4b0a-ad9f-78941e6f5677 2022-29-Nov 17:15:21 2022-29-Nov 17:15:21 (unknown) My Great Tag

Remove Tag from Pipeline

Tags are removed from a pipeline with the Wallaroo Tag remove_from_pipeline(pipeline_id) command, where pipeline_id is the integer value of the pipeline’s id.

For this example, currentTag will be removed from tagtest_pipeline. This will be verified through the list_tags and search_pipelines commands.

## remove from pipeline
currentTag.remove_from_pipeline(tagtest_pipeline.id())
{'pipeline_pk_id': 1, 'tag_pk_id': 1}

Wallaroo Model Tag Management

Tags are used with models to track differences in model versions.

Assign Tag to a Model

Tags are assigned to a model through the Wallaroo Tag add_to_model(model_id) command, where model_id is the model’s numerical ID. The tag is applied to the most current version of the model.

For this example, the currentTag will be applied to the tagtest_model. All tags will then be listed to show it has been assigned to this model.

# add tag to model

currentTag.add_to_model(tagtest_model.id())
{'model_id': 1, 'tag_id': 1}

Search Models by Tag

Model versions can be searched via tags using the Wallaroo Client method search_models(search_term), where search_term is a string value. All model versions containing the tag will be displayed. In this example, the text from currentTag is used to list all models that have that text in their tags.

# Search models by tag

wl.search_models('My Great Tag')
name version file_name image_path last_update_time
tagtestmodel 70169e97-fb7e-4922-82ba-4f5d37e75253 ccfraud.onnx None 2022-11-29 17:15:21.703465+00:00

Remove Tag from Model

Tags are removed from models using the Wallaroo Tag remove_from_model(model_id) command.

In this example, the currentTag will be removed from tagtest_model. A list of all tags will be shown with the list_tags command, followed by searching the models for the tag to verify it has been removed.

### remove tag from model

currentTag.remove_from_model(tagtest_model.id())
{'model_id': 1, 'tag_id': 1}

2.2.7 - Wallaroo SDK Essentials Guide: Inferencing

How to use Wallaroo for model inferencing through the Wallaroo SDK

Inferencing

Once a pipeline has been deployed, an inference can be run. This submits data to the pipeline, where it is processed through each of the pipeline’s steps, with the output of each step providing the input for the next step. The final step then outputs the result of all of the pipeline’s steps.
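
The chaining of steps can be sketched with functools.reduce, treating each step as a plain function. This is a conceptual illustration only; run_steps and the two example steps are hypothetical and are not Wallaroo objects.

```python
from functools import reduce

def run_steps(steps, data):
    """Apply each step in order; the output of one step becomes the input of the next."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical two-step pipeline: scale every input value, then sum the results.
steps = [lambda xs: [2 * x for x in xs], sum]
result = run_steps(steps, [1, 2, 3])
print(result)
```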

The input sent and the output received depends on whether Arrow support is enabled in the Wallaroo instance.

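  • For Arrow enabled instances of Wallaroo:
    • Inputs are submitted as either a pandas DataFrame or an Apache Arrow.
    • Outputs are returned in the same format as the input: a pandas DataFrame for a pandas DataFrame input, or an Apache Arrow for an Apache Arrow input.
  • For Arrow disabled instances of Wallaroo:
    • Inputs are submitted in the proprietary Wallaroo JSON format.
    • Outputs are returned as the InferenceResult object.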

Run Inference through Local Variable

Arrow Enabled Infer

When Arrow support is enabled, the method pipeline infer(data, timeout, dataset, dataset_exclude, dataset_separator) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The data submitted to the pipeline for inference. Inputs are either sent as a pandas.DataFrame or an Apache Arrow.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call: with 10 inference requests, each would have a default timeout of 15 seconds.
  • dataset (OPTIONAL): The datasets to be returned. By default this is set to ["*"], which returns ["time", "in", "out", "check_failures"].
  • dataset_exclude (OPTIONAL): Allows users to exclude parts of the dataset.
  • dataset_separator (OPTIONAL): Allows other types of dataset separators to be used. If set to ".", the returned dataset will be flattened.

The following example shows an inference request using a pandas DataFrame, and the returned values. Note that columns are labeled based on the inputs and outputs. This model has only one output, dense_1, which is listed in the out.dense_1 column. If the model had returned multiple outputs, they would be listed as out.output1, out.output2, etc.

result = ccfraud_pipeline.infer(high_fraud_data)
  time in.tensor out.dense_1 check_failures
0 2023-02-15 23:07:07.570 [1.0678324729, 18.1555563975, -1.6589551058, 5.2111788045, 2.3452470645, 10.4670835778, 5.0925820522, 12.8295153637, 4.9536770468, 2.3934736228, 23.912131818, 1.759956831, 0.8561037518, 1.1656456469, 0.5395988814, 0.7784221343, 6.7580610727, 3.9274118477, 12.4621782767, 12.3075382165, 13.7879519066, 1.4588397512, 3.6818346868, 1.753914366, 8.4843550037, 14.6454097667, 26.8523774363, 2.7165292377, 3.0611957069] [0.981199] 0

Arrow Disabled Infer

When Arrow support is disabled, the method pipeline infer(data, timeout) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The data submitted to the pipeline for inference in the Wallaroo JSON format.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call: with 10 inference requests, each would have a default timeout of 15 seconds.

The following example shows running an inference on data with a timeout of 20 seconds:

inferences = deployment.infer(high_fraud_data, timeout=20)
display(inferences)
[InferenceResult({'check_failures': [],
  'elapsed': 144533,
  'model_name': 'ktoqccfraudmodel',
  'model_version': '08daeb67-e4c4-42a9-84af-bac7bcc9e18b',
  'original_data': {'tensor': [[1.0678324729342086,
                                18.155556397512136,
                                -1.658955105843852,
                                5.2111788045436445,
                                2.345247064454334,
                                10.467083577773014,
                                5.0925820522419745,
                                12.82951536371218,
                                4.953677046849403,
                                2.3934736228338225,
                                23.912131817957253,
                                1.7599568310350209,
                                0.8561037518143335,
                                1.1656456468728569,
                                0.5395988813934498,
                                0.7784221343010385,
                                6.75806107274245,
                                3.927411847659908,
                                12.462178276650056,
                                12.307538216518656,
                                13.787951906620115,
                                1.4588397511627804,
                                3.681834686805714,
                                1.753914366037974,
                                8.484355003656184,
                                14.6454097666836,
                                26.852377436250144,
                                2.716529237720336,
                                3.061195706890285]]},
  'outputs': [{'Float': {'data': [0.9811990261077881],
                         'dim': [1, 1],
                         'dtype': 'Float',
                         'v': 1}}],
  'pipeline_name': 'ktoqccfraudpipeline',
  'shadow_data': {},
  'time': 1676502293246})]

Run Inference through Pipeline Deployment URL

The method pipeline _deployment._url() provides a URL where data can be submitted through HTTP POST to the pipeline to perform an inference. This allows data from different sources to be submitted remotely to the same pipeline.

  • For Arrow enabled instances of Wallaroo:
    • For DataFrame formatted JSON, the Content-Type is application/json; format=pandas-records.
    • For Arrow binary files, the Content-Type is application/vnd.apache.arrow.file.
  • For Arrow disabled instances of Wallaroo:
    • The Content-Type is application/json.

In this example, the aloha_pipeline’s deployment URL will be determined in an Arrow enabled Wallaroo instance. An inference will then be made on data submitted to the aloha_pipeline through its deployment URL via a curl HTTP POST command.

  • IMPORTANT NOTE: The _deployment._url() method returns an internal URL when using Python commands from within the Wallaroo instance, for example the Wallaroo JupyterHub service. When connecting via an external connection, _deployment._url() returns an external URL. External URL connections require that authentication be included in the HTTP request, and that external endpoints are enabled in the Wallaroo configuration options as described in the Model Endpoints Guide.
aloha_pipeline._deployment._url()

'http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo'
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type: application/json; format=pandas-records" --data @data-25k.json > curl_response.txt

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M  100 10.1M  100 2886k   539k   149k  0:00:19  0:00:19 --:--:-- 2570k
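
The same kind of request can be built from Python with the standard library. This sketch assumes a single input column named tensor (hypothetical; the aloha model’s input, for example, is named text_input) and builds the records-oriented JSON body that pandas.DataFrame.to_json(orient="records") would produce for a one-column DataFrame. It only constructs the request; build_inference_request is a hypothetical helper, and any authentication an external URL requires is not shown.

```python
import json
import urllib.request

def build_inference_request(url, rows, input_name="tensor"):
    """Build (but do not send) a POST request with DataFrame records-formatted JSON.

    Each row becomes one record such as {"tensor": [...]}.
    """
    records = [{input_name: row} for row in rows]
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


req = build_inference_request(
    "http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo",
    [[1.0678324729, 0.2177810266]],  # shortened example row
)
```

To send it from inside the Wallaroo instance, the request could be passed to urllib.request.urlopen(req, timeout=15) and the response body read from the returned object.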

Run Inference From A File

To submit a data file directly to a pipeline, use the pipeline method infer_from_file({Data File}, timeout).

Arrow Enabled infer_from_file

When Arrow support is enabled, the method pipeline infer_from_file(data, timeout, dataset, dataset_exclude, dataset_separator) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The name of the file submitted to the pipeline for inference. The file contains either DataFrame formatted JSON or Apache Arrow data.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call: with 10 inference requests, each would have a default timeout of 15 seconds.
  • dataset (OPTIONAL): The datasets to be returned. By default this is set to ["*"], which returns ["time", "in", "out", "check_failures"].
  • dataset_exclude (OPTIONAL): Allows users to exclude parts of the dataset.
  • dataset_separator (OPTIONAL): Allows other types of dataset separators to be used. If set to ".", the returned dataset will be flattened.

In this example, an inference will be submitted to the ccfraud_pipeline with the file smoke_test.df.json, a DataFrame formatted JSON file.

result = ccfraud_pipeline.infer_from_file('./data/smoke_test.df.json')
  time in.tensor out.dense_1 check_failures
0 2023-02-15 23:07:07.497 [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] [0.0014974177] 0

Arrow Disabled infer_from_file

When Arrow support is not enabled, the method pipeline infer_from_file(filename, timeout) performs an inference as defined by the pipeline steps and takes the following arguments:

  • filename (REQUIRED): The path to the submitted file in the Wallaroo JSON format.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call: with 10 inference requests, each would have a default timeout of 15 seconds.
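
For Arrow disabled instances, a file in the Wallaroo JSON format can be written with the json module. The shape used here, a mapping from the model’s input name to a list of rows, is an assumption inferred from the original_data field in the InferenceResult output below (for example {'text_input': [[...]]}); confirm it against your model’s expected input. The row is shortened for illustration, and write_wallaroo_json is a hypothetical helper.

```python
import json
import os
import tempfile

def write_wallaroo_json(path, input_name, rows):
    """Write rows as {input_name: [row, ...]} (shape assumed from original_data)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({input_name: rows}, f)


# Hypothetical example: one shortened aloha-style row, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data-1.json")
write_wallaroo_json(path, "text_input", [[0, 0, 0, 28, 16, 32]])
```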

In this example, an inference will be submitted to the aloha_pipeline with the file data-1.json with a timeout of 20 seconds:

aloha_pipeline.infer_from_file("data-1.json", timeout=20)

[InferenceResult({'check_failures': [],
  'elapsed': 329803334,
  'model_name': 'aloha-2',
  'model_version': '3dc9b7f9-faff-40cc-b1b6-7724edf11b12',
  'original_data': {'text_input': [[0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    28,
                                    16,
                                    32,
                                    23,
                                    29,
                                    32,
                                    30,
                                    19,
                                    26,
                                    17]]},
  'outputs': [{'Float': {'data': [0.001519620418548584], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9829147458076477], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.012099534273147583], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [4.7593468480044976e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [2.0289742678869516e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.0003197789192199707],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.011029303073883057], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9975639581680298], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.010341644287109375], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.008038878440856934], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.016155093908309937], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.006236225366592407], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.0009985864162445068],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.7933435344117743e-26],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.388984431455466e-27],
                         'dim': [1, 1],
                         'v': 1}}],
  'pipeline_name': 'aloha-test-demo',
  'time': 1648744282452})]

InferenceResult Object

The inferences from infer and infer_from_file return the following:

  • Arrow support enabled Wallaroo instances return either:
    • If the input is a pandas DataFrame, then the inference result is a pandas DataFrame.
    • If the input is an Apache Arrow, then the inference result is an Apache Arrow.
  • Wallaroo instances without Arrow support return the Wallaroo InferenceResult object.

The InferenceResult object includes the following methods:

  • data() : The data resulting from the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the data displayed:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    output[0].data()
    
    [array([[9.93003249e-01],
          [9.93003249e-01],
          [9.93003249e-01],
          ...,
          [1.10703707e-03],
          [8.53300095e-04],
          [1.24984980e-03]])]
    
  • input_data(): Returns the data provided to the pipeline to run the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the first element in the array returned:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    
    output[0].input_data()["tensor"][0]
    
    [-1.060329750089797,
    2.354496709462385,
    -3.563878832646437,
    5.138734892618555,
    -1.23084570186641,
    -0.7687824607744093,
    -3.588122810891446,
    1.888083766259287,
    -3.2789674273886593,
    -3.956325455353324,
    4.099343911805088,
    -5.653917639476211,
    -0.8775733373342495,
    -9.131571191990632,
    -0.6093537872620682,
    -3.748027677256424,
    -5.030912501659983,
    -0.8748149525506821,
    1.9870535692026476,
    0.7005485718467245,
    0.9204422758154284,
    -0.10414918089758483,
    0.3229564351284999,
    -0.7418141656910608,
    0.03841201586730117,
    1.099343914614657,
    1.2603409755785089,
    -0.14662447391576958,
    -1.446321243938815]
    

2.2.8 - Wallaroo SDK Essentials Guide: Assays Management

How to create and manage Wallaroo Assays through the Wallaroo SDK

Model Insights and Interactive Analysis Introduction

Wallaroo provides the ability to perform interactive analysis so organizations can explore the data from a pipeline and learn how the data is behaving. With this information and the knowledge of your particular business use case you can then choose appropriate thresholds for persistent automatic assays as desired.

  • IMPORTANT NOTE

    Model insights operates over time and is difficult to demo in a notebook without pre-canned data. We assume you have an active pipeline that has been running and making predictions over time and show you the code you may use to analyze your pipeline.

Monitoring tasks called assays monitor a model’s predictions, or the data coming into the model, against an established baseline. Changes in the distribution of this data can indicate model drift or a change in the environment the model was trained for. This can help determine whether a model needs to be retrained, or whether the environment data should be analyzed for accuracy or other issues.

Assay Details

Assays contain the following attributes:

Attribute Default Description
Name   The name of the assay. Assay names must be unique.
Baseline Data   Data that is known to be “typical” (typically distributed) and can be used to determine whether the distribution of new data has changed.
Schedule Every 24 hours at 1 AM New assays are configured to run a new analysis every 24 hours starting at the end of the baseline period. This period can be configured through the SDK.
Group Results Daily Groups assay results into groups based on either Daily (the default), Weekly, or Monthly.
Metric PSI Population Stability Index (PSI) is an entropy-based measure of the difference between distributions. Maximum Difference of Bins measures the maximum difference between the baseline and current distributions (as estimated using the bins). Sum of the difference of bins sums up the difference of occurrences in each bin between the baseline and current distributions.
Threshold 0.1 The threshold for deciding whether the difference between distributions, as evaluated by the above metric, is large (the distributions are different) or small (the distributions are similar). The default of 0.1 is generally a good threshold when using PSI as the metric.
Number of Bins 5 Sets the number of bins used to partition the baseline data, so that future data can be compared by how it falls into these bins. By default, the binning scheme is percentile (quantile) based; it can be configured (see Bin Mode, below). The total number of bins also includes the left_outlier and right_outlier bins, so it equals the set number + 2.
Bin Mode Quantile Set the binning scheme. Quantile binning defines the bins using percentile ranges (each bin holds the same percentage of the baseline data). Equal binning defines the bins using equally spaced data value ranges, like a histogram. Custom allows users to set the range of values for each bin, with the Left Outlier always starting at Min (below the minimum values detected from the baseline) and the Right Outlier always ending at Max (above the maximum values detected from the baseline).
Bin Weight Equally Weighted The bin weights can be set to either Equally Weighted (the default), where each bin is weighted equally, or Custom, where the bin weights can be adjusted depending on which bins are considered more important for detecting model drift.

Manage Assays via the Wallaroo SDK

List Assays

Assays are listed through the client.list_assays method, which returns a List object.

The following example shows how to list assays:

wl.list_assays()
name active status warning_threshold alert_threshold pipeline_name
Sample Assay 03 True {"run_at": "2022-12-13T21:08:12.289359005+00:00", "num_ok": 16, "num_warnings": 0, "num_alerts": 14} None 0.1 housepricepipe
Sample Assay 02 True {"run_at": "2022-12-13T17:34:31.148302668+00:00", "num_ok": 15, "num_warnings": 0, "num_alerts": 14} None 0.1 housepricepipe
Sample Assay 01 True {"run_at": "2022-12-13T17:30:18.779095344+00:00", "num_ok": 16, "num_warnings": 0, "num_alerts": 14} None 0.1 housepricepipe

Build Assay Via the Wallaroo SDK

Assays are built with the Wallaroo client.build_assay(assayName, pipeline, modelName, baselineStart, baselineEnd) method, which returns a wallaroo.assay_config.AssayBuilder object. The method requires the following parameters:

Parameter Type Description
assayName String The human friendly name of the created assay.
pipeline Wallaroo.pipeline The pipeline the assay is assigned to.
modelName String The model to perform the assay on.
baselineStart DateTime When to start the baseline period.
baselineEnd DateTime When to end the baseline period.

When called, this method polls the pipeline between the baseline start and end times to establish what values are considered normal outputs for the specified model.

By default, assays run a new analysis every 24 hours starting at the end of the baseline period, using a 24-hour observation window.

In this example, an assay named "example assay" is created and stored in the variable assay_builder.

import datetime

# Assumes wl is a connected Wallaroo client, pipeline is a deployed pipeline,
# and model_name is the name of a model in that pipeline.
baseline_start = datetime.datetime.fromisoformat('2022-01-01T00:00:00+00:00')
baseline_end = datetime.datetime.fromisoformat('2022-01-02T00:00:00+00:00')
# Used by later examples to limit interactive runs with add_run_until.
last_day = datetime.datetime.fromisoformat('2022-02-01T00:00:00+00:00')

assay_name = "example assay"
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)

Schedule Assay

By default assays are scheduled to run every 24 hours starting immediately after the baseline period ends. This scheduled period is referred to as the assay window and has the following properties:

  • width: The period of data included in the analysis. By default this is 24 hours.
  • interval: How often the analysis is run (every 5 minutes, every 24 hours, etc.). By default this is the window width.
  • start: When the analysis should start. By default this is at the end of the baseline period.

These are adjusted through the AssayBuilder window_builder method, which returns a builder with the following methods:

  • add_width: Sets the width of the window.
  • add_interval: Sets how often the analysis is run.

In this example, the assay will be set to run an analysis every 12 hours on the previous 24 hours of data:

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
# Stop generating analyses at last_day.
assay_builder = assay_builder.add_run_until(last_day)

# Analyze the previous 24 hours of data, every 12 hours.
assay_builder.window_builder().add_width(hours=24).add_interval(hours=12)

assay_config = assay_builder.build()

# Run the analyses now instead of waiting for the schedule.
assay_results = assay_config.interactive_run()
print(f"Generated {len(assay_results)} analyses")
Generated 59 analyses
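The count above can be reasoned about with a short sketch. The following function, window_starts, is a hypothetical illustration and not part of the Wallaroo SDK: it assumes each window [start, start + width) is analyzed only if it ends on or before the add_run_until time, and it steps through window start times from the end of the baseline. Under that assumption, a 24-hour width with a 12-hour interval between 2022-01-02 and 2022-02-01 gives 59 windows, matching the output above.

```python
from datetime import datetime, timedelta
from typing import List


def window_starts(start: datetime, run_until: datetime,
                  width: timedelta, interval: timedelta) -> List[datetime]:
    """Hypothetical helper: start times of windows that finish by run_until."""
    starts = []
    current = start
    # Assumption: a window is only analyzed once it has fully elapsed.
    while current + width <= run_until:
        starts.append(current)
        current += interval
    return starts


baseline_end = datetime.fromisoformat("2022-01-02T00:00:00+00:00")
last_day = datetime.fromisoformat("2022-02-01T00:00:00+00:00")

starts = window_starts(baseline_end, last_day,
                       width=timedelta(hours=24), interval=timedelta(hours=12))
print(f"Generated {len(starts)} analyses")
```

With the default schedule (width and interval both 24 hours), the same sketch yields 30 windows for that period.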

Perform Interactive Baseline

Interactive baselines can be run against an assay to generate summary statistics for the values in the baseline period. This is done by calling interactive_baseline_run() on the assay configuration returned by AssayBuilder.build(); the baseline_stats() method of the result returns the following:

Field Type Description
count Integer The number of records evaluated.
min Float The minimum value found.
max Float The maximum value found.
mean Float The mean value derived from the values evaluated.
median Float The median value derived from the values evaluated.
std Float The standard deviation of the values evaluated.
start DateTime The start date for the records to evaluate.
end DateTime The end date for the records to evaluate.

In this example, an interactive baseline will be run against a new assay, and the results displayed:

baseline_run = assay_builder.build().interactive_baseline_run()
baseline_run.baseline_stats()

                    Baseline
count                   1813
min                    11.95
max                    15.08
mean                   12.95
median                 12.91
std                     0.46
start   2022-01-01T00:00:00Z
end     2022-01-02T00:00:00Z
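As a rough illustration of what these fields mean, the hypothetical helper below computes the same summary fields from a plain list of baseline values with Python's statistics module. It is not the SDK's implementation; in particular, whether Wallaroo reports the population or the sample standard deviation is an assumption (population is used here).

```python
import statistics
from typing import Dict, List


def summarize_baseline(values: List[float]) -> Dict[str, float]:
    """Hypothetical helper: count, min, max, mean, median, and std of baseline values."""
    if not values:
        raise ValueError("baseline must contain at least one value")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation (an assumption, see above).
        "std": statistics.pstdev(values),
    }


print(summarize_baseline([12.0, 13.0, 13.0, 14.0]))
```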

Display Assay Graphs

Histogram, kernel density estimate (KDE), and Empirical Cumulative Distribution Function (ECDF) charts can be generated from an assay to provide a visual representation of the values evaluated and where they fit within the established baseline.

These methods are part of the AssayBuilder object and are as follows:

Method Description
baseline_histogram() Creates a histogram chart from the assay baseline.
baseline_kde() Creates a kernel density estimate (KDE) chart from the assay baseline.
baseline_ecdf() Creates an Empirical Cumulative Distribution Function (ECDF) chart from the assay baseline.

In this example, each of the three different charts will be generated from an assay:

assay_builder.baseline_histogram()
assay_builder.baseline_kde()
assay_builder.baseline_ecdf()
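The ECDF chart plots, for each baseline value, the fraction of baseline values less than or equal to it. A minimal NumPy sketch of that computation follows; ecdf is a hypothetical helper, not the SDK's plotting code.

```python
import numpy as np


def ecdf(values):
    """Hypothetical helper: sorted values and the fraction of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    # With ties, each repeated value gets its own step; the last step
    # for a tied value is the true ECDF value.
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf([13.0, 12.0, 14.0, 12.5])
print(x, y)
```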

Run Interactive Assay

Instead of waiting for the next scheduled run, users can run an assay interactively through the interactive_run method of the assay configuration (wallaroo.assay_config). The configuration is created by calling build() on the wallaroo.assay_config.AssayBuilder object returned by the wallaroo.client.build_assay method.

The following example creates the AssayBuilder object then runs an interactive assay.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
assay_config = assay_builder.add_run_until(last_day).build()
assay_results = assay_config.interactive_run()
assay_df = assay_results.to_dataframe()
assay_df
assay_id name iopath score start min max mean median std warning_threshold alert_threshold status
0 None output dense_2 0 0.00 2023-01-02T00:00:00+00:00 12.05 14.71 12.97 12.90 0.48 None 0.25 Ok
1 None output dense_2 0 0.09 2023-01-03T00:00:00+00:00 12.04 14.65 12.96 12.93 0.41 None 0.25 Ok
2 None output dense_2 0 0.04 2023-01-04T00:00:00+00:00 11.87 14.02 12.98 12.95 0.46 None 0.25 Ok
3 None output dense_2 0 0.06 2023-01-05T00:00:00+00:00 11.92 14.46 12.93 12.87 0.46 None 0.25 Ok
4 None output dense_2 0 0.02 2023-01-06T00:00:00+00:00 12.02 14.15 12.95 12.90 0.43 None 0.25 Ok
5 None output dense_2 0 0.03 2023-01-07T00:00:00+00:00 12.18 14.58 12.96 12.93 0.44 None 0.25 Ok
6 None output dense_2 0 0.02 2023-01-08T00:00:00+00:00 12.01 14.60 12.92 12.90 0.46 None 0.25 Ok
7 None output dense_2 0 0.04 2023-01-09T00:00:00+00:00 12.01 14.40 13.00 12.97 0.45 None 0.25 Ok
8 None output dense_2 0 0.06 2023-01-10T00:00:00+00:00 11.99 14.79 12.94 12.91 0.46 None 0.25 Ok
9 None output dense_2 0 0.02 2023-01-11T00:00:00+00:00 11.90 14.66 12.91 12.88 0.45 None 0.25 Ok
10 None output dense_2 0 0.02 2023-01-12T00:00:00+00:00 11.96 14.82 12.94 12.90 0.46 None 0.25 Ok
11 None output dense_2 0 0.03 2023-01-13T00:00:00+00:00 12.07 14.61 12.96 12.93 0.47 None 0.25 Ok
12 None output dense_2 0 0.15 2023-01-14T00:00:00+00:00 12.00 14.20 13.06 13.03 0.43 None 0.25 Ok
13 None output dense_2 0 2.92 2023-01-15T00:00:00+00:00 12.74 15.62 14.00 14.01 0.57 None 0.25 Alert
14 None output dense_2 0 7.89 2023-01-16T00:00:00+00:00 14.64 17.19 15.91 15.87 0.63 None 0.25 Alert
15 None output dense_2 0 8.87 2023-01-17T00:00:00+00:00 16.60 19.23 17.94 17.94 0.63 None 0.25 Alert
16 None output dense_2 0 8.87 2023-01-18T00:00:00+00:00 18.67 21.29 20.01 20.04 0.64 None 0.25 Alert
17 None output dense_2 0 8.87 2023-01-19T00:00:00+00:00 20.72 23.57 22.17 22.18 0.65 None 0.25 Alert
18 None output dense_2 0 8.87 2023-01-20T00:00:00+00:00 23.04 25.72 24.32 24.33 0.66 None 0.25 Alert
19 None output dense_2 0 8.87 2023-01-21T00:00:00+00:00 25.06 27.67 26.48 26.49 0.63 None 0.25 Alert
20 None output dense_2 0 8.87 2023-01-22T00:00:00+00:00 27.21 29.89 28.63 28.58 0.65 None 0.25 Alert
21 None output dense_2 0 8.87 2023-01-23T00:00:00+00:00 29.36 32.18 30.82 30.80 0.67 None 0.25 Alert
22 None output dense_2 0 8.87 2023-01-24T00:00:00+00:00 31.56 34.35 32.98 32.98 0.65 None 0.25 Alert
23 None output dense_2 0 8.87 2023-01-25T00:00:00+00:00 33.68 36.44 35.14 35.14 0.66 None 0.25 Alert
24 None output dense_2 0 8.87 2023-01-26T00:00:00+00:00 35.93 38.51 37.31 37.33 0.65 None 0.25 Alert
25 None output dense_2 0 3.69 2023-01-27T00:00:00+00:00 12.06 39.91 29.29 38.65 12.66 None 0.25 Alert
26 None output dense_2 0 0.05 2023-01-28T00:00:00+00:00 11.87 13.88 12.92 12.90 0.38 None 0.25 Ok
27 None output dense_2 0 0.10 2023-01-29T00:00:00+00:00 12.02 14.36 12.98 12.96 0.38 None 0.25 Ok
28 None output dense_2 0 0.11 2023-01-30T00:00:00+00:00 11.99 14.44 12.89 12.88 0.37 None 0.25 Ok
29 None output dense_2 0 0.01 2023-01-31T00:00:00+00:00 12.00 14.64 12.92 12.89 0.40 None 0.25 Ok
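In the results above, each window's status follows from comparing its score with the thresholds: scores such as 0.15 stay Ok against the 0.25 alert threshold, while scores of 2.92 and above are flagged as Alert. The hypothetical helper below sketches that rule. Treating a score exactly equal to a threshold as not exceeding it, and the "Warning" label for the warning threshold, are assumptions rather than documented SDK behavior.

```python
from typing import Optional


def assay_status(score: float, alert_threshold: float,
                 warning_threshold: Optional[float] = None) -> str:
    """Hypothetical helper: classify a score against alert and warning thresholds."""
    if score > alert_threshold:
        return "Alert"
    # The warning threshold is None in the results above, so it is optional here.
    if warning_threshold is not None and score > warning_threshold:
        return "Warning"
    return "Ok"


for score in (0.00, 0.15, 2.92, 8.87):
    print(score, assay_status(score, alert_threshold=0.25))
```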

Bins

As defined under Assay Details, bins can be adjusted by number of bins, bin mode, and bin weight.

Number of Bins

The number of bins can be changed from the default of 5 through the wallaroo.assay_config.summarizer_builder.add_num_bins method. The total number of bins includes the set bins plus the left_outlier and right_outlier bins, so it equals the set number of bins + 2.

The following example changes the number of bins to 10, then displays the assay results in a chart with 12 bins in total (10 set manually, 1 left_outlier, 1 right_outlier).

from wallaroo.assay_config import BinMode

assay_builder = wl.build_assay("Test Assay", pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
# Quantile binning with 10 bins (12 in total with the outlier bins).
assay_builder.summarizer_builder.add_bin_mode(BinMode.QUANTILE).add_num_bins(10)
assay_results = assay_builder.build().interactive_run()
# display() is available in Jupyter/IPython notebooks.
display(assay_results[1].compare_bins())
assay_results[1].chart()
b_edges b_edge_names b_aggregated_values b_aggregation w_edges w_edge_names w_aggregated_values w_aggregation diff_in_pcts
0 11.95 left_outlier 0.00 Density 11.95 left_outlier 0.00 Density 0.00
1 12.40 q_10 0.10 Density 12.40 e_1.24e1 0.10 Density 0.00
2 12.56 q_20 0.10 Density 12.56 e_1.26e1 0.09 Density -0.01
3 12.70 q_30 0.10 Density 12.70 e_1.27e1 0.09 Density -0.01
4 12.81 q_40 0.10 Density 12.81 e_1.28e1 0.10 Density 0.00
5 12.91 q_50 0.10 Density 12.91 e_1.29e1 0.12 Density 0.02
6 13.01 q_60 0.10 Density 13.01 e_1.30e1 0.08 Density -0.02
7 13.15 q_70 0.10 Density 13.15 e_1.31e1 0.12 Density 0.02
8 13.31 q_80 0.10 Density 13.31 e_1.33e1 0.09 Density -0.01
9 13.56 q_90 0.10 Density 13.56 e_1.36e1 0.11 Density 0.01
10 15.08 q_100 0.10 Density 15.08 e_1.51e1 0.09 Density -0.01
11 NaN right_outlier 0.00 Density NaN right_outlier 0.00 Density 0.00
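The b_edges column above follows a quantile layout: the first edge is the baseline minimum, which closes the left_outlier bin, each q_N edge is the N-th percentile of the baseline, and the right_outlier bin has no upper edge (NaN). The sketch below reproduces that layout with NumPy. quantile_edges is a hypothetical helper, and NumPy's default linear interpolation is assumed rather than confirmed to match Wallaroo's percentile method.

```python
import numpy as np


def quantile_edges(baseline, num_bins=5):
    """Hypothetical helper: edges and names for num_bins quantile bins plus two outlier bins."""
    qs = np.linspace(0.0, 1.0, num_bins + 1)
    # Edge 0 is the baseline minimum (upper bound of left_outlier);
    # edge N is the N/num_bins quantile; the right_outlier bin is open-ended (NaN).
    edges = np.append(np.quantile(np.asarray(baseline, dtype=float), qs), np.nan)
    names = (["left_outlier"]
             + [f"q_{int(round(q * 100))}" for q in qs[1:]]
             + ["right_outlier"])
    return edges, names


edges, names = quantile_edges(np.arange(0, 101), num_bins=10)
print(len(names))  # 12 bins: 10 quantile bins plus the two outlier bins
```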

Bin Mode

Assays support the following binning modes:

  • BinMode.QUANTILE (Default): Defines the bins using percentile ranges (each bin holds the same percentage of the baseline data).
  • BinMode.EQUAL defines the bins using equally spaced data value ranges, like a histogram.
  • BinMode.PROVIDED (Custom): Allows users to set the range of values for each bin, with the Left Outlier always starting at Min (below the minimum values detected from the baseline) and the Right Outlier always ending at Max (above the maximum values detected from the baseline). When using BinMode.PROVIDED, the edges are passed as a list of values.

Bin modes are set through the wallaroo.assay_config.summarizer_builder.add_bin_mode method.
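For BinMode.EQUAL, the inner edges are evenly spaced, much like a histogram. A minimal sketch with numpy.linspace follows; equal_edges is a hypothetical helper, and using the baseline minimum and maximum as the outer edges is an assumption.

```python
import numpy as np


def equal_edges(baseline_min, baseline_max, num_bins=5):
    """Hypothetical helper: num_bins + 1 evenly spaced edges from the baseline min to max."""
    return np.linspace(baseline_min, baseline_max, num_bins + 1)


edges = equal_edges(12.0, 15.0, num_bins=5)
print(edges)
```

With BinMode.PROVIDED, the list passed to add_bin_mode takes the place of these computed edges.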

The following examples change the bin mode to equal, then set custom provided edges.

import random
import string

from wallaroo.assay_config import BinMode

# Random four-letter prefix so the assay name is unique.
prefix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))

assay_name = f"{prefix}example assay"

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.EQUAL)
assay_results = assay_builder.build().interactive_run()
display(assay_results[0].compare_bins())
assay_results[0].chart()
b_edges b_edge_names b_aggregated_values b_aggregation w_edges w_edge_names w_aggregated_values w_aggregation diff_in_pcts
0 12.00 left_outlier 0.00 Density 12.00 left_outlier 0.00 Density 0.00
1 12.60 p_1.26e1 0.24 Density 12.60 e_1.26e1 0.24 Density 0.00
2 13.19 p_1.32e1 0.49 Density 13.19 e_1.32e1 0.48 Density -0.02
3 13.78 p_1.38e1 0.22 Density 13.78 e_1.38e1 0.22 Density -0.00
4 14.38 p_1.44e1 0.04 Density 14.38 e_1.44e1 0.06 Density 0.02
5 14.97 p_1.50e1 0.01 Density 14.97 e_1.50e1 0.01 Density 0.00
6 NaN right_outlier 0.00 Density NaN right_outlier 0.00 Density 0.00

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Equal
aggregation = Density
metric = PSI
weighted = False
score = 0.011074287819376092
scores = [0.0, 7.3591419975306595e-06, 0.000773779195360713, 8.538514991838585e-05, 0.010207597078872246, 1.6725322721660374e-07, 0.0]
index = None

# Custom bin edges for BinMode.PROVIDED; the left_outlier and right_outlier bins are added around them.
edges = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.PROVIDED, edges)
assay_results = assay_builder.build().interactive_run()
display(assay_results[0].compare_bins())
assay_results[0].chart()
b_edges b_edge_names b_aggregated_values b_aggregation w_edges w_edge_names w_aggregated_values w_aggregation diff_in_pcts
0 11.00 left_outlier 0.00 Density 11.00 left_outlier 0.00 Density 0.00
1 12.00 e_1.20e1 0.00 Density 12.00 e_1.20e1 0.00 Density 0.00
2 13.00 e_1.30e1 0.62 Density 13.00 e_1.30e1 0.59 Density -0.03
3 14.00 e_1.40e1 0.36 Density 14.00 e_1.40e1 0.35 Density -0.00
4 15.00 e_1.50e1 0.02 Density 15.00 e_1.50e1 0.06 Density 0.03
5 16.00 e_1.60e1 0.00 Density 16.00 e_1.60e1 0.00 Density 0.00
6 NaN right_outlier 0.00 Density NaN right_outlier 0.00 Density 0.00

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Provided
aggregation = Density
metric = PSI
weighted = False
score = 0.0321620386600679
scores = [0.0, 0.0, 0.0014576920813015586, 3.549754401142936e-05, 0.030668849034754912, 0.0, 0.0]
index = None

Bin Weights

Bin weights can be adjusted so that more important bins carry more weight in the final assay score. This is done through the wallaroo.assay_config.summarizer_builder.add_bin_weights method, where the weights are passed as a list with one value per bin.

The following example uses 10 bins (12 in total including the left_outlier and right_outlier bins), assigns a weight of 0 to the first six bins and 1 to the last six, and displays the resulting score.

# Weight 0 for the first six bins and 1 for the last six (12 bins including the outliers).
weights = [0] * 6
weights.extend([1] * 6)
print("Using weights: ", weights)
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.QUANTILE).add_num_bins(10).add_bin_weights(weights)
assay_results = assay_builder.build().interactive_run()
display(assay_results[1].compare_bins())
assay_results[1].chart()
Using weights:  [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
b_edges b_edge_names b_aggregated_values b_aggregation w_edges w_edge_names w_aggregated_values w_aggregation diff_in_pcts
0 12.00 left_outlier 0.00 Density 12.00 left_outlier 0.00 Density 0.00
1 12.41 q_10 0.10 Density 12.41 e_1.24e1 0.09 Density -0.00
2 12.55 q_20 0.10 Density 12.55 e_1.26e1 0.04 Density -0.05
3 12.72 q_30 0.10 Density 12.72 e_1.27e1 0.14 Density 0.03
4 12.81 q_40 0.10 Density 12.81 e_1.28e1 0.05 Density -0.05
5 12.88 q_50 0.10 Density 12.88 e_1.29e1 0.12 Density 0.02
6 12.98 q_60 0.10 Density 12.98 e_1.30e1 0.09 Density -0.01
7 13.15 q_70 0.10 Density 13.15 e_1.32e1 0.18 Density 0.08
8 13.33 q_80 0.10 Density 13.33 e_1.33e1 0.14 Density 0.03
9 13.47 q_90 0.10 Density 13.47 e_1.35e1 0.07 Density -0.03
10 14.97 q_100 0.10 Density 14.97 e_1.50e1 0.08 Density -0.02
11 NaN right_outlier 0.00 Density NaN right_outlier 0.00 Density 0.00

baseline mean = 12.940910643273655
window mean = 12.956829186961135
baseline median = 12.884286880493164
window median = 12.929338455200195
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = True
score = 0.012600694309416988
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00019654033061397393, 0.00850384373737565, 0.0015735766052488358, 0.0014437605903522511, 0.000882973045826275, 0.0]
index = None
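In the weighted output above, the first six per-bin scores are 0 and the overall score equals the sum of the remaining per-bin scores. The hypothetical helper below sketches weighting as a per-bin multiplication followed by a sum; whether Wallaroo normalizes the weights in some other way is not stated here, so treat this as an assumption.

```python
from typing import Sequence


def weighted_score(bin_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical helper: multiply each bin's score by its weight and sum the results."""
    if len(bin_scores) != len(weights):
        # One weight is needed per bin, including left_outlier and right_outlier.
        raise ValueError("expected one weight per bin")
    return sum(s * w for s, w in zip(bin_scores, weights))


weights = [0] * 6 + [1] * 6
unweighted = [0.01] * 12  # hypothetical per-bin scores
print(weighted_score(unweighted, weights))
```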


Metrics

The metric score is a distance or dissimilarity measure: the larger it is, the less similar the two distributions are. The following metrics are supported:

  • PSI: Population Stability Index
  • SumDiff: The sum of differences
  • MaxDiff: The maximum of differences.
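Given the fraction of baseline and window values in each bin, the three metrics can be sketched as below. metric_scores is a hypothetical helper that uses the textbook PSI formula, the sum of (window - baseline) * ln(window / baseline), with a small epsilon so empty bins do not produce log(0); Wallaroo's handling of empty bins is not documented here. The SumDiff score is halved because, in the SumDiff example in this section, the reported score (0.0254) is half the sum of the listed per-bin differences. Treat that as an observation from the sample output, not a specification.

```python
import numpy as np


def metric_scores(baseline_pcts, window_pcts, metric="PSI", eps=1e-10):
    """Hypothetical helper: (per-bin scores, overall score, index) for PSI, SumDiff, or MaxDiff."""
    b = np.asarray(baseline_pcts, dtype=float)
    w = np.asarray(window_pcts, dtype=float)
    if metric == "PSI":
        # Clip empty bins to eps so the logarithm stays finite.
        b_c, w_c = np.clip(b, eps, None), np.clip(w, eps, None)
        per_bin = (w_c - b_c) * np.log(w_c / b_c)
        return per_bin, float(per_bin.sum()), None
    diffs = np.abs(w - b)
    if metric == "SumDiff":
        # Halved to match the sample output (see the note above).
        return diffs, float(diffs.sum() / 2), None
    if metric == "MaxDiff":
        # Also report which bin has the largest difference.
        idx = int(np.argmax(diffs))
        return diffs, float(diffs[idx]), idx
    raise ValueError(f"unknown metric: {metric}")


baseline = [0.0, 0.2, 0.3, 0.5, 0.0]  # hypothetical bin shares, including outlier bins
window = [0.0, 0.1, 0.5, 0.4, 0.0]
for name in ("PSI", "SumDiff", "MaxDiff"):
    _, score, index = metric_scores(baseline, window, name)
    print(name, round(score, 4), index)
```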

The following code samples show each metric in use, starting with the default, PSI.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()
baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None

from wallaroo.assay_config import Metric

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_metric(Metric.SUMDIFF)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()
baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = SumDiff
weighted = False
score = 0.025438649748041997
scores = [0.0, 0.009956893934794486, 0.006648048084512165, 0.01548175581324751, 0.006648048084512165, 0.012142553579017668, 0.0]
index = None
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_metric(Metric.MAXDIFF)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()
baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = MaxDiff
weighted = False
score = 0.01548175581324751
scores = [0.0, 0.009956893934794486, 0.006648048084512165, 0.01548175581324751, 0.006648048084512165, 0.012142553579017668, 0.0]
index = 3

Aggregation Options

Bins can be aggregated in histogram style with Aggregation.DENSITY (the default), which counts the number or percentage of values that fall in each bin, or in Empirical Cumulative Distribution Function style with Aggregation.CUMULATIVE, which keeps a cumulative count of the values or percentages that fall in each bin.

from wallaroo.assay_config import Aggregation

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_aggregation(Aggregation.DENSITY)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()
baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_aggregation(Aggregation.CUMULATIVE)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()
baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Cumulative
metric = PSI
weighted = False
score = 0.04419889502762442
scores = [0.0, 0.009956893934794486, 0.0033088458502823492, 0.01879060166352986, 0.012142553579017725, 0.0, 0.0]
index = None
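Cumulative aggregation can be pictured as a running total of the per-bin shares. The sketch below uses a hypothetical helper, to_cumulative (not an SDK call), that converts density-style shares into cumulative shares with numpy.cumsum, then takes per-bin absolute differences between a baseline and a window. The per-bin scores in the cumulative output above are consistent with such differences, but that correspondence is an observation from the sample rather than documented behavior.

```python
import numpy as np


def to_cumulative(density_pcts):
    """Hypothetical helper: turn per-bin shares into running totals."""
    return np.cumsum(np.asarray(density_pcts, dtype=float))


baseline = [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0]    # hypothetical quantile bin shares
window = [0.0, 0.19, 0.21, 0.22, 0.18, 0.2, 0.0]  # hypothetical window shares

cum_b, cum_w = to_cumulative(baseline), to_cumulative(window)
per_bin = np.abs(cum_w - cum_b)
print(np.round(per_bin, 3))
```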

2.3 - Wallaroo SDK Reference Guide


2.3.1 - wallaroo.assay

class Assay(wallaroo.object.Object):

An Assay represents a record in the database. An assay contains some high level attributes such as name, status, active, etc. as well as the sub objects Baseline, Window and Summarizer which specify how the Baseline is derived, how the Windows should be created and how the analysis should be conducted.

Assay(client: Optional[wallaroo.client.Client], data: Dict[str, Any])

Base constructor.

Each object requires:

  • a GraphQL client - in order to fill its missing members dynamically
  • an initial data blob - typically from unserialized JSON, contains at least the data for required members (typically the object's primary key) and optionally other data members.
def turn_on(self):

Sets the Assay to active causing it to run and backfill any missing analysis.

def turn_off(self):

Disables the Assay. No further analysis will be conducted until the assay is enabled.

def set_alert_threshold(self, threshold: float):

Sets the alert threshold at the specified level. The status in the AssayAnalysis will show whether this level is exceeded; however, alerting and notifications are not currently implemented.

def set_warning_threshold(self, threshold: float):

Sets the warning threshold at the specified level. The status in the AssayAnalysis will show whether this level is exceeded; however, alerting and notifications are not currently implemented.

def meta_df(assay_result: Dict, index_name) -> pandas.core.frame.DataFrame:

Creates a dataframe for the meta data in the baseline or window excluding the edge information.

Parameters
  • assay_result: The dict of the raw assay result
def edge_df(window_or_baseline: Dict) -> pandas.core.frame.DataFrame:

Creates a dataframe specifically for the edge information in the baseline or window.

Parameters
  • window_or_baseline: The dict from the assay result of either the window or baseline
class AssayAnalysis:

The AssayAnalysis class helps handle the assay analysis logs from the Plateau logs. These logs are JSON documents with meta information on the assay and analysis, summary information on the baseline and window, and information on the comparison between them.

AssayAnalysis(raw: Dict[str, Any])
def compare_basic_stats(self) -> pandas.core.frame.DataFrame:

Creates a simple dataframe making it easy to compare a baseline and window.

def baseline_stats(self) -> pandas.core.frame.DataFrame:

Creates a simple dataframe with the basic stats data for a baseline.

def compare_bins(self) -> pandas.core.frame.DataFrame:

Creates a simple dataframe to compare the bin/edge information of baseline and window.

def baseline_bins(self) -> pandas.core.frame.DataFrame:

Creates a simple dataframe with the edge/bin data for a baseline.

def chart(self, show_scores=True):

Quickly creates a chart showing the bins, values, and scores of an assay analysis. show_scores also labels each bin with its final weighted (if specified) score.

class AssayAnalysisList:

Helper class primarily to easily create a dataframe from a list of AssayAnalysis objects.

AssayAnalysisList(raw: List[wallaroo.assay.AssayAnalysis])
def to_dataframe(self) -> pandas.core.frame.DataFrame:

Creates and returns a summary dataframe from the assay results.

def to_full_dataframe(self) -> pandas.core.frame.DataFrame:

Creates and returns a dataframe with all values including inputs and outputs from the assay results.

def chart_df( self, df: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], title: str, nth_x_tick=None):

Creates a basic chart of the scores from a dataframe created from an AssayAnalysisList.

def chart_scores(self, title: Optional[str] = None, nth_x_tick=4):

Creates a basic chart of the scores from an AssayAnalysisList

def chart_iopaths( self, labels: Optional[List[str]] = None, selected_labels: Optional[List[str]] = None, nth_x_tick=None):

Creates basic charts of the scores for each unique iopath of an AssayAnalysisList.

class Assays(typing.List[wallaroo.assay.Assay]):

Wraps a list of assays for display in an HTML display-aware environment like Jupyter.

Inherited Members
builtins.list: list, clear, copy, append, insert, extend, pop, remove, index, count, reverse, sort