Wallaroo SDK Essentials Guide

Reference Guide for the most essential Wallaroo SDK Commands

The following commands are the most essential when working with Wallaroo. They are listed in the order a typical workflow follows:

Connect to Wallaroo

The first step in using Wallaroo is creating a connection. To connect to your Wallaroo environment:

  1. Import the wallaroo library:

    import wallaroo
    
  2. Open a connection to the Wallaroo environment with the wallaroo.Client() command and save it to a variable.

    In this example, the Wallaroo connection is saved to the variable wl.

    wl = wallaroo.Client(auth_type="user_password")
    
  3. A verification URL will be displayed. Enter it into your browser and grant access to the SDK client.

    Wallaroo Confirm Connection
  4. Once this is complete, you will be able to continue with your Wallaroo commands.

    Wallaroo Connection Example

Workspace Management

Workspaces are used to segment groups of models into separate environments. This allows different users to either manage or have access to each workspace, controlling the models and pipelines assigned to the workspace.

Create a Workspace

Workspaces can be created either through the Wallaroo Dashboard or through the Wallaroo SDK.

To create a workspace, use the create_workspace("{WORKSPACE NAME}") command through an established Wallaroo connection and store the workspace settings into a new variable. Once the new workspace is created, the user who created the workspace is assigned as its owner. The following template is an example:

{New Workspace Variable} = {Wallaroo Connection}.create_workspace("{New Workspace Name}")

For example, if the connection is stored in the variable wl and the new workspace will be named imdb-workspace, then the command to store it in the new_workspace variable would be:

new_workspace = wl.create_workspace("imdb-workspace")

List Workspaces

The command list_workspaces() displays the workspaces that are part of the current Wallaroo connection. The following details are returned as an array:

Parameter Type Description
Name String The name of the workspace. Note that workspace names are not unique.
Created At DateTime The date and time the workspace was created.
Users Array[Users] A list of all users assigned to this workspace.
Models Integer The number of models uploaded to the workspace.
Pipelines Integer The number of pipelines in the workspace.

For example, for the Wallaroo connection wl the following workspaces are returned:

wl.list_workspaces()
Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0
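Because workspace names are not unique (note the repeated imdb-workspace entries above), it can help to disambiguate programmatically. The following is a sketch over plain dicts standing in for the returned rows, not a Wallaroo API call:

```python
from datetime import datetime

# Hypothetical helper: given entries shaped like the list_workspaces()
# rows above (modeled here as plain dicts), return the most recently
# created workspace with a given name.
def newest_workspace(entries, name):
    matches = [e for e in entries if e["name"] == name]
    if not matches:
        return None
    return max(matches, key=lambda e: e["created_at"])

workspaces = [
    {"name": "imdb-workspace", "created_at": datetime(2022, 3, 29, 20, 23, 8)},
    {"name": "imdb-workspace", "created_at": datetime(2022, 3, 30, 17, 43, 9)},
    {"name": "aloha-workspace", "created_at": datetime(2022, 3, 29, 20, 15, 38)},
]

latest = newest_workspace(workspaces, "imdb-workspace")
```

The same filtering pattern applies to the workspace objects returned by the SDK, adjusted to however your SDK version exposes the name and creation time.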

Get Current Workspace

The command get_current_workspace() displays the current workspace used for the Wallaroo connection. The following information is returned by default:

Parameter Type Description
name String The name of the current workspace.
id Integer The ID of the current workspace.
archived Bool Whether the workspace is archived or not.
created_by String The identifier code for the user that created the workspace.
created_at DateTime The timestamp for when the workspace was created.
models Array[Models] The models that are uploaded to this workspace.
pipelines Array[Pipelines] The pipelines created for the workspace.

For example, the following will display the current workspace for the wl connection that contains a single pipeline and multiple models:

wl.get_current_workspace()
{'name': 'imdb-workspace', 'id': 6, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T17:09:23.960406+00:00', 'models': [{'name': 'embedder-o', 'version': '6dbe5524-7bc3-4ff3-8ca8-d454b2cbd0e4', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 17, 34, 18, 321105, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '6eb7f824-3d77-417f-9169-6a301d20d842', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 17, 34, 18, 783485, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 17, 34, 19, 318819, tzinfo=tzutc()), 'definition': '[]'}]}

Set the Current Workspace

The current workspace for the Wallaroo connection is set through set_current_workspace, which returns the workspace details as a JSON object:

{Wallaroo Connection}.set_current_workspace({Workspace Object})

Set Current Workspace from a New Workspace

The following example creates the workspace imdb-workspace through the Wallaroo connection stored in the variable wl, then sets it as the current workspace:

new_workspace = wl.create_workspace("imdb-workspace")
wl.set_current_workspace(new_workspace)
{'name': 'imdb-workspace', 'id': 7, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T17:43:09.405038+00:00', 'models': [], 'pipelines': []}

Set the Current Workspace from an Existing Workspace

To set the current workspace from an established workspace, the easiest method is to use list_workspaces(), then assign one of the returned array members as the current workspace. For example, from the following list_workspaces() output the third workspace element, demandcurve-workspace, can be assigned as the current workspace:

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

wl.set_current_workspace(wl.list_workspaces()[2])

{'name': 'demandcurve-workspace', 'id': 3, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:21:32.732178+00:00', 'models': [{'name': 'demandcurve', 'version': '4f5193fc-9c18-4851-8489-42e61d095588', 'file_name': 'demand_curve_v1.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 32, 822812, tzinfo=tzutc())}, {'name': 'preprocess', 'version': '159b9e99-edb6-4c5e-8336-63bc6000623e', 'file_name': 'preprocess.py', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 32, 984117, tzinfo=tzutc())}, {'name': 'postprocess', 'version': '77ee154c-d64c-49dd-985a-96f4c2931b6e', 'file_name': 'postprocess.py', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 21, 33, 119037, tzinfo=tzutc())}], 'pipelines': [{'name': 'demand-curve-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 21, 33, 264321, tzinfo=tzutc()), 'definition': '[]'}]}
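A later example in this guide references a get_or_create_workspace helper; one possible sketch, built from the list_workspaces() and create_workspace() calls shown above, looks like this (the name() accessor on workspace objects is an assumption to verify against your SDK version):

```python
# Sketch of a get_or_create_workspace helper: return the first existing
# workspace with the given name, or create it if none exists.
def get_or_create_workspace(client, name):
    for workspace in client.list_workspaces():
        if workspace.name() == name:
            return workspace
    return client.create_workspace(name)
```

It could then be combined with set_current_workspace, for example wl.set_current_workspace(get_or_create_workspace(wl, "imdb-workspace")).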

Managing Workspace Users

Users are managed via their email address, and can be assigned to a workspace as either the owner or a user.

List All Users

list_users() returns an array of all users registered in the connected Wallaroo platform with the following values:

Parameter Type Description
id String The unique identifier for the user.
email String The unique email identifier for the user.
username String The unique username of the user.

For example, listing all users in the Wallaroo connection returns the following:

wl.list_users()
[User({"id": "7b8b4f7d-de27-420f-9cd0-892546cb0f82", "email": "test@test.com", "username": "admin"}),
 User({"id": "45e6b641-fe57-4fb2-83d2-2c2bd201efe8", "email": "steve@ex.co", "username": "steve"})]

Get User by Email

The get_user_by_email({email}) command finds the user whose email address matches the submitted {email} field. If no email address matches, None is returned.

For example, the user steve with the email address steve@ex.co returns the following:

wl.get_user_by_email("steve@ex.co")

User({"id": "45e6b641-fe57-4fb2-83d2-2c2bd201efe8", "email": "steve@ex.co", "username": "steve"})

Add a User to a Workspace

Users are added to the workspace via their email address through the wallaroo.workspace.Workspace.add_user({email address}) command. The email address must be assigned to a current user in the Wallaroo platform before they can be assigned to the workspace.

For example, the following workspace imdb-workspace has the user steve@ex.co. We will add the user john@ex.co to this workspace:

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

current_workspace = wl.list_workspaces()[3]

current_workspace.add_user("john@ex.co")

{'name': 'imdb-workspace', 'id': 4, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:23:08.742676+00:00', 'models': [{'name': 'embedder-o', 'version': '23a33c3d-68e6-4bdb-a8bc-32ea846908ee', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 8, 833716, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '2c298aa9-be9d-482d-8188-e3564bdbab43', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 9, 49881, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 23, 28, 518946, tzinfo=tzutc()), 'definition': '[]'}]}

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co', 'john@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

Remove a User from a Workspace

Removing a user from a workspace is performed through the wallaroo.workspace.Workspace.remove_user({email address}) command, where the {email address} matches a user in the workspace.

In the following example, the user john@ex.co is removed from the workspace imdb-workspace.

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co', 'john@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

current_workspace = wl.list_workspaces()[3]

current_workspace.remove_user("john@ex.co")

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

Add a Workspace Owner

To update the owner of a workspace, or promote an existing user of a workspace to owner, use the wallaroo.workspace.Workspace.add_owner({email address}) command. The email address must be assigned to a current user in the Wallaroo platform before they can be assigned as the owner of the workspace.

The following example shows assigning the user john@ex.co as an owner to the workspace imdb-workspace:

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

current_workspace = wl.list_workspaces()[3]

current_workspace.add_owner("john@ex.co")

{'name': 'imdb-workspace', 'id': 4, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-29T20:23:08.742676+00:00', 'models': [{'name': 'embedder-o', 'version': '23a33c3d-68e6-4bdb-a8bc-32ea846908ee', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 8, 833716, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '2c298aa9-be9d-482d-8188-e3564bdbab43', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 29, 20, 23, 9, 49881, tzinfo=tzutc())}], 'pipelines': [{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 29, 20, 23, 28, 518946, tzinfo=tzutc()), 'definition': '[]'}]}

wl.list_workspaces()

Name	Created At	Users	Models	Pipelines
aloha-workspace	2022-03-29 20:15:38	['steve@ex.co']	1	1
ccfraud-workspace	2022-03-29 20:20:55	['steve@ex.co']	1	1
demandcurve-workspace	2022-03-29 20:21:32	['steve@ex.co']	3	1
imdb-workspace	2022-03-29 20:23:08	['steve@ex.co', 'john@ex.co']	2	1
aloha-workspace	2022-03-29 20:33:54	['steve@ex.co']	1	1
imdb-workspace	2022-03-30 17:09:23	['steve@ex.co']	2	1
imdb-workspace	2022-03-30 17:43:09	['steve@ex.co']	0	0

Activate and Deactivate Users

This feature is only available for Wallaroo Community. Wallaroo Community allows a total of 5 users per Wallaroo Community instance. Deactivated users do not count toward this total, allowing organizations to add users and then activate or deactivate them as needed to stay under the licensed user limit.

Wallaroo Enterprise has no limits on the number of users who can be added or active in a Wallaroo instance.

To remove a user’s access to the Wallaroo instance, use the Wallaroo Client deactivate_user("{User Email Address}") method, replacing {User Email Address} with the email address of the user to deactivate.

To activate a user, use the Wallaroo Client activate_user("{User Email Address}") method, replacing {User Email Address} with the email address of the user to activate.

In this example, the user kilvin.mitchell@wallaroo.ai will be deactivated then reactivated.

wl.list_users()

[User({"id": "0528f34c-2725-489f-b97b-da0cde02cbd9", "email": "kilvin.mitchell@wallaroo.ai", "username": "kilvin.mitchell@wallaroo.ai"}),
 User({"id": "3927b9d3-c279-442c-a3ac-78ba1d2b14d8", "email": "john.hummel+signuptest@wallaroo.ai", "username": "john.hummel+signuptest@wallaroo.ai"})]

wl.deactivate_user("kilvin.mitchell@wallaroo.ai")

wl.activate_user("kilvin.mitchell@wallaroo.ai")

Model Management

Upload Models to a Workspace

Models are uploaded to the current workspace through the Wallaroo Client upload_model("{Model Name}", "{Model Path}").configure(options) method. In most cases, the options field can be left blank. For more details, see the full SDK guide.

Models can either be uploaded in the Open Neural Network eXchange (ONNX) format, or be auto-converted and uploaded using the Wallaroo convert_model(path, source_type, conversion_arguments) method. For more information, see the tutorial series ONNX Conversion Tutorials.

The following example shows how to upload two models to the imdb-workspace workspace:

wl.get_current_workspace()

{'name': 'imdb-workspace', 'id': 8, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T21:13:21.87287+00:00', 'models': [], 'pipelines': []}

embedder = wl.upload_model('embedder-o', './embedder.onnx').configure()
smodel = wl.upload_model('smodel-o', './sentiment_model.onnx').configure()

{'name': 'imdb-workspace', 'id': 9, 'archived': False, 'created_by': '45e6b641-fe57-4fb2-83d2-2c2bd201efe8', 'created_at': '2022-03-30T21:14:37.733171+00:00', 'models': [{'name': 'embedder-o', 'version': '28ecb706-473e-4f24-9eae-bfa71b897108', 'file_name': 'embedder.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 21, 14, 37, 815243, tzinfo=tzutc())}, {'name': 'smodel-o', 'version': '5d2782e1-fb88-430f-b6eb-c0a0eb46beb9', 'file_name': 'sentiment_model.onnx', 'last_update_time': datetime.datetime(2022, 3, 30, 21, 14, 38, 77973, tzinfo=tzutc())}], 'pipelines': []}

Auto-Convert Models

Machine Learning (ML) models can be converted and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

The three input parameters are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertKerasArguments: Used for converting keras type models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}. See ModelConversionTypes for more details.
      • dimensions: Corresponds to the keras xtrain in the format [{Number of Rows/None}, {Number of Columns 1}, {Number of Columns 2}...]. For a standard 1-dimensional array with 100 columns this would typically be [None, 100].
    • wallaroo.ModelConversion.ConvertSKLearnArguments: Used for sklearn models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}. See ModelConversionTypes for more details.
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}. See ModelConversionTypes for more details.
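As a sketch of how the dimensions argument relates to the training data shape described above (the shape values here are hypothetical, not from the SDK):

```python
# The `dimensions` argument mirrors the training array's shape with the
# row axis replaced by None, since the batch size is unknown at
# conversion time. Plain tuples are used to keep the sketch
# dependency-free; with numpy this would be xtrain.shape.
n_rows, n_cols = 1000, 100          # hypothetical xtrain shape
xtrain_shape = (n_rows, n_cols)

dimensions = [None] + list(xtrain_shape[1:])
print(dimensions)  # [None, 100]
```

For a standard 1-dimensional input of 100 columns this yields the [None, 100] form described above; higher-dimensional inputs keep their trailing axes.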

Once uploaded, they will be displayed in the Wallaroo Models Dashboard as {unique-file-id}-converted.onnx:

Converted Model

ModelConversionInputTypes

The following data types are supported with the ModelConversionInputType parameter:

Parameter Data Type
Float16 float16
Float32 float32
Float64 float64
Int16 int16
Int32 int32
Int64 int64
UInt8 uint8
UInt16 uint16
UInt32 uint32
UInt64 uint64
Boolean bool
Double double

sklearn Example

The following example converts and uploads a Linear Regression sklearn model lm.pickle and stores it in the variable converted_model:

wl = wallaroo.Client()


workspace_name = "testconversion"
_ = wl.set_current_workspace(get_or_create_workspace(workspace_name))

model_conversion_args = ConvertSKLearnArguments(
    name="lm-test",
    comment="test linear regression",
    number_of_columns=NF, # NF: the number of feature columns the model was trained on, defined elsewhere
    input_type=ModelConversionInputType.Double
)


model_conversion_type = ModelConversionSource.SKLEARN

# convert the model and store it in the variable `converted_model`:

converted_model = wl.convert_model('lm.pickle', model_conversion_type, model_conversion_args)

keras Example

The following example shows converting a keras model with 100 columns and uploading it to a Wallaroo instance:

model_columns = 100

model_conversion_args = ConvertKerasArguments(
    name=model_name,
    comment="simple keras model",
    input_type=ModelConversionInputType.Float32,
    dimensions=(None, model_columns)
)
model_conversion_type = ModelConversionSource.KERAS

model_wl = wl.convert_model('simple_sentiment_model.zip', model_conversion_type, model_conversion_args)
model_wl
{'name': 'simple-sentiment-model', 'version': 'c76870f8-e16b-4534-bb17-e18a3e3806d5', 'file_name': '14d9ab8d-47f4-4557-82a7-6b26cb67ab05-converted.onnx', 'last_update_time': datetime.datetime(2022, 7, 7, 16, 41, 22, 528430, tzinfo=tzutc())}

Pipeline Management

Pipelines are the method of submitting data and processing that data through the models. Each pipeline can have one or more steps that submit the data from the previous step to the next one. Information can be submitted to a pipeline as a file, or through the pipeline’s URL.

A pipeline’s metrics can be viewed through the Wallaroo Dashboard Pipeline Details and Metrics page.

Create a Pipeline

New pipelines are created in the current workspace.

To create a new pipeline, use the Wallaroo Client build_pipeline("{Pipeline Name}") command.

The following example creates a new pipeline imdb-pipeline through a Wallaroo Client connection wl:

imdb_pipeline = wl.build_pipeline("imdb-pipeline")

imdb_pipeline.status()
{'status': 'Pipeline imdb-pipeline is not deployed'}

List All Pipelines

The Wallaroo Client method list_pipelines() lists all pipelines in a Wallaroo Instance.

The following example lists all pipelines in the wl Wallaroo Client connection:

wl.list_pipelines()

[{'name': 'ccfraud-pipeline', 'create_time': datetime.datetime(2022, 4, 12, 17, 55, 41, 944976, tzinfo=tzutc()), 'definition': '[]'}]

Select an Existing Pipeline

Rather than creating a new pipeline each time, an existing pipeline can be selected by using the list_pipelines() command and assigning one of the array members to a variable.

The following example sets the pipeline ccfraud-pipeline to the variable current_pipeline:

wl.list_pipelines()

[{'name': 'ccfraud-pipeline', 'create_time': datetime.datetime(2022, 4, 12, 17, 55, 41, 944976, tzinfo=tzutc()), 'definition': '[]'}]

current_pipeline = wl.list_pipelines()[0]

current_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.244.5.4',
   'name': 'engine-7fcc7df596-hvlxb',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'ccfraud-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraud-model',
      'version': '4624e8a8-1414-4408-8b40-e03da4b5cb68',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.1.24',
   'name': 'engine-lb-85846c64f8-mtq9p',
   'status': 'Running',
   'reason': None}]}
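The status dictionary shown above can also be checked programmatically. The following is a sketch (assuming the dictionary shape shown in this example) that verifies the pipeline and every model inside each engine report Running:

```python
def all_running(status):
    # The top-level pipeline status plus every model status inside
    # each engine must report "Running".
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        for model in engine.get("model_statuses", {}).get("models", []):
            if model.get("status") != "Running":
                return False
    return True

# Trimmed-down example mirroring the output above.
example = {
    "status": "Running",
    "engines": [{"model_statuses": {"models": [
        {"name": "ccfraud-model", "status": "Running"}]}}],
}
print(all_running(example))  # True
```

This is useful in scripts that wait for a deployment to come up before submitting inference requests.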

Add a Step to a Pipeline

Once a pipeline has been created, or during its creation process, pipeline steps can be added. Each pipeline step refers to a model that performs an inference on the data submitted to it. Each time a step is added, it is appended to the pipeline’s models array.

A pipeline step is added through the pipeline add_model_step({Model}) command.

In the following example, two models uploaded to the workspace are added as pipeline steps:

imdb_pipeline.add_model_step(embedder)
imdb_pipeline.add_model_step(smodel)

imdb_pipeline.status()

{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}

Pre and Post Processing Steps

Pipeline steps can be more than models: they can also be pre-processing and post-processing steps. For example, the Demand Curve Tutorial adds both pre-processing and post-processing steps to the pipeline. The pre-processing step uses the following code:

import numpy
import pandas

import json

# add interaction terms for the model
def actual_preprocess(pdata):
    pd = pdata.copy()
    # convert boolean cust_known to 0/1
    pd.cust_known = numpy.where(pd.cust_known, 1, 0)
    # interact UnitPrice and cust_known
    pd['UnitPriceXcust_known'] = pd.UnitPrice * pd.cust_known
    return pd.loc[:, ['UnitPrice', 'cust_known', 'UnitPriceXcust_known']]


# If the data is a json string, call this wrapper instead
# Expected input:
# a dictionary with fields 'colnames', 'query'

# test that the code works here
def wallaroo_json(data):
    obj = json.loads(data)
    pdata = pandas.DataFrame(obj['query'],
                             columns=obj['colnames'])
    pprocessed = actual_preprocess(pdata)
    
    # return a dictionary, with the fields the model expects
    return {
        'tensor_fields': ['model_input'],
        'model_input': pprocessed.to_numpy().tolist()
    }
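The wrapper above expects a JSON string carrying 'colnames' and 'query' fields. A small dependency-free sketch of constructing such a payload (column names borrowed from the pre-processing code, values illustrative):

```python
import json

# Build the JSON string the wallaroo_json() wrapper expects:
# column names under 'colnames' and rows of data under 'query'.
payload = json.dumps({
    "colnames": ["UnitPrice", "cust_known"],
    "query": [[1.99, True], [2.50, False]],
})

obj = json.loads(payload)
```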

It is added as a Python module by uploading it as a model:

# load the preprocess module
module_pre = wl.upload_model("preprocess", "./preprocess.py").configure('python')

And then added to the pipeline as a step:

# now make a pipeline
demandcurve_pipeline = (wl.build_pipeline("demand-curve-pipeline")
                        .add_model_step(module_pre)
                        .add_model_step(demand_curve_model)
                        .add_model_step(module_post))

Remove a Pipeline Step

To remove a step from the pipeline, use the Pipeline remove_step(index) command, where the index is the array index for the pipeline’s steps.

In the following example the pipeline imdb_pipeline will have the step with the model smodel-o removed.

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.remove_step(1)
{'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}]"}

Manage Pipeline Deployment Configuration

Pipelines can be deployed to allocate more or fewer resources through the DeploymentConfig object that sets the pipeline’s autoscaling parameters.

Autoscaling allows the user to define how many engines a pipeline starts with, the minimum number of engines a pipeline can use, and the maximum number of engines a pipeline can scale up to. The pipeline scales up and down based on the average CPU utilization across the engines in a given pipeline as the user’s workload increases and decreases.

This is performed through the wallaroo.DeploymentConfigBuilder method that returns a wallaroo.deployment_config.DeploymentConfig object. The DeploymentConfig is then applied to a Wallaroo pipeline when it is deployed.

The following parameters are used for auto-scaling:

Parameter Default Value Purpose
replica_count 1 Sets the initial amount of engines for the pipeline.
replica_autoscale_min_max 1 Sets the minimum and maximum number of engines. The maximum parameter must be set by the user.
autoscale_cpu_utilization 50 An integer representing the average CPU utilization. The default value is 50, which represents an average of 50% CPU utilization for the engines in a pipeline.

The DeploymentConfig is then built with the build() method.

In the following example, a DeploymentConfig is created and saved to the variable ccfraudDeployConfig. It sets the minimum number of engines to 2, the maximum to 5, and the target CPU utilization to 60%. It is then applied to the deployment of the pipeline ccfraudPipeline through the deploy() method’s deployment_config parameter.

ccfraudDeployConfig = (wallaroo.DeploymentConfigBuilder()
    .replica_count(1)
    .replica_autoscale_min_max(minimum=2, maximum=5)
    .autoscale_cpu_utilization(60)
    .build())

ccfraudPipeline.deploy(deployment_config=ccfraudDeployConfig)

Deploy a Pipeline

When a pipeline step is added or removed, the pipeline must be deployed through the pipeline’s deploy() method. This allocates resources to the pipeline from the Kubernetes environment and makes it available to receive data for inferences. This process typically takes up to 45 seconds.

Once complete, the pipeline status() command will show 'status':'Running'.

Pipeline deployments can be modified to enable auto-scaling, allowing pipelines to allocate more or fewer resources based on need, by passing a DeploymentConfig through the deploy() method’s optional deployment_config parameter. If this optional parameter is not passed, the deployment uses default values. For more information, see Manage Pipeline Deployment Configuration.

In the following example, the pipeline imdb-pipeline that contains two steps will be deployed with default deployment configuration:

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.deploy()
Waiting for deployment - this will take up to 45s ...... ok

imdb_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.65',
   'name': 'engine-778b65459-f9mt5',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'imdb-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'embedder-o',
      'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d',
      'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4',
      'status': 'Running'},
     {'name': 'smodel-o',
      'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19',
      'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.66',
   'name': 'engine-lb-85846c64f8-ggg2t',
   'status': 'Running',
   'reason': None}]}

Troubleshooting Pipeline Deployment

If you deploy more pipelines than your environment can handle, or if you deploy more pipelines than your license allows, you may see an error like the following:


LimitError: You have reached a license limit in your Wallaroo instance. In order to add additional resources, you can remove some of your existing resources. If you have any questions contact us at community@wallaroo.ai: MAX_PIPELINES_LIMIT_EXCEEDED

Undeploy any unnecessary pipelines either through the SDK or through the Wallaroo Pipeline Dashboard, then redeploy the pipeline in question.
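One way to free capacity is to walk the existing pipelines and undeploy any that are no longer needed. The following is a minimal sketch, assuming a live connection `wl` where `wl.list_pipelines()` returns pipeline objects exposing `name()` and `undeploy()`; the helper itself is not part of the Wallaroo SDK:

```python
def undeploy_all_except(pipelines, keep):
    """Undeploy every pipeline whose name is not in `keep`.

    `pipelines` is any iterable of objects exposing name() and undeploy(),
    such as the result of wl.list_pipelines() in a live session.
    """
    undeployed = []
    for p in pipelines:
        if p.name() not in keep:
            p.undeploy()
            undeployed.append(p.name())
    return undeployed

# Hypothetical usage in a live session:
# undeploy_all_except(wl.list_pipelines(), keep={"imdb-pipeline"})
# imdb_pipeline.deploy()
```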

Undeploy a Pipeline

When a pipeline is not currently needed, it can be undeployed and its resources returned to the Kubernetes environment. To undeploy a pipeline, use the pipeline's undeploy() method.

In this example, the aloha_pipeline will be undeployed:

aloha_pipeline.undeploy()

{'name': 'aloha-test-demo', 'create_time': datetime.datetime(2022, 3, 29, 20, 34, 3, 960957, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'aloha-2', 'version': 'a8e8abdc-c22f-416c-a13c-5fe162357430', 'sha': 'fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520'}]}}]"}

Get Pipeline Status

The pipeline's status() method shows the current status, models, and other information for a pipeline.

The following example shows the status of the pipeline imdb_pipeline before and after it is deployed. Note that the first call omits the parentheses, so Python displays the bound status method itself rather than invoking it:

imdb_pipeline.status

<bound method Pipeline.status of {'name': 'imdb-pipeline', 'create_time': datetime.datetime(2022, 3, 30, 21, 21, 31, 127756, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'embedder-o', 'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d', 'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4'}]}}, {'ModelInference': {'models': [{'name': 'smodel-o', 'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19', 'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650'}]}}]"}>

imdb_pipeline.deploy()
Waiting for deployment - this will take up to 45s ...... ok

imdb_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.65',
   'name': 'engine-778b65459-f9mt5',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'imdb-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'embedder-o',
      'version': '1c16d21d-fe4c-4081-98bc-65fefa465f7d',
      'sha': 'd083fd87fa84451904f71ab8b9adfa88580beb92ca77c046800f79780a20b7e4',
      'status': 'Running'},
     {'name': 'smodel-o',
      'version': '8d311ba3-c336-48d3-99cd-85d95baa6f19',
      'sha': '3473ea8700fbf1a1a8bfb112554a0dde8aab36758030dcde94a9357a83fd5650',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.66',
   'name': 'engine-lb-85846c64f8-ggg2t',
   'status': 'Running',
   'reason': None}]}
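The status dictionary shown above can also be checked programmatically before submitting inferences. The helper below is a sketch written against the dictionary shape in this example, not a Wallaroo SDK method:

```python
def pipeline_ready(status):
    """Return True when the pipeline, every engine, every model, and every
    load balancer in a status dictionary report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        for model in engine.get("model_statuses", {}).get("models", []):
            if model.get("status") != "Running":
                return False
    return all(lb.get("status") == "Running" for lb in status.get("engine_lbs", []))

# Hypothetical usage in a live session:
# if pipeline_ready(imdb_pipeline.status()):
#     ...  # safe to submit inferences
```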

Get Pipeline Logs

Pipelines have their own set of logs that can be retrieved and analyzed as needed with the pipeline.logs(limit=100) method. This method takes the following parameter:

| Parameter | Type | Description |
|---|---|---|
| limit | Int | Limits how many log entries to display. Defaults to 100. |

Typically a Python notebook displays only the last few log entries for spacing purposes.

In this example, the last 50 log entries from the pipeline ccfraud_pipeline are requested. Only one is shown for brevity.

ccfraud_pipeline.logs(limit=50)
       
| Timestamp | Output | Input | Anomalies |
|---|---|---|---|
| 2022-23-Aug 16:44:56 | [array([[0.00149742]])] | [[1.0678324729342086, 0.21778102664937624, -1.7115145261843976, 0.6822857209662413, 1.0138553066742804, -0.43350000129006655, 0.7395859436561657, -0.28828395953577357, -0.44726268795990787, 0.5146124987725894, 0.3791316964287545, 0.5190619748123175, -0.4904593221655364, 1.1656456468728569, -0.9776307444180006, -0.6322198962519854, -0.6891477694494687, 0.17833178574255615, 0.1397992467197424, -0.35542206494183326, 0.4394217876939808, 1.4588397511627804, -0.3886829614721505, 0.4353492889350186, 1.7420053483337177, -0.4434654615252943, -0.15157478906219238, -0.26684517248765616, -1.454961775612449]] | 0 |
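Once retrieved, log entries can be filtered in plain Python. The sketch below assumes each entry has been reduced to a dictionary with an anomalies count, mirroring the Anomalies column above; the field name is illustrative, not a Wallaroo API:

```python
def entries_with_anomalies(entries):
    """Return only the log entries whose anomaly count is nonzero.

    `entries` is a list of dicts with an 'anomalies' key (assumed shape,
    mirroring the Anomalies column in the log table above).
    """
    return [e for e in entries if e.get("anomalies", 0) > 0]
```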

Pipeline Shadow Deployments

Wallaroo provides a method of testing the same data against two different models or sets of models at the same time through shadow deployments, otherwise known as parallel deployments or A/B tests. This allows data to be submitted to a pipeline with inferences running on two different sets of models. Typically this is performed with a model known to provide accurate results (the champion) and one or more models being tested to see whether they provide more accurate or faster responses (the challengers). Multiple challengers can be tested against a single champion to determine which is “better” based on the organization’s criteria.

As described in the Wallaroo blog post The What, Why, and How of Model A/B Testing:

In data science, A/B tests can also be used to choose between two models in production, by measuring which model performs better in the real world. In this formulation, the control is often an existing model that is currently in production, sometimes called the champion. The treatment is a new model being considered to replace the old one. This new model is sometimes called the challenger….

Keep in mind that in machine learning, the terms experiments and trials also often refer to the process of finding a training configuration that works best for the problem at hand (this is sometimes called hyperparameter optimization).

When a shadow deployment is created, only the inference from the champion is returned in the InferenceResult Object data, while the result data for the shadow deployments is stored in the InferenceResult Object shadow_data.

Create Shadow Deployment

Create a parallel or shadow deployment for a pipeline with the pipeline.add_shadow_deploy(champion, challengers[]) method, where champion is a Wallaroo Model object and challengers[] is a list of one or more Wallaroo Model objects. The inferences against each of the challengers are run iteratively.

In this example, a shadow deployment is created with the champion versus two challenger models.

champion = wl.upload_model(champion_model_name, champion_model_file).configure()
model2 = wl.upload_model(shadow_model_01_name, shadow_model_01_file).configure()
model3 = wl.upload_model(shadow_model_02_name, shadow_model_02_file).configure()
   
pipeline.add_shadow_deploy(champion, [model2, model3])
pipeline.deploy()
   
name cc-shadow
created 2022-08-04 20:06:55.102203+00:00
last_updated 2022-08-04 20:37:28.785947+00:00
deployed True
tags
steps ccfraud-lstm

Running an inference on a pipeline that has shadow deployments enabled returns the champion model’s result in the InferenceResult Object’s data element, while the challengers’ results are returned in the InferenceResult Object’s shadow_data element:

pipeline.infer_from_file(sample_data_file)

[InferenceResult({'check_failures': [],
  'elapsed': 125102,
  'model_name': 'ccfraud-lstm',
  'model_version': '6b650c9c-e22f-4c50-97b2-7fce07f18607',
  'original_data': {'tensor': [[1.0678324729342086,
                                0.21778102664937624,
                                -1.7115145261843976,
                                0.6822857209662413,
                                1.0138553066742804,
                                -0.43350000129006655,
                                0.7395859436561657,
                                -0.28828395953577357,
                                -0.44726268795990787,
                                0.5146124987725894,
                                0.3791316964287545,
                                0.5190619748123175,
                                -0.4904593221655364,
                                1.1656456468728569,
                                -0.9776307444180006,
                                -0.6322198962519854,
                                -0.6891477694494687,
                                0.17833178574255615,
                                0.1397992467197424,
                                -0.35542206494183326,
                                0.4394217876939808,
                                1.4588397511627804,
                                -0.3886829614721505,
                                0.4353492889350186,
                                1.7420053483337177,
                                -0.4434654615252943,
                                -0.15157478906219238,
                                -0.26684517248765616,
                                -1.454961775612449]]},
  'outputs': [{'Float': {'data': [0.001497417688369751],
                         'dim': [1, 1],
                         'v': 1}}],
  'pipeline_name': 'cc-shadow',
  'shadow_data': {'ccfraud-rf': [{'Float': {'data': [1.0],
                                            'dim': [1, 1],
                                            'v': 1}}],
                  'ccfraud-xgb': [{'Float': {'data': [0.0005066990852355957],
                                             'dim': [1, 1],
                                             'v': 1}}]},
  'time': 1659645473965})]
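Champion and challenger scores from a result like the one above can be collected side by side for comparison. The helper below works on a plain dictionary with the same shape as the printed InferenceResult; it is a sketch, not a Wallaroo SDK method:

```python
def collect_scores(result):
    """Map each model name to its first output value, champion and
    challengers alike, from an InferenceResult-shaped dict."""
    scores = {result["model_name"]: result["outputs"][0]["Float"]["data"][0]}
    for name, outputs in result.get("shadow_data", {}).items():
        scores[name] = outputs[0]["Float"]["data"][0]
    return scores
```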

Retrieve Shadow Deployment Logs

Inferences run against a pipeline that has shadow deployments (also known as a parallel deployment) will not be visible in the pipeline logs. To view the results against the challenger models of a shadow deployment, use the Pipeline.logs_shadow_deploy() method. The results are grouped by input, allowing the performance of multiple models to be evaluated against the same data.

In this example, the Shadow Deployment logs are retrieved after an inference.

logs = pipeline.logs_shadow_deploy()
logs
   
Input [[1.0678324729342086, 0.21778102664937624, -1.7115145261843976, 0.6822857209662413, 1.0138553066742804, -0.43350000129006655, 0.7395859436561657, -0.28828395953577357, -0.44726268795990787, 0.5146124987725894, 0.3791316964287545, 0.5190619748123175, -0.4904593221655364, 1.1656456468728569, -0.9776307444180006, -0.6322198962519854, -0.6891477694494687, 0.17833178574255615, 0.1397992467197424, -0.35542206494183326, 0.4394217876939808, 1.4588397511627804, -0.3886829614721505, 0.4353492889350186, 1.7420053483337177, -0.4434654615252943, -0.15157478906219238, -0.26684517248765616, -1.454961775612449]]
           
| Model Type | Model Name | Output | Timestamp | Model Version | Elapsed |
|---|---|---|---|---|---|
| Primary | ccfraud-lstm | [array([[0.00149742]])] | 2022-08-04T20:37:53.965000 | 6b650c9c-e22f-4c50-97b2-7fce07f18607 | 125102 |
| Challenger | ccfraud-rf | [{'Float': {'v': 1, 'dim': [1, 1], 'data': [1.0]}}] | | | |
| Challenger | ccfraud-xgb | [{'Float': {'v': 1, 'dim': [1, 1], 'data': [0.0005066990852355957]}}] | | | |

Get Pipeline URL Endpoint

The Pipeline URL Endpoint, or Pipeline Deploy URL, is used to submit data to a pipeline for an inference. It is retrieved through the pipeline's _deployment._url() method.

In this example, the pipeline URL endpoint for the pipeline ccfraud_pipeline will be displayed:

ccfraud_pipeline._deployment._url()

'http://engine-lb.ccfraud-pipeline-1:29502/pipelines/ccfraud-pipeline'

Run Inference Through a Pipeline

Once a pipeline has been deployed, an inference can be run. This submits data to the pipeline, where it passes through each of the pipeline’s steps, with the output of each step providing the input for the next. The final step then outputs the result of all of the pipeline’s steps.

Inferences run through the SDK, such as with infer_from_file, return a List[wallaroo.inference_result.InferenceResult] object. More details on InferenceResult are listed below.

Run Inference through Pipeline Deployment URL

The pipeline _deployment._url() method provides a URL where data can be submitted through an HTTP POST in JSON format to perform an inference. This is useful for submitting data to the pipeline remotely from a variety of sources.

In this example, the aloha_pipeline’s deployment URL will be determined. An inference will then be made on data submitted to the aloha_pipeline through its deployment URL via a curl HTTP POST command:

aloha_pipeline._deployment._url()

'http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo'
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type:application/json" --data @data-25k.json > curl_response.txt

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M  100 10.1M  100 2886k   539k   149k  0:00:19  0:00:19 --:--:-- 2570k
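The same POST can be issued from Python with the standard library. The sketch below only builds the request object; calling urllib.request.urlopen(req) would send it in a live environment. The URL is the deployment URL from the example above, and the payload shape is an illustrative fragment, not a complete Aloha input:

```python
import json
import urllib.request

def build_inference_request(url, payload):
    """Build (without sending) a JSON HTTP POST request for a pipeline
    deployment URL."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_inference_request(
    "http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo",
    {"text_input": [[0, 28, 16, 32]]},
)
# urllib.request.urlopen(req) would perform the actual inference call.
```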

Run Inference From A File

To submit a data file directly to a pipeline, use the pipeline infer_from_file({Data File}) command, where {Data File} is the path name to the submitted file.

In this example, an inference will be submitted to the aloha_pipeline with the file data-1.json:

aloha_pipeline.infer_from_file("data-1.json")

Waiting for inference response - this will take up to 45s .... ok
[InferenceResult({'check_failures': [],
  'elapsed': 329803334,
  'model_name': 'aloha-2',
  'model_version': '3dc9b7f9-faff-40cc-b1b6-7724edf11b12',
  'original_data': {'text_input': [[0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    28,
                                    16,
                                    32,
                                    23,
                                    29,
                                    32,
                                    30,
                                    19,
                                    26,
                                    17]]},
  'outputs': [{'Float': {'data': [0.001519620418548584], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9829147458076477], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.012099534273147583], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [4.7593468480044976e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [2.0289742678869516e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.0003197789192199707],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.011029303073883057], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9975639581680298], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.010341644287109375], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.008038878440856934], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.016155093908309937], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.006236225366592407], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.0009985864162445068],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.7933435344117743e-26],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.388984431455466e-27],
                         'dim': [1, 1],
                         'v': 1}}],
  'pipeline_name': 'aloha-test-demo',
  'time': 1648744282452})]

InferenceResult Object

The InferenceResult object produces the results of an inference and includes the following methods:

  • data() : The data resulting from the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the data displayed:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    output[0].data()
    
    [array([[9.93003249e-01],
          [9.93003249e-01],
          [9.93003249e-01],
          ...,
          [1.10703707e-03],
          [8.53300095e-04],
          [1.24984980e-03]])]
    
  • input_data(): Returns the data provided to the pipeline to run the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the first element in the array returned:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    
    output[0].input_data()["tensor"][0]
    
    [-1.060329750089797,
    2.354496709462385,
    -3.563878832646437,
    5.138734892618555,
    -1.23084570186641,
    -0.7687824607744093,
    -3.588122810891446,
    1.888083766259287,
    -3.2789674273886593,
    -3.956325455353324,
    4.099343911805088,
    -5.653917639476211,
    -0.8775733373342495,
    -9.131571191990632,
    -0.6093537872620682,
    -3.748027677256424,
    -5.030912501659983,
    -0.8748149525506821,
    1.9870535692026476,
    0.7005485718467245,
    0.9204422758154284,
    -0.10414918089758483,
    0.3229564351284999,
    -0.7418141656910608,
    0.03841201586730117,
    1.099343914614657,
    1.2603409755785089,
    -0.14662447391576958,
    -1.446321243938815]
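The nested batches returned by data() can be post-processed in plain Python once converted to lists (for example with the array's tolist() method). The sketch below flags scores above a cutoff; the 0.5 threshold is an illustrative assumption, not a Wallaroo default:

```python
def high_scores(batches, threshold=0.5):
    """Flatten data()-style output ([[ [score], ... ], ...]) and keep
    scores above the threshold (illustrative cutoff)."""
    return [row[0] for batch in batches for row in batch if row[0] > threshold]

# Example against the shape shown above:
# high_scores([[[0.993003249], [0.00110703707], [0.000853300095]]])  # [0.993003249]
```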
    

Model Insights and Assays

Model Insights has the capability to perform interactive assays so that you can explore the data from a pipeline and learn how the data is behaving. With this information and the knowledge of your particular business use case you can then choose appropriate thresholds for persistent automatic assays as desired.

Monitoring tasks called assays can be set up to compare the data coming in against an established baseline. This way sudden changes in a model’s output can be determined to be either a correct outcome based on the data, or if the data has changed significantly enough that the model should be retrained to account for the changing environment.

Build Assay

Assay’s are built with the Wallaroo client.build_assay(assayName, pipeline, modelName, baselineStart, baselineEnd), and returns the wallaroo.assay_config.AssayBuilder. The method requires the following parameters:

Parameter Type Description
assayName String The human friendly name of the created assay.
pipeline Wallaroo.pipeline The pipeline the assay is assigned to.
modelName String The model to perform the assay on.
baselineStart DateTime When to start the baseline period.
baselineStart DateTime When to end the baseline period.

When called, this method polls the pipeline logs between the baseline start and end periods to establish which values are considered normal inputs for the specified model.

By default, assays run a new analysis every 24 hours starting at the end of the baseline period.

In this example, an assay named example assay will be created and stored in the variable assay_builder:

import datetime
baseline_start = datetime.datetime.fromisoformat('2022-01-01T00:00:00+00:00')
baseline_end = datetime.datetime.fromisoformat('2022-01-02T00:00:00+00:00')
last_day = datetime.datetime.fromisoformat('2022-02-01T00:00:00+00:00')

assay_name = "example assay"
assay_builder = client.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)

Perform Interactive Baseline

Interactive baselines can be run against an assay to generate a list of the values established in the baseline. This is done through the AssayBuilder.interactive_baseline_run() method, whose baseline statistics include the following fields:

| Field | Type | Description |
|---|---|---|
| count | Integer | The number of records evaluated. |
| min | Float | The minimum value found. |
| max | Float | The maximum value found. |
| mean | Float | The mean value derived from the values evaluated. |
| median | Float | The median value derived from the values evaluated. |
| std | Float | The standard deviation from the values evaluated. |
| start | DateTime | The start date for the records to evaluate. |
| end | DateTime | The end date for the records to evaluate. |

In this example, an interactive baseline will be run against a new assay, and the results displayed:

baseline_run = assay_builder.build().interactive_baseline_run()
baseline_run.baseline_stats()

                    Baseline
count                   1813
min                    11.95
max                    15.08
mean                   12.95
median                 12.91
std                     0.46
start   2022-01-01T00:00:00Z
end     2022-01-02T00:00:00Z
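The summary fields reported above can be reproduced for any list of values with the Python standard library, which is handy for sanity-checking a baseline offline. This is a sketch, not a Wallaroo API:

```python
import statistics

def summarize_baseline(values):
    """Compute the count/min/max/mean/median/std fields shown in the
    baseline_stats() output, using only the standard library."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
    }
```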

Display Assay Graphs

Histogram, kernel density estimate (KDE), and Empirical Cumulative Distribution (ecdf) charts can be generated from an assay to provide a visual representation of the values evaluated and where they fit within the established baseline.

These methods are part of the AssayBuilder object and are as follows:

Method Description
baseline_histogram() Creates a histogram chart from the assay baseline.
baseline_kde() Creates a kernel density estimate (KDE) chart from the assay baseline.
baseline_ecdf() Creates an Empirical Cumulative Distribution (ecdf) from the assay baseline.

In this example, each of the three different charts will be generated from an assay:

assay_builder.baseline_histogram()

assay_builder.baseline_kde()

assay_builder.baseline_ecdf()
