ARM Classification Cybersecurity Demonstration

A demonstration of a classification cybersecurity model deployed under standard and ARM pipeline deployment configurations.

For details on adding ARM nodepools to a Wallaroo cluster, see Create ARM Nodepools for Kubernetes Clusters.

This tutorial is available on the Wallaroo Tutorials repository.

Classification Cybersecurity with Arm Architecture

This tutorial demonstrates how to use the Wallaroo combined with ARM processors to perform inferences with pre-trained classification cybersecurity ML models. This demonstration assumes that:

  • Wallaroo Version 2023.3 or above instance is installed.
  • A nodepools with ARM architecture virtual machines are part of the Kubernetes cluster. For example, Azure supports Ampere® Altra® Arm-based processor included with the following virtual machines:

In this notebook we will walk through a simple pipeline deployment to inference on a model. For this example we will be using an open source model that uses an Aloha CNN LSTM model for classifying Domain names as being either legitimate or being used for nefarious purposes such as malware distribution.

Tutorial Goals

For our example, we will perform the following:

  • Create a workspace for our work.
  • Upload the Aloha model and set the architecture to ARM.
  • Create a pipeline and deploy the model on ARM then perform sample inferences.

All sample data and models are available through the Wallaroo Quick Start Guide Samples repository.

Connect to the Wallaroo Instance

The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.

import wallaroo
from wallaroo.object import EntityNotFoundError

from wallaroo.framework import Framework

# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
import pyarrow as pa
import time
# Login through local Wallaroo instance

wl = wallaroo.Client()

Create Workspace

Now we’ll use the SDK below to create our workspace , assign as our current workspace, then display all of the workspaces we have at the moment. We’ll also set up for our models and pipelines down the road, so we have one spot to change names to whatever fits your organization’s standards best.

When we create our new workspace, we’ll save it in the Python variable workspace so we can refer to it as needed.

For more information, see the Wallaroo SDK Essentials Guide: Workspace Management.

workspace_name = f'arm-classification-security'
pipeline_name = 'alohapipeline'
model_name = 'alohamodel'
model_file_name = './models/alohacnnlstm.zip'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)
{'name': 'arm-classification-securityjohn', 'id': 24, 'archived': False, 'created_by': '0e5060a5-218c-47c1-9678-e83337494184', 'created_at': '2023-09-08T21:32:35.381464+00:00', 'models': [], 'pipelines': []}

We can verify the workspace is created the current default workspace with the get_current_workspace() command.

wl.get_current_workspace()
{'name': 'arm-classification-securityjohn', 'id': 24, 'archived': False, 'created_by': '0e5060a5-218c-47c1-9678-e83337494184', 'created_at': '2023-09-08T21:32:35.381464+00:00', 'models': [], 'pipelines': []}

Upload the Models

Now we will upload our model. Note that for this example we are applying the model from a .ZIP file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the tensorflow format.

We will upload our model and set the arch to wallaroo.engine_config.Architecture.ARM. Architectures are set during the model upload process.

For more information, see the Wallaroo SDK Essentials Guide: Model Uploads and Registrations.

from wallaroo.engine_config import Architecture

arm_model = wl.upload_model(model_name, 
                            model_file_name, 
                            framework=Framework.TENSORFLOW,
                            arch=Architecture.ARM).configure("tensorflow")

Deploy with ARM

We deploy the pipeline. Our deployment configuration sets what resources are allocated for the model deployment. The architecture is inherited from the model, so will automatically allocate from the nodepool with ARM processors.

aloha_pipeline = wl.build_pipeline(pipeline_name)

deployment_config = (wallaroo.deployment_config
                     .DeploymentConfigBuilder()
                     .cpus(4)
                     .memory('8Gi')
                     .build()
                    )

# clear the steps if used before
aloha_pipeline.clear()
aloha_pipeline.add_model_step(arm_model)
aloha_pipeline.deploy(deployment_config = deployment_config)
 ok
Waiting for deployment - this will take up to 45s .................................. ok
namealohapipeline
created2023-09-08 21:32:58.216028+00:00
last_updated2023-09-08 21:35:11.663764+00:00
deployedTrue
tags
versions9c61af33-2934-4552-bf45-42d03441a64b, 9c51cf24-9fcc-40c1-82ab-297972ce488d, 21218bd6-8ce8-4315-9683-b5a7542a0a94, fcde3598-c68d-4310-aea6-b3e98d4a4fb7
stepsalohamodel
publishedFalse

Infer with ARM

We will now perform an inference request through the model deployment on ARM architecture. This example will us an Apache Arrow table.

startTime = time.time()
result = aloha_pipeline.infer_from_file('./data/data_25k.arrow')
endTime = time.time()
arm_time = endTime-startTime
display(result.to_pandas().loc[:, ["time","out.main"]])
timeout.main
02023-09-08 21:36:22.500[0.997564]
12023-09-08 21:36:22.500[0.99999994]
22023-09-08 21:36:22.500[1.0]
32023-09-08 21:36:22.500[0.9999997]
42023-09-08 21:36:22.500[0.9999989]
.........
249492023-09-08 21:36:22.500[0.9996881]
249502023-09-08 21:36:22.500[0.99981505]
249512023-09-08 21:36:22.500[0.9999919]
249522023-09-08 21:36:22.500[1.0]
249532023-09-08 21:36:22.500[0.99999803]

24954 rows × 2 columns

Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline so we have the Kubernetes resources back for other tasks.

aloha_pipeline.undeploy()
Waiting for undeployment - this will take up to 45s .................................... ok
namealohapipeline
created2023-09-08 21:32:58.216028+00:00
last_updated2023-09-08 21:35:11.663764+00:00
deployedFalse
tags
versions9c61af33-2934-4552-bf45-42d03441a64b, 9c51cf24-9fcc-40c1-82ab-297972ce488d, 21218bd6-8ce8-4315-9683-b5a7542a0a94, fcde3598-c68d-4310-aea6-b3e98d4a4fb7
stepsalohamodel
publishedFalse