ARM Classification Cybersecurity Demonstration
For details on adding ARM nodepools to a Wallaroo cluster, see Create ARM Nodepools for Kubernetes Clusters.
This tutorial is available on the Wallaroo Tutorials repository.
Classification Cybersecurity with Arm Architecture
This tutorial demonstrates how to use Wallaroo combined with ARM processors to perform inferences with pre-trained classification cybersecurity ML models. This demonstration assumes that:
- A Wallaroo instance, version 2023.3 or later, is installed.
- Nodepools with ARM architecture virtual machines are part of the Kubernetes cluster. For example, Azure supports Ampere® Altra® Arm-based processors included with the following virtual machines:
In this notebook we will walk through a simple pipeline deployment and run inferences through the model. This example uses an open source Aloha CNN LSTM model that classifies domain names as either legitimate or used for nefarious purposes such as malware distribution.
Tutorial Goals
For our example, we will perform the following:
- Create a workspace for our work.
- Upload the Aloha model and set the architecture to ARM.
- Create a pipeline and deploy the model on ARM then perform sample inferences.
All sample data and models are available through the Wallaroo Quick Start Guide Samples repository.
Connect to the Wallaroo Instance
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more details on logging in through Wallaroo, see the Wallaroo SDK Essentials Guide: Client Connection.
import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework
# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
import pyarrow as pa
import time
# Login through local Wallaroo instance
wl = wallaroo.Client()
Create Workspace
Now we’ll use the SDK below to create our workspace, assign it as our current workspace, then display it. We’ll also set up variables for our models and pipelines down the road, so we have one spot to change names to whatever fits your organization’s standards best.
When we create our new workspace, we’ll save it in the Python variable workspace so we can refer to it as needed.
For more information, see the Wallaroo SDK Essentials Guide: Workspace Management.
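Keeping every name in one place also makes it easy to add a per-user suffix so several people can run the tutorial on the same instance without colliding; the recorded output below shows a workspace named arm-classification-securityjohn, which looks like that kind of suffix. The build_names helper and its suffix argument are hypothetical, shown only as a minimal sketch of that idea.

```python
# Hypothetical helper: derive every resource name from one optional suffix,
# so names can be changed in a single place.
def build_names(suffix: str = "") -> dict:
    """Return workspace, pipeline, and model names with an optional suffix."""
    return {
        "workspace_name": f"arm-classification-security{suffix}",
        "pipeline_name": f"alohapipeline{suffix}",
        "model_name": f"alohamodel{suffix}",
    }


names = build_names("john")
print(names["workspace_name"])  # arm-classification-securityjohn
```

The returned values could then be assigned to workspace_name, pipeline_name, and model_name in place of the literal strings.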
workspace_name = f'arm-classification-security'
pipeline_name = 'alohapipeline'
model_name = 'alohamodel'
model_file_name = './models/alohacnnlstm.zip'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'arm-classification-securityjohn', 'id': 24, 'archived': False, 'created_by': '0e5060a5-218c-47c1-9678-e83337494184', 'created_at': '2023-09-08T21:32:35.381464+00:00', 'models': [], 'pipelines': []}
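Passing create_if_not_exist=True gives get_workspace get-or-create behavior: an existing workspace with that name is returned, otherwise a new one is created, so rerunning the notebook reuses the same workspace. The sketch below models that pattern with a plain dictionary; registry and get_or_create are hypothetical stand-ins rather than Wallaroo internals, and raising KeyError when the flag is False is this sketch's own choice.

```python
# Hypothetical in-memory registry illustrating get-or-create semantics.
registry: dict = {}


def get_or_create(name: str, create_if_not_exist: bool = False) -> dict:
    """Return the entry for `name`, creating it only when allowed."""
    if name in registry:
        return registry[name]
    if not create_if_not_exist:
        raise KeyError(f"{name!r} not found")
    registry[name] = {"name": name, "id": len(registry) + 1}
    return registry[name]


first = get_or_create("arm-classification-security", create_if_not_exist=True)
second = get_or_create("arm-classification-security", create_if_not_exist=True)
print(first is second)  # True: the second call reuses the existing entry
```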
We can verify that the workspace is now the current default workspace with the get_current_workspace() command.
wl.get_current_workspace()
{'name': 'arm-classification-securityjohn', 'id': 24, 'archived': False, 'created_by': '0e5060a5-218c-47c1-9678-e83337494184', 'created_at': '2023-09-08T21:32:35.381464+00:00', 'models': [], 'pipelines': []}
Upload the Model
Now we will upload our model. Note that for this example we are uploading the model from a .ZIP file. The Aloha model is a protobuf file that has been defined for evaluating web pages, and we will configure it to use data in the tensorflow format.
We will upload our model and set the arch to wallaroo.engine_config.Architecture.ARM. Architectures are set during the model upload process.
For more information, see the Wallaroo SDK Essentials Guide: Model Uploads and Registrations.
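The arch value has to match the processors in the nodepool the model will run on. As an illustration, the helper below maps common machine-architecture strings (aarch64 and x86_64 as reported by Linux, arm64 and amd64 as used by the Kubernetes kubernetes.io/arch node label) to an ARM or X86 name. ArchName is a hypothetical local enum that loosely mirrors wallaroo.engine_config.Architecture so the sketch runs without the SDK; arch_for is likewise hypothetical.

```python
from enum import Enum


class ArchName(Enum):
    """Hypothetical stand-in for wallaroo.engine_config.Architecture."""
    X86 = "x86"
    ARM = "arm"


# Machine strings commonly reported for each family (not exhaustive).
_ARM_NAMES = {"arm64", "aarch64"}
_X86_NAMES = {"amd64", "x86_64"}


def arch_for(machine: str) -> ArchName:
    """Map a machine or node-label string such as 'arm64' to an architecture."""
    value = machine.strip().lower()
    if value in _ARM_NAMES:
        return ArchName.ARM
    if value in _X86_NAMES:
        return ArchName.X86
    raise ValueError(f"unrecognized architecture: {machine!r}")


print(arch_for("arm64").name)  # ARM
```

Failing loudly on an unknown string is deliberate: silently defaulting to one architecture would hide a mismatch until deployment.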
from wallaroo.engine_config import Architecture
arm_model = wl.upload_model(model_name,
model_file_name,
framework=Framework.TENSORFLOW,
arch=Architecture.ARM).configure("tensorflow")
Deploy with ARM
Next, we build and deploy the pipeline. The deployment configuration sets the resources allocated to the model deployment. The architecture is inherited from the model, so the deployment is automatically allocated to the nodepool with ARM processors.
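Since scheduling depends on ARM nodes actually being present in the cluster, it can help to confirm which nodes carry the well-known Kubernetes label kubernetes.io/arch=arm64. The sketch below filters a simplified, hypothetical node listing made of plain dictionaries shaped loosely like the items from kubectl get nodes -o json; the node names are illustrative only, and fetching the real listing from the cluster is outside the scope of this sketch.

```python
ARCH_LABEL = "kubernetes.io/arch"  # well-known Kubernetes node label


def arm_nodes(nodes: list) -> list:
    """Return the names of nodes whose architecture label is arm64."""
    return [
        node["metadata"]["name"]
        for node in nodes
        if node["metadata"].get("labels", {}).get(ARCH_LABEL) == "arm64"
    ]


# Hypothetical node listing; names and labels are illustrative only.
sample_nodes = [
    {"metadata": {"name": "armpool-0", "labels": {ARCH_LABEL: "arm64"}}},
    {"metadata": {"name": "cpupool-0", "labels": {ARCH_LABEL: "amd64"}}},
    {"metadata": {"name": "unlabeled-0"}},
]
print(arm_nodes(sample_nodes))  # ['armpool-0']
```

An empty result would suggest the ARM nodepool is missing or not yet labeled, in which case an ARM deployment would have nowhere to run.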
aloha_pipeline = wl.build_pipeline(pipeline_name)
deployment_config = (wallaroo.deployment_config
.DeploymentConfigBuilder()
.cpus(4)
.memory('8Gi')
.build()
)
# clear the steps if used before
aloha_pipeline.clear()
aloha_pipeline.add_model_step(arm_model)
aloha_pipeline.deploy(deployment_config = deployment_config)
ok
Waiting for deployment - this will take up to 45s .................................. ok
name | alohapipeline |
---|---|
created | 2023-09-08 21:32:58.216028+00:00 |
last_updated | 2023-09-08 21:35:11.663764+00:00 |
deployed | True |
tags | |
versions | 9c61af33-2934-4552-bf45-42d03441a64b, 9c51cf24-9fcc-40c1-82ab-297972ce488d, 21218bd6-8ce8-4315-9683-b5a7542a0a94, fcde3598-c68d-4310-aea6-b3e98d4a4fb7 |
steps | alohamodel |
published | False |
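The memory('8Gi') value above uses Kubernetes resource-quantity notation, where the binary suffixes Ki, Mi, Gi, and Ti are powers of 1024 and the decimal suffixes k, M, G, and T are powers of 1000. The hypothetical helper below converts those suffixes to bytes, which can be handy when sizing a deployment against the ARM nodes' capacity; it is a minimal sketch and deliberately ignores other forms the Kubernetes format allows, such as exponents and milli-units.

```python
# Multipliers for a subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '8Gi' into a byte count."""
    # Check two-character suffixes first so 'Gi' is not mistaken for 'G'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = quantity[: -len(suffix)]
            return int(float(number) * _SUFFIXES[suffix])
    return int(quantity)  # no suffix: a plain number of bytes


print(quantity_to_bytes("8Gi"))  # 8589934592
```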
Infer with ARM
We will now perform an inference request through the model deployment on ARM architecture. This example submits its input data in the Apache Arrow format.
startTime = time.time()
result = aloha_pipeline.infer_from_file('./data/data_25k.arrow')
endTime = time.time()
arm_time = endTime-startTime
display(result.to_pandas().loc[:, ["time","out.main"]])
index | time | out.main |
---|---|---|
0 | 2023-09-08 21:36:22.500 | [0.997564] |
1 | 2023-09-08 21:36:22.500 | [0.99999994] |
2 | 2023-09-08 21:36:22.500 | [1.0] |
3 | 2023-09-08 21:36:22.500 | [0.9999997] |
4 | 2023-09-08 21:36:22.500 | [0.9999989] |
... | ... | ... |
24949 | 2023-09-08 21:36:22.500 | [0.9996881] |
24950 | 2023-09-08 21:36:22.500 | [0.99981505] |
24951 | 2023-09-08 21:36:22.500 | [0.9999919] |
24952 | 2023-09-08 21:36:22.500 | [1.0] |
24953 | 2023-09-08 21:36:22.500 | [0.99999803] |
24954 rows × 2 columns
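The inference cell records arm_time but never reports it. The sketch below shows one way to summarize a run: it unwraps the single-element lists shown in the out.main column, counts scores at or above a threshold, and converts elapsed time into rows per second. The summarize name, the 0.5 threshold, and any reading of a high score as the flagged class are illustrative assumptions, not part of the Aloha model's documentation; the sample scores are the first rows of the output above and the elapsed time is an arbitrary example value.

```python
def summarize(scores, elapsed_seconds: float, threshold: float = 0.5) -> dict:
    """Summarize per-row scores stored as single-element lists."""
    # Each out.main cell holds one value, e.g. [0.997564]; unwrap it.
    flat = [float(value[0]) for value in scores]
    flagged = sum(1 for score in flat if score >= threshold)
    rate = len(flat) / elapsed_seconds if elapsed_seconds > 0 else float("inf")
    return {
        "rows": len(flat),
        "at_or_above_threshold": flagged,
        "rows_per_second": rate,
    }


# Illustrative inputs: scores copied from the output above, timing assumed.
sample_scores = [[0.997564], [0.99999994], [1.0], [0.9999997], [0.9999989]]
print(summarize(sample_scores, elapsed_seconds=0.5))
```

With the notebook's own variables this could be called as summarize(result.to_pandas()["out.main"].tolist(), arm_time). For measuring elapsed time, time.perf_counter() is generally a better fit than time.time() because it is monotonic and has higher resolution.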
Undeploy Pipeline
When finished with our tests, we will undeploy the pipeline to return its Kubernetes resources for other tasks.
aloha_pipeline.undeploy()
Waiting for undeployment - this will take up to 45s .................................... ok
name | alohapipeline |
---|---|
created | 2023-09-08 21:32:58.216028+00:00 |
last_updated | 2023-09-08 21:35:11.663764+00:00 |
deployed | False |
tags | |
versions | 9c61af33-2934-4552-bf45-42d03441a64b, 9c51cf24-9fcc-40c1-82ab-297972ce488d, 21218bd6-8ce8-4315-9683-b5a7542a0a94, fcde3598-c68d-4310-aea6-b3e98d4a4fb7 |
steps | alohamodel |
published | False |
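One way to make sure the pipeline is always undeployed, even if an inference call raises partway through, is to wrap the deploy and undeploy calls in a context manager. The sketch below uses contextlib.contextmanager with a hypothetical FakePipeline stand-in so it runs without a Wallaroo cluster; deployed is also a hypothetical name. It relies only on the deploy(deployment_config=...) and undeploy() calls used earlier in this tutorial.

```python
from contextlib import contextmanager


@contextmanager
def deployed(pipeline, deployment_config):
    """Deploy `pipeline` for the duration of a with-block, then undeploy."""
    pipeline.deploy(deployment_config=deployment_config)
    try:
        yield pipeline
    finally:
        # Runs even if the with-block raises, so cluster resources are released.
        pipeline.undeploy()


class FakePipeline:
    """Hypothetical stand-in that records calls instead of contacting Wallaroo."""

    def __init__(self):
        self.calls = []

    def deploy(self, deployment_config=None):
        self.calls.append("deploy")

    def undeploy(self):
        self.calls.append("undeploy")


fake = FakePipeline()
try:
    with deployed(fake, deployment_config=None):
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass
print(fake.calls)  # ['deploy', 'undeploy']
```

With the real objects this might read: with deployed(aloha_pipeline, deployment_config) as p: result = p.infer_from_file('./data/data_25k.arrow').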