Statsmodel Forecast with Wallaroo Features: Deploy and Test Inference

Deploy the sample Statsmodel and perform sample inferences.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

This tutorial series demonstrates how to use Wallaroo to create a Statsmodel forecasting model based on bike rentals. This tutorial series is broken down into the following:

  • Create and Train the Model: This first notebook shows how the model is trained from existing data.
  • Deploy and Sample Inference: With the model developed, we will deploy it into Wallaroo and perform a sample inference.
  • Parallel Infer: A sample of multiple weeks of data will be retrieved and submitted as an asynchronous parallel inference. The results will be collected and uploaded to a sample database.
  • External Connection: A sample data connection to Google BigQuery to retrieve input data and store the results in a table.
  • ML Workload Orchestration: Take all of the previous steps and automate the request into a single Wallaroo ML Workload Orchestration.

In the previous step “Statsmodel Forecast with Wallaroo Features: Model Creation”, the statsmodel was trained and saved to the Python file forecast.py. This file will now be uploaded to a Wallaroo instance as a Python model, then used for sample inferences.

Prerequisites

  • A Wallaroo instance version 2023.2.1 or greater.

Tutorial Steps

Import Libraries

The first step is to import the libraries that we will need.

import json
import os
import datetime
import pyarrow as pa

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

Initialize connection

Connect to the Wallaroo instance and save the connection in the variable wl.

# Login through local Wallaroo instance

wl = wallaroo.Client()

Set Configurations

The following will set the workspace, model name, and pipeline that will be used for this example. If the workspace or pipeline already exist, they will be assigned for use in this example. If they do not exist, they will be created based on the names listed below.

Workspace names must be unique. To allow this tutorial to run in the same Wallaroo instance for multiple users, the suffix variable is generated from a random set of 4 ASCII characters. To use the same workspace each time, hard code suffix and verify the workspace name created is unique across the Wallaroo instance.

workspace_name = 'multiple-replica-forecast-tutorial'
pipeline_name = 'bikedaypipe'
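The random suffix described above is not shown in this cell. A minimal sketch of generating one (the use of Python's secrets module here is an assumption, not part of the tutorial code):

```python
import secrets
import string

# generate a 4-character lowercase ASCII suffix to keep workspace names unique
suffix = ''.join(secrets.choice(string.ascii_lowercase) for _ in range(4))
workspace_name = f'multiple-replica-forecast-tutorial-{suffix}'
```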

Set the Workspace and Pipeline

The workspace will be either used or created if it does not exist, along with the pipeline.

workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)

pipeline = wl.build_pipeline(pipeline_name)

Upload Model

The Python model created in “Forecast and Parallel Infer with Statsmodel: Model Creation” will now be uploaded. Note that the Framework and runtime are set to python.

model_name = 'bikedaymodel'
model_file_name = './models/forecast_standard_new.zip'

input_schema = pa.schema([
    pa.field('count', pa.list_(pa.int32())) # time series to fit model
    ]
)

output_schema = pa.schema([
    pa.field('forecast', pa.list_(pa.int32())), # returns a forecast for a week (7 steps)
    pa.field('weekly_average', pa.float32()),
])

bike_day_model = wl.upload_model(model_name, 
                                 model_file_name, 
                                 Framework.PYTHON,
                                 input_schema=input_schema,
                                 output_schema=output_schema
)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime.............successful

Ready

bike_day_model
Name            bikedaymodel
Version         51c8ec95-33a0-4df3-b8a3-61ca4502c6f2
File Name       forecast_standard_new.zip
SHA             96b4b27039f697f8a36ad15481e2d318cf603995553200b553c53f87a254fb2c
Status          ready
Image Path      proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4921
Architecture    x86
Acceleration    none
Updated At      2024-15-Apr 14:44:18

Deploy the Pipeline

We will now add the uploaded model as a step for the pipeline, then deploy it. The pipeline configuration will allow for multiple replicas of the pipeline to be deployed and spun up in the cluster. Each pipeline replica will use 0.25 cpu and 512 MiB of RAM.

pipeline = wl.build_pipeline(pipeline_name)
# Set the deployment to allow for additional engines to run
deploy_config = (wallaroo.DeploymentConfigBuilder()
                        .replica_count(1)
                        .replica_autoscale_min_max(minimum=2, maximum=5)
                        .cpus(0.25)
                        .memory("512Mi")
                        .build()
                    )

pipeline.add_model_step(bike_day_model).deploy(deployment_config=deploy_config)
name            bikedaypipe
created         2024-04-15 14:36:58.942226+00:00
last_updated    2024-04-15 14:44:24.658835+00:00
deployed        True
arch            x86
accel           none
tags
versions        1ebe2ee2-7d7b-4608-8890-3361edacf887, 7580360a-8f89-4110-9d63-2c3519171799, dcec44ae-5648-4a9f-966f-31a87e138433, c9f6762f-1b14-4d3c-a530-8162e7c9bef4
steps           bikedaymodel
published       False
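As a rough capacity check, the total cluster footprint of this deployment configuration at the autoscale bounds works out as follows (simple arithmetic, not a Wallaroo API call):

```python
cpus_per_replica = 0.25
memory_mib_per_replica = 512
min_replicas, max_replicas = 2, 5

# resources the cluster must be able to provide at each autoscale bound
min_cpus = min_replicas * cpus_per_replica              # 0.5 cpu
max_cpus = max_replicas * cpus_per_replica              # 1.25 cpu
max_memory_mib = max_replicas * memory_mib_per_replica  # 2560 MiB
```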

Run Inference

Run a test inference to verify the pipeline is operational, using a month of sample data retrieved from the tutorial's simulated database module simdb.

from resources import simdb as simdb

def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    assert isinstance(tablename, str)
    assert isinstance(seed_day, str)
    query = f"select count from {tablename} where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}')"
    return query

conn = simdb.get_db_connection()

# create the query
query = mk_dt_range_query(tablename=simdb.tablename, seed_day='2011-03-01')
print(query)

# read in the data
training_frame = pd.read_sql_query(query, conn)
training_frame
select count from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')
    count
0    1526
1    1550
2    1708
3    1005
4    1623
5    1712
6    1530
7    1605
8    1538
9    1746
10   1472
11   1589
12   1913
13   1815
14   2115
15   2475
16   2927
17   1635
18   1812
19   1107
20   1450
21   1917
22   1807
23   1461
24   1969
25   2402
26   1446
27   1851
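The date-range query builder above can be exercised without the tutorial's simdb module by pointing it at an in-memory SQLite table (a minimal, self-contained sketch with made-up rows):

```python
import sqlite3

def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    # select the month of counts ending on seed_day, as in the tutorial
    return (f"select count from {tablename} "
            f"where date > DATE(DATE('{seed_day}'), '-1 month') "
            f"AND date <= DATE('{seed_day}')")

conn = sqlite3.connect(':memory:')
conn.execute("create table bikerentals (date text, count integer)")
conn.executemany("insert into bikerentals values (?, ?)",
                 [('2011-01-20', 900),     # excluded: before the one-month window
                  ('2011-02-15', 1526),    # included
                  ('2011-03-01', 1550),    # included: the seed day itself
                  ('2011-03-05', 1708)])   # excluded: after the seed day
rows = conn.execute(mk_dt_range_query(tablename='bikerentals',
                                      seed_day='2011-03-01')).fetchall()
# rows → [(1526,), (1550,)]
```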
data = {
        'count': [training_frame['count'].tolist()]
}
df = pd.DataFrame(data)
df
   count
0  [1526, 1550, 1708, 1005, 1623, 1712, 1530, 1605, 1538, 1746, 1472, 1589, 1913, 1815, 2115, 2475, 2927, 1635, 1812, 1107, 1450, 1917, 1807, 1461, 1969, 2402, 1446, 1851]
results = pipeline.infer(df)
display(results)
Row 0
  time                2024-04-15 14:44:41.384
  in.count            [1526, 1550, 1708, 1005, 1623, 1712, 1530, 1605, 1538, 1746, 1472, 1589, 1913, 1815, 2115, 2475, 2927, 1635, 1812, 1107, 1450, 1917, 1807, 1461, 1969, 2402, 1446, 1851]
  out.forecast        [1764, 1749, 1743, 1741, 1740, 1740, 1740]
  out.weekly_average  1745.285714
  anomaly.count       0

Row 1
  time                2024-04-15 14:44:41.384
  in.count            [1526, 1550, 1708, 1005, 1623, 1712, 1530, 1605, 1538, 1746, 1472, 1589, 1913, 1815, 2115, 2475, 2927, 1635, 1812, 1107, 1450, 1917, 1807, 1461, 1969, 2402, 1446, 1851]
  out.forecast        [1764, 1749, 1743, 1741, 1740, 1740, 1740]
  out.weekly_average  1745.285714
  anomaly.count       0
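The out.weekly_average field is simply the mean of the seven forecast values returned in out.forecast, which can be confirmed directly:

```python
forecast = [1764, 1749, 1743, 1741, 1740, 1740, 1740]

# mean of the 7-step forecast, matching out.weekly_average above
weekly_average = sum(forecast) / len(forecast)
# weekly_average ≈ 1745.2857
```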

Undeploy the Pipeline

Undeploy the pipeline and return the resources back to the Wallaroo instance.

pipeline.undeploy()
name            bikedaypipe
created         2024-04-15 14:36:58.942226+00:00
last_updated    2024-04-15 14:44:24.658835+00:00
deployed        False
arch            x86
accel           none
tags
versions        1ebe2ee2-7d7b-4608-8890-3361edacf887, 7580360a-8f89-4110-9d63-2c3519171799, dcec44ae-5648-4a9f-966f-31a87e138433, c9f6762f-1b14-4d3c-a530-8162e7c9bef4
steps           bikedaymodel
published       False