Statsmodel Forecast with Wallaroo Features: Model Creation

Training the Statsmodel to predict bike rentals.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

This tutorial series demonstrates how to use Wallaroo to create a Statsmodel forecasting model based on bike rentals. The series is broken down into the following notebooks:

Create and Train the Model: This first notebook shows how the model is trained from existing data.
Deploy and Sample Inference: With the model developed, we will deploy it into Wallaroo and perform a sample inference.
Parallel Infer: A sample of multiple weeks of data will be retrieved and submitted as an asynchronous parallel inference. The results will be collected and uploaded to a sample database.
External Connection: A sample data connection to Google BigQuery to retrieve input data and store the results in a table.
ML Workload Orchestration: Take all of the previous steps and automate the request into a single Wallaroo ML Workload Orchestration.

Prerequisites

A Wallaroo instance version 2023.2.1 or greater.

import pandas as pd
import datetime
import os

from statsmodels.tsa.arima.model import ARIMA
from resources import simdb as simdb

Train the Model

Training starts with the local file day.csv. This data is loaded and prepared for use in training the model.

For this example, the simulated database is controlled by the simdb module in resources.

def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    assert isinstance(tablename, str)
    assert isinstance(seed_day, str)
    query = f"select count from {tablename} where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}')"
    return query

conn = simdb.get_db_connection()

# create the query
query = mk_dt_range_query(tablename=simdb.tablename, seed_day='2011-03-01')
print(query)

# read in the data
training_frame = pd.read_sql_query(query, conn)
training_frame

select count from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')

	count
0	1526
1	1550
2	1708
3	1005
4	1623
5	1712
6	1530
7	1605
8	1538
9	1746
10	1472
11	1589
12	1913
13	1815
14	2115
15	2475
16	2927
17	1635
18	1812
19	1107
20	1450
21	1917
22	1807
23	1461
24	1969
25	2402
26	1446
27	1851
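
The 28 rows follow from the date arithmetic in the query: one month before 2011-03-01 is 2011-02-01, and the strict `>` excludes that day, so the window runs from 2011-02-02 through 2011-03-01. The sketch below checks this against an in-memory SQLite table. It assumes the simulated database is SQLite (the `DATE(..., '-1 month')` modifier syntax suggests this, but the source does not say so), and the table contents are made-up placeholder counts, not the values from day.csv.

```python
import datetime
import sqlite3

# Hypothetical stand-in for the simulated database: an in-memory SQLite table
# with one row per day from 2011-01-01 through 2011-03-31.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikerentals (date TEXT, count INTEGER)")
start = datetime.date(2011, 1, 1)
rows = [((start + datetime.timedelta(days=i)).isoformat(), 1000 + i) for i in range(90)]
conn.executemany("INSERT INTO bikerentals VALUES (?, ?)", rows)

# Same WHERE clause as mk_dt_range_query, with the date column added to the
# SELECT and an ORDER BY so the window boundaries can be inspected.
seed_day = "2011-03-01"
query = (
    "select date, count from bikerentals "
    f"where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}') "
    "order by date"
)
selected = conn.execute(query).fetchall()

first_day = selected[0][0]
last_day = selected[-1][0]
n_days = len(selected)
print(first_day, last_day, n_days)  # 2011-02-02 2011-03-01 28
```

Because the lower bound is exclusive and the upper bound is inclusive, the number of rows equals the number of days in the month preceding the seed day, so other seed days return 28 to 31 rows.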

Test the Forecast

The training frame is then passed to our forecast model to test it.

# test
from models import forecast_standard as forecast
import importlib
importlib.reload(forecast)
import json

# create the appropriate json
# jsonstr = json.dumps(training_frame.to_dict(orient='list'))
# print(jsonstr)

data = {
        'count': [training_frame['count']]
}
display(data)

result = forecast.process_data(data)
display(result)

{'count': [0     1526
  1     1550
  2     1708
  3     1005
  4     1623
  5     1712
  6     1530
  7     1605
  8     1538
  9     1746
  10    1472
  11    1589
  12    1913
  13    1815
  14    2115
  15    2475
  16    2927
  17    1635
  18    1812
  19    1107
  20    1450
  21    1917
  22    1807
  23    1461
  24    1969
  25    2402
  26    1446
  27    1851
  Name: count, dtype: int64]}
{'forecast': array([[1764, 1749, 1743, 1741, 1740, 1740, 1740]]),
 'weekly_average': array([1745.28571429])}
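
In this output, weekly_average appears to be the arithmetic mean of the seven forecast values. That reading is inferred from the numbers shown here, not from the source of forecast_standard, and the short check below only confirms that the arithmetic is consistent.

```python
# Forecast values copied from the output above.
forecast = [1764, 1749, 1743, 1741, 1740, 1740, 1740]

# Mean of the seven daily forecasts.
weekly_average = sum(forecast) / len(forecast)
print(round(weekly_average, 8))  # 1745.28571429
```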