This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
This tutorial series demonstrates how to use Wallaroo to create a statsmodels forecasting model based on bike rentals. The series is broken down into the following:
import pandas as pd
import datetime
import os
from statsmodels.tsa.arima.model import ARIMA
from resources import simdb as simdb
The resources to train the model start with the local file day.csv. This data is loaded and prepared for use in training the model.
For this example, the simulated database is controlled by the resources module simdb.
def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    assert isinstance(tablename, str)
    assert isinstance(seed_day, str)
    query = f"select count from {tablename} where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}')"
    return query
conn = simdb.get_db_connection()
# create the query
query = mk_dt_range_query(tablename=simdb.tablename, seed_day='2011-03-01')
print(query)
# read in the data
training_frame = pd.read_sql_query(query, conn)
training_frame
select count from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')
| | count |
---|---|
0 | 1526 |
1 | 1550 |
2 | 1708 |
3 | 1005 |
4 | 1623 |
5 | 1712 |
6 | 1530 |
7 | 1605 |
8 | 1538 |
9 | 1746 |
10 | 1472 |
11 | 1589 |
12 | 1913 |
13 | 1815 |
14 | 2115 |
15 | 2475 |
16 | 2927 |
17 | 1635 |
18 | 1812 |
19 | 1107 |
20 | 1450 |
21 | 1917 |
22 | 1807 |
23 | 1461 |
24 | 1969 |
25 | 2402 |
26 | 1446 |
27 | 1851 |
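The 28 rows returned above follow from how the query bounds its date window: the lower bound (one month before the seed day) is exclusive and the seed day itself is inclusive. The sketch below reproduces that window against a hypothetical in-memory SQLite table. The table contents are synthetic and the setup is an assumption for illustration only; it is not how the tutorial's simdb module is implemented.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in for the simulated database: an in-memory SQLite
# table with one synthetic row per day (the counts here are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikerentals (date TEXT, count INTEGER)")
start = date(2011, 1, 1)
rows = [((start + timedelta(days=i)).isoformat(), 1000 + i) for i in range(90)]
conn.executemany("INSERT INTO bikerentals VALUES (?, ?)", rows)

# Same WHERE clause as mk_dt_range_query, with the date column also selected
# so the window boundaries are visible.
seed_day = "2011-03-01"
query = (
    "select date, count from bikerentals "
    f"where date > DATE(DATE('{seed_day}'), '-1 month') "
    f"AND date <= DATE('{seed_day}')"
)
window = conn.execute(query).fetchall()

first_day = min(day for day, _ in window)
last_day = max(day for day, _ in window)
n_rows = len(window)
print(first_day, last_day, n_rows)  # 2011-02-02 2011-03-01 28
```

With a seed day of 2011-03-01, the window runs from 2011-02-02 through 2011-03-01, which matches the 28 rows (index 0 to 27) in the training frame.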
The training frame is then loaded and tested against our forecast model.
# test
from models import forecast_standard as forecast
import importlib
importlib.reload(forecast)
import json
# create the appropriate json
# jsonstr = json.dumps(training_frame.to_dict(orient='list'))
# print(jsonstr)
data = {
    'count': [training_frame['count']]
}
display(data)
result = forecast.process_data(data)
display(result)
{'count': [0 1526
1 1550
2 1708
3 1005
4 1623
5 1712
6 1530
7 1605
8 1538
9 1746
10 1472
11 1589
12 1913
13 1815
14 2115
15 2475
16 2927
17 1635
18 1812
19 1107
20 1450
21 1917
22 1807
23 1461
24 1969
25 2402
26 1446
27 1851
Name: count, dtype: int64]}
{'forecast': array([[1764, 1749, 1743, 1741, 1740, 1740, 1740]]),
 'weekly_average': array([1745.28571429])}
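The weekly_average value appears to be the arithmetic mean of the seven forecast values. That relationship is inferred from the numbers in the output, not from the source of the forecast module, so treat the following as a quick sanity check rather than a description of the model's internals.

```python
# Forecast values copied from the output above.
forecast = [1764, 1749, 1743, 1741, 1740, 1740, 1740]

# Mean of the seven daily forecasts.
weekly_average = sum(forecast) / len(forecast)
print(round(weekly_average, 8))  # 1745.28571429
```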