Wallaroo MLOps API Essentials Guide: Pipeline Log Management
Pipeline logs are retrieved through the Wallaroo MLOps API with the following request.
- **REQUEST URL**
  - `v1/api/pipelines/get_logs`
- **Headers**
  - `Accept`:
    - `application/json; format=pandas-records`: For the logs returned as a pandas DataFrame.
    - `application/vnd.apache.arrow.file`: For the logs returned as Apache Arrow.
- **PARAMETERS**
  - **pipeline_id** (*String* *Required*): The name of the pipeline.
  - **workspace_id** (*Integer* *Required*): The numerical identifier of the workspace.
  - **cursor** (*String* *Optional*): Cursor returned with a previous page of results from a pipeline log request, used to retrieve the next page of information.
  - **order** (*String* *Optional*, default `Desc`): The order for log inserts returned. Valid values are:
    - `Asc`: In chronological order of inserts.
    - `Desc`: In reverse chronological order of inserts.
  - **page_size** (*Integer* *Optional*, default `1000`): Max records per page.
  - **start_time** (*String* *Optional*): The start time of the period to retrieve logs for, in RFC 3339 DateTime format. Must be combined with `end_time`.
  - **end_time** (*String* *Optional*): The end time of the period to retrieve logs for, in RFC 3339 DateTime format. Must be combined with `start_time`.
- **RETURNS**
  - The logs are returned by default in `application/json; format=pandas-records` format. To request the logs as Apache Arrow tables, set the submission header `Accept` to `application/vnd.apache.arrow.file`.
  - Headers:
    - `x-iteration-cursor`: Used to retrieve the next page of results. Not included if `x-iteration-status` is `All`.
    - `x-iteration-status`: Informs whether there are more records available outside of this log request's parameters.
      - `All`: This page includes all logs available from this request. If `x-iteration-status` is `All`, then `x-iteration-cursor` is not provided.
      - `SchemaChange`: A change in the log schema caused by actions such as a new pipeline version.
      - `RecordLimited`: The number of records exceeded the page size; more records can be requested as the next page. There may be more records available to retrieve, or the record limit may have been reached for this request even if no more records are available in the next cursor request.
      - `ByteLimited`: The size of the records exceeded the pipeline log limit, which is around 100K.
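For example, a minimal sketch of requesting logs as Apache Arrow, assuming an established Wallaroo connection providing `wl`, `APIURL`, `main_pipeline_name`, and `workspace_id`, and that `pyarrow` is installed:

```python
import pyarrow as pa
import requests

# retrieve the authorization token, then request Arrow output
headers = wl.auth.auth_header()
headers['Accept'] = 'application/vnd.apache.arrow.file'

url = f"{APIURL}/v1/api/pipelines/get_logs"

data = {
    'pipeline_id': main_pipeline_name,
    'workspace_id': workspace_id
}

response = requests.post(url, headers=headers, json=data)

# The response body is an Arrow IPC file; read it into a pyarrow Table
reader = pa.ipc.open_file(pa.BufferReader(response.content))
arrow_table = reader.read_all()
display(arrow_table.num_rows)
```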
Standard Pipeline Logs Example
```python
import requests
import pandas as pd

# retrieve the authorization token
headers = wl.auth.auth_header()

url = f"{APIURL}/v1/api/pipelines/get_logs"

# Standard log retrieval
data = {
    'pipeline_id': main_pipeline_name,
    'workspace_id': workspace_id
}

response = requests.post(url, headers=headers, json=data)
standard_logs = pd.DataFrame.from_records(response.json())

display(standard_logs.head(5).loc[:, ["time", "in", "out"]])
```
| | time | in | out |
|---|---|---|---|
| 0 | 1684423875900 | {'tensor': [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]} | {'variable': [718013.75]} |
| 1 | 1684423875900 | {'tensor': [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0, 8.0, 2170.0, 0.0, 47.7109, -122.017, 2310.0, 7419.0, 6.0, 0.0, 0.0]} | {'variable': [615094.56]} |
| 2 | 1684423875900 | {'tensor': [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, 8.0, 880.0, 420.0, 47.5893, -122.317, 1300.0, 824.0, 6.0, 0.0, 0.0]} | {'variable': [448627.72]} |
| 3 | 1684423875900 | {'tensor': [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0, 9.0, 2500.0, 0.0, 47.5759, -121.994, 2560.0, 8475.0, 24.0, 0.0, 0.0]} | {'variable': [758714.2]} |
| 4 | 1684423875900 | {'tensor': [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.0, 7.0, 2200.0, 0.0, 47.7659, -122.341, 1690.0, 8038.0, 62.0, 0.0, 0.0]} | {'variable': [513264.7]} |
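The standard request above returns at most one page of results. The following is a minimal sketch of paging through all available logs with the `x-iteration-cursor` header; the `get_all_logs` helper name is hypothetical, and the same connection variables as above are assumed.

```python
import requests
import pandas as pd

def get_all_logs(pipeline_name, workspace_id):
    """Page through pipeline logs until x-iteration-status reports All."""
    url = f"{APIURL}/v1/api/pipelines/get_logs"
    data = {
        'pipeline_id': pipeline_name,
        'workspace_id': workspace_id
    }
    pages = []
    while True:
        # refresh the authorization token for each page request
        headers = wl.auth.auth_header()
        response = requests.post(url, headers=headers, json=data)
        pages.append(pd.DataFrame.from_records(response.json()))
        # x-iteration-cursor is omitted when x-iteration-status is All
        cursor = response.headers.get('x-iteration-cursor')
        if cursor is None:
            break
        data['cursor'] = cursor
    return pd.concat(pages, ignore_index=True)

all_logs = get_all_logs(main_pipeline_name, workspace_id)
```

Note that an `x-iteration-status` of `SchemaChange` means subsequent pages may have a different schema, so concatenating all pages into one DataFrame may mix columns across schema boundaries.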
Shadow Deploy Pipeline Logs Example
```python
# Retrieve logs from a specific date/time range to get only the
# shadow deploy inference results, in ascending order

# retrieve the authorization token
headers = wl.auth.auth_header()

url = f"{APIURL}/v1/api/pipelines/get_logs"

# Shadow deploy log retrieval over a set time period
data = {
    'pipeline_id': main_pipeline_name,
    'workspace_id': workspace_id,
    'order': 'Asc',
    'start_time': f'{shadow_date_start.isoformat()}',
    'end_time': f'{shadow_date_end.isoformat()}'
}

response = requests.post(url, headers=headers, json=data)
standard_logs = pd.DataFrame.from_records(response.json())

display(standard_logs.head(5).loc[:, ["time", "out", "out_logcontrolchallenger01", "out_logcontrolchallenger02"]])
```
| | time | out | out_logcontrolchallenger01 | out_logcontrolchallenger02 |
|---|---|---|---|---|
| 0 | 1684427140394 | {'variable': [718013.75]} | {'variable': [659806.0]} | {'variable': [704901.9]} |
| 1 | 1684427140394 | {'variable': [615094.56]} | {'variable': [732883.5]} | {'variable': [695994.44]} |
| 2 | 1684427140394 | {'variable': [448627.72]} | {'variable': [419508.84]} | {'variable': [416164.8]} |
| 3 | 1684427140394 | {'variable': [758714.2]} | {'variable': [634028.8]} | {'variable': [655277.2]} |
| 4 | 1684427140394 | {'variable': [513264.7]} | {'variable': [427209.44]} | {'variable': [426854.66]} |
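The `shadow_date_start` and `shadow_date_end` values above are datetime objects whose `isoformat()` output satisfies the RFC 3339 DateTime requirement. A minimal sketch of building such a window (the one-hour span is an arbitrary assumption):

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware datetimes serialize to RFC 3339-compliant strings via isoformat()
shadow_date_end = datetime.now(timezone.utc)
shadow_date_start = shadow_date_end - timedelta(hours=1)

display(shadow_date_start.isoformat())  # e.g. '2023-05-18T16:25:40.394000+00:00'
```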
A/B Testing Pipeline Logs Example
```python
# Retrieve logs from a specific date/time range to get only the
# A/B testing inference results, in ascending order

# retrieve the authorization token
headers = wl.auth.auth_header()

url = f"{APIURL}/v1/api/pipelines/get_logs"

# A/B testing log retrieval over a set time period
data = {
    'pipeline_id': main_pipeline_name,
    'workspace_id': workspace_id,
    'order': 'Asc',
    'start_time': f'{ab_date_start.isoformat()}',
    'end_time': f'{ab_date_end.isoformat()}'
}

response = requests.post(url, headers=headers, json=data)
standard_logs = pd.DataFrame.from_records(response.json())

display(standard_logs.head(5).loc[:, ["time", "out"]])
```
| | time | out |
|---|---|---|
| 0 | 1684427501820 | {'_model_split': ['{"name":"logcontrolchallenger02","version":"89dba25e-a11e-453d-9bcc-cddf8d6ddea0","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}'], 'variable': [715947.75]} |
| 1 | 1684427502196 | {'_model_split': ['{"name":"logcontrolchallenger01","version":"07fe7686-9bd0-4fd3-9a7c-0e933a74003c","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}'], 'variable': [341386.34]} |
| 2 | 1684427503778 | {'_model_split': ['{"name":"logapicontrol","version":"70b76ecb-55c2-4d68-be9b-b440b11e6499","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}'], 'variable': [1039781.2]} |
| 3 | 1684427504566 | {'_model_split': ['{"name":"logcontrolchallenger02","version":"89dba25e-a11e-453d-9bcc-cddf8d6ddea0","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}'], 'variable': [411090.75]} |
| 4 | 1684427505094 | {'_model_split': ['{"name":"logcontrolchallenger02","version":"89dba25e-a11e-453d-9bcc-cddf8d6ddea0","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}'], 'variable': [296175.66]} |
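For A/B tested pipelines, the `_model_split` field within `out` identifies which model served each inference. A small sketch of extracting the model name per row from the `standard_logs` DataFrame above:

```python
import json

# _model_split is a one-element list holding a JSON string with the
# serving model's name, version, and sha
standard_logs['model_name'] = standard_logs['out'].apply(
    lambda out: json.loads(out['_model_split'][0])['name']
)

display(standard_logs.loc[:, ['time', 'model_name']].head(5))
```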
Pipeline Log Storage
Pipeline logs have a set allocation of storage space and data requirements.
Pipeline Log Storage Warnings
To prevent storage and performance issues, inference result data may be dropped from the pipeline logs according to the following standards:

- Columns are progressively removed from the row, starting with the largest input data size and working down to the smallest, then the same for outputs.

For example, Computer Vision ML models typically have large input and output values: a single pandas DataFrame inference request may be over 13 MB in size, with the inference results nearly as large. To prevent pipeline log storage issues, the input may be dropped from the pipeline logs, and if additional space is needed, the inference outputs would follow. The `time` column is always preserved.
IMPORTANT NOTE
Inference requests will always return all inputs, outputs, and other metadata unless specifically requested for exclusion; it is only the pipeline logs that may drop columns for space purposes.

If a pipeline has dropped columns for space purposes, the following warning is displayed when a log request is made, with {columns} replaced with the dropped columns.
The inference log is above the allowable limit and the following columns may have been suppressed for various rows in the logs: {columns}. To review the dropped columns for an individual inference’s suppressed data, include dataset=["metadata"] in the log request.
Review Dropped Columns
To review which columns were dropped from the pipeline logs for storage reasons, include the dataset `metadata` in the request and view the column `metadata.dropped`. This metadata field displays a List of any columns dropped from the pipeline logs.
For example:
```python
# retrieve the authorization token
headers = wl.auth.auth_header()

url = f"{APIURL}/v1/api/pipelines/get_logs"

# Log retrieval including the metadata dataset,
# per the warning message described above
data = {
    'pipeline_id': main_pipeline_name,
    'workspace_id': workspace_id,
    'dataset': ['metadata']
}

response = requests.post(url, headers=headers, json=data)
standard_logs = pd.DataFrame.from_records(response.json())

display(len(standard_logs))
display(standard_logs.head(5).loc[:, ["time", "metadata"]])

# x-iteration-cursor may be absent when all records were returned
cursor = response.headers.get('x-iteration-cursor')
```
| | time | metadata |
|---|---|---|
| 0 | 1688760035752 | {'last_model': '{"model_name":"logapicontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}', 'pipeline_version': '', 'elapsed': [112967, 267146], 'dropped': []} |
| 1 | 1688760036054 | {'last_model': '{"model_name":"logapicontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}', 'pipeline_version': '', 'elapsed': [37127, 594183], 'dropped': []} |
| 2 | 1688759857040 | {'last_model': '{"model_name":"logapicontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}', 'pipeline_version': '', 'elapsed': [111082, 253184], 'dropped': []} |
| 3 | 1688759857526 | {'last_model': '{"model_name":"logapicontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}', 'pipeline_version': '', 'elapsed': [43962, 265740], 'dropped': []} |
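With the `metadata` dataset included, the dropped columns can be read directly from each row's `metadata.dropped` field. A minimal sketch over the `standard_logs` DataFrame above:

```python
# Each metadata entry is a dict whose 'dropped' key lists the columns
# suppressed from that row of the pipeline log
dropped_per_row = standard_logs['metadata'].apply(lambda m: m['dropped'])

# Collect the distinct dropped column names across all rows
all_dropped = set().union(*dropped_per_row)
display(all_dropped)
```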
Suppressed Data Elements
Data elements that do not fit the supported data types, such as `None` or `Null` values, are not supported in pipeline logs. When present, undefined data, typically zeroes, is written in place of the null value. Any null list values are presented as an empty list.