Wallaroo ML Workload Orchestration Requirements
Orchestration Requirements
Orchestrations are uploaded to the Wallaroo instance as a ZIP file with the following requirements:
Parameter | Type | Description |
---|---|---|
User Code | (Required) Python script as .py files | If main.py exists, then that will be used as the task entrypoint. Otherwise, the first main.py found in any subdirectory will be used as the entrypoint. If no main.py is found, the orchestration will not be accepted. |
Python Library Requirements | (Optional) requirements.txt file in the requirements file format. | A standard Python requirements.txt for any dependencies to be provided in the task environment. The Wallaroo SDK will already be present and should not be included in the requirements.txt. Multiple requirements.txt files are not allowed. |
Other artifacts | Other artifacts such as files, data, or code to support the orchestration. |
Zip Instructions
In a terminal with the zip
command, assemble artifacts as above and then create the archive. The zip
command is included by default with the Wallaroo JupyterHub service.
zip
commands take the following format, with {zipfilename}.zip
as the zip file to save the artifacts to, and each file thereafter as the files to add to the archive.
zip {zipfilename}.zip file1, file2, file3....
For example, the following command will add the files main.py
and requirements.txt
into the file hello.zip
.
$ zip hello.zip main.py requirements.txt
adding: main.py (deflated 47%)
adding: requirements.txt (deflated 52%)
Example requirements.txt file
dbt-bigquery==1.4.3
dbt-core==1.4.5
dbt-extractor==0.4.1
dbt-postgres==1.4.5
google-api-core==2.8.2
google-auth==2.11.0
google-auth-oauthlib==0.4.6
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.15.0
google-cloud-core==2.3.2
google-cloud-storage==2.5.0
google-crc32c==1.5.0
google-pasta==0.2.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.4
Orchestration Recommendations
The following recommendations will make using Wallaroo orchestrations.
- The version of Python used should match the same version as in the Wallaroo JupyterHub service.
- The same version of the Wallaroo SDK should match the server. For a 2023.2.1 Wallaroo instance, use the Wallaroo SDK version 2023.2.1.
- Specify the version of
pip
dependencies. - The
wallaroo.Client
constructorauth_type
argument is ignored. Usingwallaroo.Client()
is sufficient. - The following methods will assist with orchestrations:
wallaroo.in_task()
: ReturnsTrue
if the code is running within an orchestration task.wallaroo.task_args()
: Returns aDict
of invocation-specific arguments passed to therun_
calls.
- Orchestrations will be run in the same way as running within the Wallaroo JupyterHub service, from the version of Python libraries (unless specifically overridden by the
requirements.txt
setting, which is not recommended), and running in the virtualized directory/home/jovyan/
.
Orchestration Code Samples
The following demonstres using the wallaroo.in_task()
and wallaroo.task_args()
methods within an Orchestration. This sample code uses wallaroo.in_task()
to verify whether or not the script is running as a Wallaroo Task. If true, it will gather the wallaroo.task_args()
and use them to set the workspace and pipeline. If False, then it sets the pipeline and workspace manually.
# get the arguments
wl = wallaroo.Client()
# if true, get the arguments passed to the task
if wl.in_task():
arguments = wl.task_args()
# arguments is a key/value pair, set the workspace and pipeline name
workspace_name = arguments['workspace_name']
pipeline_name = arguments['pipeline_name']
# False: We're not in a Task, so set the pipeline manually
else:
workspace_name="bigqueryworkspace"
pipeline_name="bigquerypipeline"