Wallaroo SDK Essentials Guide: Model Uploads and Registrations: SKLearn
Model Naming Requirements
Model names map onto Kubernetes objects and must be DNS compliant. Model names must contain only lowercase ASCII alphanumeric characters or dashes (-). Periods (.) and underscores (_) are not allowed.
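A model name can be checked locally before upload. The following is a minimal sketch, not part of the Wallaroo SDK: the helper name `is_valid_model_name` is hypothetical, and the 63-character limit and the rule that a name starts and ends with an alphanumeric character are assumptions borrowed from the Kubernetes DNS label convention rather than stated Wallaroo requirements.

```python
import re

# Lowercase ASCII letters, digits, and dashes only.
# Assumption (Kubernetes DNS label convention): the name starts and ends
# with an alphanumeric character and is at most 63 characters long.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` uses only the characters allowed for model names."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


print(is_valid_model_name("sklearn-clustering-kmeans"))  # dashes are allowed
print(is_valid_model_name("sklearn_kmeans"))             # underscore is rejected
print(is_valid_model_name("model.v1"))                   # period is rejected
```

Running a check like this before calling upload_model catches naming errors without waiting on the upload process.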
Wallaroo supports SKLearn models by containerizing the model and running it as an image.
Scikit-learn, also known as SKLearn.
| Parameter | Description | 
|---|---|
| Web Site | https://scikit-learn.org/stable/index.html | 
| Supported Libraries | |
| Framework | Framework.SKLEARN aka sklearn |
During the model upload process, Wallaroo optimizes models by converting them to the Wallaroo Native Runtime, if possible, or by running the model directly in the Wallaroo Containerized Runtime. See Model Deploy for details on how to configure pipeline resources based on the model’s runtime.
SKLearn Schema Inputs
SKLearn schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. For example, the following DataFrame has 4 columns, each column a float.
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
For submission to an SKLearn model, the data input schema will be a single array with 4 float values.
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])
When submitted for inference, each DataFrame row is converted so that its column values are expressed as a single array. The data must be in the order the model expects, which is why it is submitted as a single array rather than as JSON labeled columns: this ensures that the values arrive in the exact order the model was trained to accept.
Original DataFrame:
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
Converted DataFrame:
| | inputs |
|---|---|
| 0 | [5.1, 3.5, 1.4, 0.2] |
| 1 | [4.9, 3.0, 1.4, 0.2] |
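One way to perform this conversion with pandas is sketched below. It is an illustration under assumptions rather than the method Wallaroo uses internally: the helper `to_inputs_frame` and the `FEATURE_ORDER` list are hypothetical names, and the column order is assumed to be the order the model was trained on.

```python
import pandas as pd

# Assumed training order of the features (hypothetical constant).
FEATURE_ORDER = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]


def to_inputs_frame(df, feature_order):
    """Collapse the feature columns of each row into one 'inputs' array."""
    # Select the columns in training order and force a single data type.
    ordered = df[feature_order].astype("float64")
    # Each row becomes a plain Python list of floats.
    return pd.DataFrame({"inputs": ordered.values.tolist()}, index=df.index)


# The columns are deliberately out of order here to show the reordering.
df = pd.DataFrame({
    "petal width (cm)": [0.2, 0.2],
    "sepal length (cm)": [5.1, 4.9],
    "petal length (cm)": [1.4, 1.4],
    "sepal width (cm)": [3.5, 3.0],
})

converted = to_inputs_frame(df, FEATURE_ORDER)
print(converted)
```

Selecting columns by an explicit ordered list keeps the array layout stable even if the source DataFrame arrives with its columns shuffled.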
SKLearn Schema Outputs
Model outputs that represent predictions or probabilities are labeled in the output schema supplied when the model is uploaded to Wallaroo. For example, a model that outputs either 1 or 0 would have the following output schema:
output_schema = pa.schema([
    pa.field('predictions', pa.int32())
])
When used in Wallaroo, the inference result is contained in the out metadata as out.predictions.
pipeline.infer(dataframe)
| | time | in.inputs | out.predictions | anomaly.count |
|---|---|---|---|---|
| 0 | 2023-07-05 15:11:29.776 | [5.1, 3.5, 1.4, 0.2] | 0 | 0 |
| 1 | 2023-07-05 15:11:29.776 | [4.9, 3.0, 1.4, 0.2] | 0 | 0 |
Uploading SKLearn Models
SKLearn models are uploaded to Wallaroo through the Wallaroo Client upload_model method.
Upload SKLearn Model Parameters
The following parameters are required for SKLearn models. Note that while some fields are optional for the upload_model method, they are required to properly upload an SKLearn model to Wallaroo.
| Parameter | Type | Description |
|---|---|---|
| name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
| path | string (Required) | The path to the model file being uploaded. |
| framework | string (Required) | Set as Framework.SKLEARN. |
| input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
| output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
| convert_wait | bool (Optional) (Default: True) | |
| arch | wallaroo.engine_config.Architecture | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include: X86 (Default): x86 based architectures. ARM: ARM based architectures. |
Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.
Upload SKLearn Model Return
upload_model returns a wallaroo.model_version.ModelVersion object with the following fields.
| Field | Type | Description | 
|---|---|---|
| name | String | The name of the model. | 
| version | String | The model version as a unique UUID. | 
| file_name | String | The file name of the model as stored in Wallaroo. | 
| SHA | String | The hash value of the model file. | 
| Status | String | The status of the model. | 
| image_path | String | The image used to deploy the model in the Wallaroo engine. | 
| last_update_time | DateTime | When the model was last updated. | 
Upload SKLearn Model Example
The following example uploads a pickled SKLearn ML model to a Wallaroo instance.
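import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework

# Connect to the Wallaroo instance.
wl = wallaroo.Client()

# Input: a single array of 4 floats, in the order the model was trained on.
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])
# Output: one integer prediction per row.
output_schema = pa.schema([
    pa.field('predictions', pa.int32())
])
model = wl.upload_model('sklearn-clustering-kmeans',
                        "models/model-auto-conversion_sklearn_kmeans.pkl",
                        framework=Framework.SKLEARN,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        )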
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Model is attempting loading to a native runtime..incompatible
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime..............successful