Wallaroo SDK Essentials Guide: Inferencing

How to use Wallaroo for model inferencing through the Wallaroo SDK

Inferencing

Once a pipeline has been deployed, an inference can be run. This submits data to the pipeline, where it is processed through each of the pipeline’s steps, with the output of each step providing the input for the next. The final step then outputs the result of all of the pipeline’s steps.

The input sent and the output received depends on whether Arrow support is enabled in the Wallaroo instance.

  • For Arrow enabled instances of Wallaroo:
    • Inputs are submitted as either a pandas.DataFrame or an Apache Arrow table.
    • Outputs are returned in the same format as the input: a pandas.DataFrame or an Apache Arrow table.
  • For Arrow disabled instances of Wallaroo:
    • Inputs are submitted in the proprietary Wallaroo JSON format.
    • Outputs are returned as the InferenceResult object.

Run Inference through Local Variable

Arrow Enabled Infer

When Arrow support is enabled, the pipeline method infer(data, timeout, dataset, dataset_exclude, dataset_separator) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The data submitted to the pipeline for inference. Inputs are sent as either a pandas.DataFrame or an Apache Arrow table.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call - with 10 inference requests, each would have a default timeout of 15 seconds.
  • dataset (OPTIONAL): The datasets to be returned. By default this is set to ["*"], which returns ["time", "in", "out", "check_failures"].
  • dataset_exclude (OPTIONAL): Allows users to exclude parts of the dataset.
  • dataset_separator (OPTIONAL): Allows other types of dataset separators to be used. If set to ".", the returned dataset will be flattened.
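The effect of the "." separator can be pictured with plain Python dictionaries. The sketch below illustrates the flattening that produces column names like in.tensor and out.dense_1; it is an illustration of the naming convention, not the SDK's internal implementation:

```python
# Illustrative sketch: how a "." separator flattens a nested inference
# record into the column names seen in the returned DataFrame
# (e.g. "in.tensor", "out.dense_1"). Not the SDK's actual code.

def flatten(record, separator="."):
    """Flatten nested dicts, joining keys with the separator."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

# Sample record shaped like a single inference result row.
nested = {"time": 1676502293246,
          "in": {"tensor": [1.0, 2.0]},
          "out": {"dense_1": [0.981199]},
          "check_failures": 0}

print(flatten(nested))
```

With the default separator the keys come out as time, in.tensor, out.dense_1, and check_failures - the same column labels shown in the example result below.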

The following example is an inference request using a pandas DataFrame, and the returning values. Note that columns are labeled based on the inputs and outputs. This model only has one output - dense_1, which is listed in the out.dense_1 column. If the model had returned multiple outputs, they would be listed as out.output1, out.output2, etc.

result = ccfraud_pipeline.infer(high_fraud_data)
  time in.tensor out.dense_1 check_failures
0 2023-02-15 23:07:07.570 [1.0678324729, 18.1555563975, -1.6589551058, 5.2111788045, 2.3452470645, 10.4670835778, 5.0925820522, 12.8295153637, 4.9536770468, 2.3934736228, 23.912131818, 1.759956831, 0.8561037518, 1.1656456469, 0.5395988814, 0.7784221343, 6.7580610727, 3.9274118477, 12.4621782767, 12.3075382165, 13.7879519066, 1.4588397512, 3.6818346868, 1.753914366, 8.4843550037, 14.6454097667, 26.8523774363, 2.7165292377, 3.0611957069] [0.981199] 0
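The input above is a single-row pandas DataFrame whose tensor column holds the model's input values (reflected in the in.tensor column of the result). A minimal sketch of building such a frame - the variable name high_fraud_data matches the example, but the values here are illustrative placeholders, not the actual fraud record:

```python
import pandas as pd

# Sketch of a single-row DataFrame input for infer(). The column name
# "tensor" corresponds to the "in.tensor" column in the result above.
# Values are placeholders, not the real high_fraud_data record.
high_fraud_data = pd.DataFrame({"tensor": [[1.0678324729, 18.1555563975, -1.6589551058]]})
print(high_fraud_data.shape)
```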

Arrow Disabled Infer

When Arrow support is disabled, the pipeline method infer(data, timeout) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The data submitted to the pipeline for inference in the Wallaroo JSON format.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call - with 10 inference requests, each would have a default timeout of 15 seconds.
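The Wallaroo JSON format maps each model input name to a list of rows, as seen in the original_data field of the result below. A minimal standard-library sketch of such a payload - the input name "tensor" follows the example, and the values are placeholders:

```python
import json

# Sketch of the Wallaroo JSON input format: a mapping from the model's
# input name (here "tensor") to a list of input rows. Values are
# placeholders, not a real fraud record.
payload = {"tensor": [[1.0678324729, 18.1555563975, -1.6589551058]]}
print(json.dumps(payload))
```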

The following example shows running an inference on data with a timeout of 20 seconds:

inferences = deployment.infer(high_fraud_data, timeout=20)
display(inferences)
[InferenceResult({'check_failures': [],
  'elapsed': 144533,
  'model_name': 'ktoqccfraudmodel',
  'model_version': '08daeb67-e4c4-42a9-84af-bac7bcc9e18b',
  'original_data': {'tensor': [[1.0678324729342086,
                                18.155556397512136,
                                -1.658955105843852,
                                5.2111788045436445,
                                2.345247064454334,
                                10.467083577773014,
                                5.0925820522419745,
                                12.82951536371218,
                                4.953677046849403,
                                2.3934736228338225,
                                23.912131817957253,
                                1.7599568310350209,
                                0.8561037518143335,
                                1.1656456468728569,
                                0.5395988813934498,
                                0.7784221343010385,
                                6.75806107274245,
                                3.927411847659908,
                                12.462178276650056,
                                12.307538216518656,
                                13.787951906620115,
                                1.4588397511627804,
                                3.681834686805714,
                                1.753914366037974,
                                8.484355003656184,
                                14.6454097666836,
                                26.852377436250144,
                                2.716529237720336,
                                3.061195706890285]]},
  'outputs': [{'Float': {'data': [0.9811990261077881],
                         'dim': [1, 1],
                         'dtype': 'Float',
                         'v': 1}}],
  'pipeline_name': 'ktoqccfraudpipeline',
  'shadow_data': {},
  'time': 1676502293246})]

Run Inference through Pipeline Deployment URL

The method pipeline._deployment._url() provides a URL where data can be submitted through an HTTP POST in JSON format to the pipeline to perform an inference. This is useful for submitting data to the same pipeline remotely from different sources.

  • For Arrow enabled instances of Wallaroo:
    • For DataFrame formatted JSON, the Content-Type is application/json; format=pandas-records.
    • For Arrow binary files, the Content-Type is application/vnd.apache.arrow.file.
  • For Arrow disabled instances of Wallaroo:
    • The Content-Type is application/json.
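As a sketch, the Content-Type header for each case can be set on a standard-library request object. The URL below is a placeholder for the value returned by _deployment._url(), and no request is actually sent:

```python
import json
import urllib.request

# Placeholder deployment URL; substitute the value returned by
# _deployment._url() in your Wallaroo instance.
url = "http://engine-lb.example:29502/pipelines/example-pipeline"

# DataFrame-records JSON payload for an Arrow enabled instance;
# the tensor values are placeholders.
body = json.dumps([{"tensor": [1.0, 2.0]}]).encode("utf-8")

# Build (but do not send) the POST request with the required header.
req = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json; format=pandas-records"},
    method="POST",
)
print(req.get_header("Content-type"))
```

For an Arrow binary file, the header would instead be application/vnd.apache.arrow.file and the body the raw file bytes.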

In this example, the aloha_pipeline’s deployment URL will be determined in an Arrow enabled Wallaroo instance. An inference will then be made on data submitted to the aloha_pipeline through its deployment URL via a curl HTTP POST command.

  • IMPORTANT NOTE: The _deployment._url() method returns an internal URL when using Python commands from within the Wallaroo instance - for example, the Wallaroo JupyterHub service. When connecting via an external connection, _deployment._url() returns an external URL. External URL connections require that authentication be included in the HTTP request, and that external endpoints are enabled in the Wallaroo configuration options (see the Model Endpoints Guide).
aloha_pipeline._deployment._url()

'http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo'
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type: application/json; format=pandas-records" --data @data-25k.json > curl_response.txt

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M  100 10.1M  100 2886k   539k   149k  0:00:19  0:00:19 --:--:-- 2570k

Run Inference From A File

To submit a data file directly to a pipeline, use the pipeline method infer_from_file({Data File}, timeout).

Arrow Enabled infer_from_file

When Arrow support is enabled, the pipeline method infer_from_file(data, timeout, dataset, dataset_exclude, dataset_separator) performs an inference as defined by the pipeline steps and takes the following arguments:

  • data (REQUIRED): The name of the file submitted to the pipeline for inference. Inputs are sent as either a pandas.DataFrame or an Apache Arrow table.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call - with 10 inference requests, each would have a default timeout of 15 seconds.
  • dataset (OPTIONAL): The datasets to be returned. By default this is set to ["*"], which returns ["time", "in", "out", "check_failures"].
  • dataset_exclude (OPTIONAL): Allows users to exclude parts of the dataset.
  • dataset_separator (OPTIONAL): Allows other types of dataset separators to be used. If set to ".", the returned dataset will be flattened.

In this example, an inference will be submitted to the ccfraud_pipeline with the file smoke_test.df.json, a DataFrame formatted JSON file.

result = ccfraud_pipeline.infer_from_file('./data/smoke_test.df.json')
  time in.tensor out.dense_1 check_failures
0 2023-02-15 23:07:07.497 [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] [0.0014974177] 0
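A DataFrame-formatted JSON file such as smoke_test.df.json follows pandas' records orientation. A hedged sketch of writing and reading one - the file name and tensor values below are illustrative, not the actual smoke test data:

```python
import pandas as pd

# Illustrative sketch of the DataFrame-formatted JSON layout that
# infer_from_file() consumes: a pandas DataFrame serialized with the
# "records" orientation. File name and values are placeholders.
df = pd.DataFrame({"tensor": [[1.0678324729, 0.2177810266, -1.7115145262]]})
df.to_json("./example.df.json", orient="records")

# Round-trip: reading the file back reproduces the single-row frame.
round_trip = pd.read_json("./example.df.json", orient="records")
print(round_trip.shape)
```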

Arrow Disabled infer_from_file

When Arrow support is not enabled, the pipeline method infer_from_file({Data File}, timeout) performs an inference as defined by the pipeline steps and takes the following arguments:

  • {Data File} (REQUIRED): The path name to the submitted file in the Wallaroo JSON format.
  • timeout (OPTIONAL): A timeout in seconds before the inference throws an exception. The default is 15 seconds per call to accommodate large, complex models. Note that for a batch inference, this is per call - with 10 inference requests, each would have a default timeout of 15 seconds.

In this example, an inference will be submitted to the aloha_pipeline with the file data-1.json with a timeout of 20 seconds:

aloha_pipeline.infer_from_file("data-1.json", timeout=20)

[InferenceResult({'check_failures': [],
  'elapsed': 329803334,
  'model_name': 'aloha-2',
  'model_version': '3dc9b7f9-faff-40cc-b1b6-7724edf11b12',
  'original_data': {'text_input': [[0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    0,
                                    28,
                                    16,
                                    32,
                                    23,
                                    29,
                                    32,
                                    30,
                                    19,
                                    26,
                                    17]]},
  'outputs': [{'Float': {'data': [0.001519620418548584], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9829147458076477], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.012099534273147583], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [4.7593468480044976e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [2.0289742678869516e-05],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.0003197789192199707],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [0.011029303073883057], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.9975639581680298], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.010341644287109375], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.008038878440856934], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.016155093908309937], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.006236225366592407], 'dim': [1, 1], 'v': 1}},
              {'Float': {'data': [0.0009985864162445068],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.7933435344117743e-26],
                         'dim': [1, 1],
                         'v': 1}},
              {'Float': {'data': [1.388984431455466e-27],
                         'dim': [1, 1],
                         'v': 1}}],
  'pipeline_name': 'aloha-test-demo',
  'time': 1648744282452})]
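Each entry in the outputs list above wraps its values in a record keyed by dtype, with the raw values under data and the tensor shape under dim. A standard-library sketch of pulling the float values back out, reusing two of the values from the result above:

```python
# Sketch: extracting values from the "outputs" list structure shown
# above. Each entry is a dict keyed by dtype ("Float") with the raw
# values under "data" and the tensor shape under "dim".
outputs = [{"Float": {"data": [0.001519620418548584], "dim": [1, 1], "v": 1}},
           {"Float": {"data": [0.9829147458076477], "dim": [1, 1], "v": 1}}]

values = [entry["Float"]["data"][0] for entry in outputs]
print(values)
```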

InferenceResult Object

The infer and infer_from_file methods return the following:

  • Arrow support enabled Wallaroo instances return either:
    • If the input is a pandas DataFrame, then the inference result is a pandas DataFrame.
    • If the input is an Apache Arrow table, then the inference result is an Apache Arrow table.
  • Arrow support disabled Wallaroo instances return the Wallaroo InferenceResult object.

The InferenceResult object includes the following methods:

  • data() : The data resulting from the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the data displayed:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    output[0].data()
    
    [array([[9.93003249e-01],
          [9.93003249e-01],
          [9.93003249e-01],
          ...,
          [1.10703707e-03],
          [8.53300095e-04],
          [1.24984980e-03]])]
    
  • input_data(): Returns the data provided to the pipeline to run the inference.

    In this example, an inference will be submitted to the ccfraud_pipeline with the file cc_data_1k.json, with only the first element in the array returned:

    output = ccfraud_pipeline.infer_from_file('./cc_data_1k.json')
    
    output[0].input_data()["tensor"][0]
    
    [-1.060329750089797,
    2.354496709462385,
    -3.563878832646437,
    5.138734892618555,
    -1.23084570186641,
    -0.7687824607744093,
    -3.588122810891446,
    1.888083766259287,
    -3.2789674273886593,
    -3.956325455353324,
    4.099343911805088,
    -5.653917639476211,
    -0.8775733373342495,
    -9.131571191990632,
    -0.6093537872620682,
    -3.748027677256424,
    -5.030912501659983,
    -0.8748149525506821,
    1.9870535692026476,
    0.7005485718467245,
    0.9204422758154284,
    -0.10414918089758483,
    0.3229564351284999,
    -0.7418141656910608,
    0.03841201586730117,
    1.099343914614657,
    1.2603409755785089,
    -0.14662447391576958,
    -1.446321243938815]