Container specifications gRPC

Modzy requests the model’s status, sends data for the model to run on, and retrieves output results from the model container. When needed, it can also request that the container shut down. The container specification describes a service that responds to these requests. For your reference, here we provide a sample specification that uses gRPC over HTTP/2 with protocol buffers (language neutral) to exchange data.

📘 We currently support Docker containers. Support for all OCI-compliant containers is coming soon.

Requirements

Container

The Docker container must expose an HTTP/2 service on the port specified by the PSC_MODEL_PORT environment variable that implements the Status, Run, and Shutdown routes detailed below.
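As a sketch, the server can resolve its listening port from the environment. `PSC_MODEL_PORT` comes from the requirement above; the fallback of 45000 is an assumption for local testing only, since Modzy always sets the variable:

```python
import os

def resolve_port(env: dict, default: int = 45000) -> int:
    """Return the port the gRPC server should bind, read from PSC_MODEL_PORT."""
    raw = env.get("PSC_MODEL_PORT")
    if raw is None:
        return default  # assumed local-testing fallback; Modzy always sets the variable
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"PSC_MODEL_PORT out of range: {port}")
    return port

# Example: bind address for the HTTP/2 service
address = f"[::]:{resolve_port(os.environ)}"
```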

Entry Point

Add these commands to your Dockerfile so Modzy can start the model server inside the container.

In your Dockerfile:

  • add a Python version manager:
RUN git clone --depth=1 https://github.com/pyenv/pyenv.git /.pyenv && \
    pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION}
  • copy the application code:
WORKDIR /opt/app
COPY grpc_model ./grpc_model
COPY model_lib ./model_lib
  • install the dependencies:
ADD pyproject.toml poetry.lock ./
RUN pip install --no-cache-dir --upgrade pip && \
    pip install poetry && \
    poetry install --no-dev
  • define the entry point:

CMD ["poetry", "run", "python", "grpc_model/src/model_server.py"]

Inputs and outputs

Inputs hold the input-items (or content) sent to the model for processing. Outputs hold the results returned by the model. The input and output item names link the model to its input-items and results. Input-items are sent as raw bytes.

Each model defines a filename for the input and output items. The filenames of the input-items sent must match the model’s input-item names.

For example, a sentiment analysis model defines its inputs and outputs as follows:

  • the input-item has one data-item: input.txt,
  • the output is named results.json.

Input items contain a key that must match the model’s input name, plus the content, stored in memory on the client side. In this case, the input is:

input:
        input.txt: 0010010101000111001101101100111001 ...

And when the results are available, the output contains:

output:
        results.json: 000001000101010000100 ...
      success: true

If success is false, the output map contains an "error" key with as much information as possible:

output:
        error: 0000000011100110010 ... <- The error message bytes encoded in UTF-8
      success: false
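Client-side, the maps above are plain filename-to-bytes maps. A minimal sketch of building an input map and reading back an output map (the key names input.txt, results.json, and error come from the example above; the helper itself is illustrative):

```python
import json

# Build the input map: keys must match the model's declared input filenames.
inputs = {"input.txt": "I love this product!".encode("utf-8")}

def read_output(output: dict, success: bool) -> dict:
    """Decode a model output map, raising if the model reported failure."""
    if not success:
        # On failure, the map carries a UTF-8 encoded "error" key.
        raise RuntimeError(output["error"].decode("utf-8"))
    return json.loads(output["results.json"].decode("utf-8"))

# Simulated successful response
result = read_output({"results.json": json.dumps({"sentiment": "positive"}).encode()}, True)
```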

Check out our run a model tutorial for more details.

gRPC specifications

This proto file contains all the specifications:

syntax = "proto3";

option java_package = "com.modzy.model.grpc";
option java_multiple_files = true;

service ModzyModel {
  rpc Status(StatusRequest) returns (StatusResponse);
  rpc Run(RunRequest) returns (RunResponse);
  rpc Shutdown(ShutdownRequest) returns (ShutdownResponse);
}


message StatusRequest {
  // Kept empty for forward compatibility, in case this call later needs specific fields
}

message ModelInfo {
    string model_name    = 1;
    string model_version = 2;
    string model_author  = 3;
    string model_type    = 4;    
    string source        = 5;
}

message ModelDescription {
    string summary                     = 1;
    string details                     = 2;
    string technical                   = 3;
    string performance                 = 4;
}

message ModelInput {
  string filename                      = 1;
  repeated string accepted_media_types = 2;
  string max_size                      = 3;
  string description                   = 4;
}

message ModelOutput {
  string filename                      = 1;
  string media_type                    = 2;
  string max_size                      = 3;
  string description                   = 4;
}

message ModelResources {
  string required_ram           = 1;
  float num_cpus                = 2;
  int32 num_gpus                = 3;
}

message ModelTimeout {
  string status                 = 1;
  string run                    = 2;
}

message ModelFeatures {
  bool adversarial_defense      = 1;
  int32 batch_size              = 2;
  bool retrainable              = 3;
  string results_format         = 4;
  string drift_format           = 5;
  string explanation_format     = 6;
}

message StatusResponse {
  int32 status_code              = 1;
  string status                  = 2;
  string message                 = 3;
  ModelInfo model_info           = 4;
  ModelDescription description   = 5;
  repeated ModelInput inputs     = 6;
  repeated ModelOutput outputs   = 7;
  ModelResources resources       = 8;
  ModelTimeout timeout           = 9;
  ModelFeatures features         = 10;
}

message InputItem {
  map<string, bytes> input      = 1;
}

message RunRequest {
  repeated InputItem inputs     = 1;
  bool detect_drift             = 2;
  bool explain                  = 3;
}

message OutputItem {
  map<string, bytes> output     = 1;
  // If success is false there will be an "error" key in the outputMap with as much information as possible
  bool success                  = 2;
}

message RunResponse {
  int32 status_code             = 1;
  string status                 = 2;
  string message                = 3;
  repeated OutputItem outputs   = 4;
}

message ShutdownRequest {
  // Kept empty for forward compatibility, in case this call later needs specific fields
}

message ShutdownResponse {
  int32 status_code             = 1;
  string status                 = 2;
  string message                = 3;
}

All routes should respond with this format:

int32 status_code             = 1;
string status                 = 2;
string message                = 3;

Ensure the message provides useful feedback about model errors. It’s a good practice to return inference error details. The sample below returns a description for each input that fails, an overall message, status, and status code:

{
  "errors": [
    {
      "input-item": "Error in the second input."
    }
  ],
  "message": "Success with errors.",
  "status": "OK",
  "statusCode": 200
}
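The per-input error report above can be assembled as in this sketch. The build_run_message helper and its field names simply mirror the sample JSON; nothing here is a required API:

```python
import json

def build_run_message(errors: list) -> str:
    """Serialize an overall run message carrying per-input error details."""
    payload = {
        "errors": [{"input-item": detail} for detail in errors],
        "message": "Success with errors." if errors else "Success.",
        "status": "OK",
        "statusCode": 200,
    }
    return json.dumps(payload)
```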

Status

rpc Status(StatusRequest) returns (StatusResponse);

Initializes the model and returns its status.

Request

message StatusRequest {
  // Kept empty
}

Response

message StatusResponse {
  int32 status_code              = 1;
  string status                  = 2;
  string message                 = 3;
  ModelInfo model_info           = 4;
  ModelDescription description   = 5;
  repeated ModelInput inputs     = 6;
  repeated ModelOutput outputs   = 7;
  ModelResources resources       = 8;
  ModelTimeout timeout           = 9;
  ModelFeatures features         = 10;
}
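As a sketch, a healthy model might populate the StatusResponse along these lines. The field names come from the proto above; every value here is illustrative, not prescribed:

```python
def make_status_response() -> dict:
    """Illustrative StatusResponse payload as a plain dict (all values made up)."""
    return {
        "status_code": 200,
        "status": "OK",
        "message": "Model loaded and ready.",
        "model_info": {"model_name": "sentiment-analysis", "model_version": "1.0.0"},
        "inputs": [{"filename": "input.txt",
                    "accepted_media_types": ["text/plain"],
                    "max_size": "1M"}],
        "outputs": [{"filename": "results.json", "media_type": "application/json"}],
        "features": {"batch_size": 8, "retrainable": False},
    }
```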

Status codes

Status 200

The model is ready to run.

Status 500

Unexpected error loading the model.

Batch processing

To enable batch processing, the Status route should return a batch size in the ModelFeatures object. A model’s batch_size is the maximum number of input items it can process simultaneously while mounted on a GPU.

message ModelFeatures {
  bool adversarial_defense      = 1;
  int32 batch_size              = 2;
  bool retrainable              = 3;
  string results_format         = 4;
  string drift_format           = 5;
  string explanation_format     = 6;
}
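A server that advertises batch_size can expect up to that many input items per Run call. Splitting work accordingly can be sketched as follows (chunked is a hypothetical helper, not part of the spec):

```python
def chunked(items: list, batch_size: int) -> list:
    """Split items into consecutive batches of at most batch_size each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```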

Explainability

To enable explainability, the Status route should indicate it in the ModelFeatures object through the explanation_format field.

message ModelFeatures {
  bool adversarial_defense      = 1;
  int32 batch_size              = 2;
  bool retrainable              = 3;
  string results_format         = 4;
  string drift_format           = 5;
  string explanation_format     = 6;
}

Run

rpc Run(RunRequest) returns (RunResponse);

Runs the model inference on a given input.

Request

Add the job configuration object with protocol buffers:

message RunRequest {
  repeated InputItem inputs     = 1;
  bool detect_drift             = 2;
  bool explain                  = 3;
}

Parameters

inputs

An array of input items as described above.

detect_drift optional

Sets the drift detection feature when a model offers the option.

explain optional

Sets the explainability feature when a model offers the option.

Response

message RunResponse {
  int32 status_code             = 1;
  string status                 = 2;
  string message                = 3;
  repeated OutputItem outputs   = 4;
}

Status codes

Status 200

Successful inference.

Status 422

Unprocessable input file.
The model cannot run inference on the provided input files (for example an input file may be the wrong format, too large, too small, etc).
The response message should contain a detailed validation error that explains why the model cannot process a given input file.

Status 500

Unexpected error running the model.
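The 422 behavior above can be sketched as a pre-inference validation step. The declared filename, accepted media types, and size limit mirror the ModelInput message fields; the helper itself is illustrative:

```python
def validate_input(input_map: dict, filename: str,
                   accepted_media_types: list, max_size_bytes: int,
                   media_type: str) -> list:
    """Return a list of human-readable validation errors (empty when valid)."""
    errors = []
    if filename not in input_map:
        errors.append(f"missing required input item '{filename}'")
        return errors
    data = input_map[filename]
    if media_type not in accepted_media_types:
        errors.append(f"unsupported media type '{media_type}' for '{filename}'")
    if len(data) > max_size_bytes:
        errors.append(f"'{filename}' exceeds the {max_size_bytes}-byte limit")
    return errors
```

A non-empty list maps naturally to a 422 response whose message carries each detail.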

Explainability

Output files contain inference results. Models with built-in explainability return output files with one of these structures (the first for image models, the second for text classification models):

{
  "modelType": "",
  "result": {
    "classPredictions": []
  },
  "explanation": {
    "maskRLE": []
  }
}
{
  "modelType": "textClassification",
  "result": {
    "classPredictions": []
  },
  "explanation": {
    "wordImportances": {},
    "explainableText": {}
  }
}

Image classification models explainability object

  • modelType (string): Defines the explanation format. Possible options: imageClassification, imageSegmentation, objectDetection.
  • result (object): Contains the results in a classPredictions array.
  • explanation (object): Contains a maskRLE array with the explanation and a dimensions object with the height and width in pixels. The maskRLE follows a column-major order (Fortran order).
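To illustrate the column-major convention, this sketch decodes run-length counts into a row-major 2-D mask. It assumes the runs alternate starting with background (0), as in COCO-style RLE; verify that convention against your model’s actual output:

```python
def decode_mask_rle(counts, height, width):
    """Decode run-length counts (Fortran/column-major order) into a 2-D mask."""
    flat = []
    value = 0  # assumed convention: runs start with background pixels
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not cover the full mask")
    # Column-major: consecutive values fill each column top to bottom.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]
```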

Text classification models explainability object

  • modelType (string): Defines the explanation format. Possible options: textClassification.
  • result (object): Contains the results in a classPredictions list that consists of a prediction and score for each class.
  • explanation (object): Contains:
      • a wordImportances key/value pair that consists of a list with the word, score, and optional index of the word in the original text for each class. A negative score means the word contributed negatively to that class prediction.
      • an optional explainableText key/value pair that consists of a list with the word, score, and optional index of the word in the preprocessed text for each class.
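As a sketch of consuming wordImportances, assuming each class maps to a list of [word, score, optional index] entries as described above (the exact shape may differ per model, so check the actual output):

```python
def split_contributions(word_importances):
    """Separate words that support a class prediction from those that oppose it."""
    supporting, opposing = [], []
    for entry in word_importances:
        word, score = entry[0], entry[1]  # optional index at entry[2] is ignored here
        (supporting if score > 0 else opposing).append(word)
    return supporting, opposing
```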

Shutdown

rpc Shutdown(ShutdownRequest) returns (ShutdownResponse);

Shuts down the model. The model server process should exit with exit code 0.

Request

message ShutdownRequest {
  // Kept empty
}

Response

The model server is not required to send a response and may simply drop the connection; however, a response is encouraged.

message ShutdownResponse {
  int32 status_code             = 1;
  string status                 = 2;
  string message                = 3;
}

Status codes

Status 202

Request accepted.
The server process exits after returning the response.

Status 500

Unexpected error shutting down the model.
