OMI Container Specification
Modzy requests the model’s status, runs data, and gets output results from a model container. When needed, it can request to shut down a container. The container specification describes a service that responds to these requests. Here we provide a sample specification that uses gRPC over HTTP/2 with protocol buffers (language neutral) to send data, for your reference.
We currently support Docker containers. Support for all OCI-compliant containers is coming soon.
Requirements
Container
The Docker container must expose an HTTP/2 service on the port specified by the PSC_MODEL_PORT environment variable that implements the Status, Run, and Shutdown routes detailed below.
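For local development, the server can read the assigned port with a small helper; a minimal sketch (the `get_serving_port` helper and the 45000 fallback are illustrative, not part of the specification):

```python
import os

def get_serving_port(default=45000):
    # Modzy injects the serving port through the PSC_MODEL_PORT environment
    # variable; fall back to a default for local runs outside the platform.
    return int(os.environ.get("PSC_MODEL_PORT", default))
```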
Entry Point
Add these commands to the Dockerfile so the container can start the model server.
In your Dockerfile:
- add a Python version manager:
RUN git clone --depth=1 https://github.com/pyenv/pyenv.git /.pyenv && \
    pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION}
- define the entry point:
WORKDIR /opt/app
COPY grpc_model ./grpc_model
COPY model_lib ./model_lib
- set the package managers:
ADD pyproject.toml poetry.lock ./
RUN pip install --no-cache-dir --upgrade pip && \
    pip install poetry && \
    poetry install --no-dev
CMD ["poetry", "run", "python", "grpc_model/src/model_server.py"]
Inputs and outputs
Inputs hold the input items (or content) sent to the model to be processed. Outputs hold the results returned by the model. Specify the input and output item names to link the model to input items and results. Send the input items as bytes so the model can process the content.
Each model defines a filename for the input and output items. The filenames of the input-items sent must match the model’s input-item names.
For example, a sentiment analysis model defines its inputs and outputs as follows:
- the input item has one data item: input.txt
- the output is named results.json
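Because the filenames of the input items must match the model’s declared input names, the client can validate the mapping before sending. A minimal sketch (the `build_input_item` helper is illustrative, not part of the specification):

```python
def build_input_item(files, expected_names):
    """Map filenames to raw bytes, checking every declared input name is present.

    files: dict of filename -> bytes content; expected_names: the model's
    declared input-item names (e.g. ["input.txt"] for the sentiment model).
    """
    missing = set(expected_names) - set(files)
    if missing:
        raise ValueError(f"missing required inputs: {sorted(missing)}")
    return {name: files[name] for name in expected_names}
```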
Input items contain a key that matches the model’s input name and content stored in memory on the client side. In this case, the input is:
input:
input.txt: 0010010101000111001101101100111001 ...
And when the results are available, the output contains:
output:
results.json: 000001000101010000100 ...
success: true
If success is false, the output map contains an "error" key with as much information as possible:
output:
error: 0000000011100110010 ... <- The error message bytes encoded in UTF-8
success: false
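On the client side, the success flag tells you whether the output map holds results or a UTF-8 encoded error message. A minimal sketch of reading one output item (the `read_output_item` helper is illustrative):

```python
def read_output_item(output_map, success):
    # On failure the spec guarantees an "error" key whose bytes are UTF-8 text.
    if not success:
        raise RuntimeError(output_map["error"].decode("utf-8"))
    # On success, each key is an output filename (e.g. "results.json").
    return dict(output_map)
```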
Check out our run a model tutorial for more details.
gRPC specifications
This proto file contains all the specifications:
syntax = "proto3";
option java_package = "com.modzy.model.grpc";
option java_multiple_files = true;
service ModzyModel {
rpc Status(StatusRequest) returns (StatusResponse);
rpc Run(RunRequest) returns (RunResponse);
rpc Shutdown(ShutdownRequest) returns (ShutdownResponse);
}
message StatusRequest {
  // Keep empty for backward compatibility in case we add something specific for this call
}
message ModelInfo {
string model_name = 1;
string model_version = 2;
string model_author = 3;
string model_type = 4;
string source = 5;
}
message ModelDescription {
string summary = 1;
string details = 2;
string technical = 3;
string performance = 4;
}
message ModelInput {
string filename = 1;
repeated string accepted_media_types = 2;
string max_size = 3;
string description = 4;
}
message ModelOutput {
string filename = 1;
string media_type = 2;
string max_size = 3;
string description = 4;
}
message ModelResources {
string required_ram = 1;
float num_cpus = 2;
int32 num_gpus = 3;
}
message ModelTimeout {
string status = 1;
string run = 2;
}
message ModelFeatures {
bool adversarial_defense = 1;
int32 batch_size = 2;
bool retrainable = 3;
string results_format = 4;
string drift_format = 5;
string explanation_format = 6;
}
message StatusResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
ModelInfo model_info = 4;
ModelDescription description = 5;
repeated ModelInput inputs = 6;
repeated ModelOutput outputs = 7;
ModelResources resources = 8;
ModelTimeout timeout = 9;
ModelFeatures features = 10;
}
message InputItem {
map<string, bytes> input = 1;
}
message RunRequest {
repeated InputItem inputs = 1;
bool detect_drift = 2;
bool explain = 3;
}
message OutputItem {
map<string, bytes> output = 1;
// If success is false there will be an "error" key in the outputMap with as much information as possible
bool success = 2;
}
message RunResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
repeated OutputItem outputs = 4;
}
message ShutdownRequest {
  // Keep empty for backward compatibility in case we add something specific for this call
}
message ShutdownResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
}
All routes should respond with this format:
int32 status_code = 1;
string status = 2;
string message = 3;
Ensure the message provides useful feedback about model errors. It’s a good practice to return inference error details. The sample below returns a description for each input that fails, an overall message, status, and status code:
{
"errors": [
{
"input-item": "Error in the second input."
}
],
"message": "Success with errors.",
"status": "OK",
"statusCode": 200
}
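A server can assemble that message field with the standard json module; a minimal sketch (the `build_run_message` helper is illustrative):

```python
import json

def build_run_message(errors):
    """Build the RunResponse message payload; errors is a list of
    {input-name: description} dicts, one entry per failed input."""
    return json.dumps({
        "errors": errors,
        "message": "Success with errors." if errors else "Success.",
        "status": "OK",
        "statusCode": 200,
    })
```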
Status
rpc Status(StatusRequest) returns (StatusResponse);
Initializes the model and returns its status.
Request
message StatusRequest {
//Keep empty
}
Response
message StatusResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
ModelInfo model_info = 4;
ModelDescription description = 5;
repeated ModelInput inputs = 6;
repeated ModelOutput outputs = 7;
ModelResources resources = 8;
ModelTimeout timeout = 9;
ModelFeatures features = 10;
}
| Status code | Description |
| --- | --- |
| 200 | The model is ready to run. |
| 500 | Unexpected error loading the model. |
Batch processing
To set a model for batch processing, the Status service should return a batch processing size under the ModelFeatures object. A model’s batch_size is the maximum number of input items it can process simultaneously while mounted on a GPU.
message ModelFeatures {
bool adversarial_defense = 1;
int32 batch_size = 2;
bool retrainable = 3;
string results_format = 4;
string drift_format = 5;
string explanation_format = 6;
}
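On the server side, honoring batch_size usually means splitting the request’s input items into chunks the model can process at once. A minimal sketch (the `batch_inputs` helper is illustrative):

```python
def batch_inputs(input_items, batch_size):
    # Split a RunRequest's repeated inputs into chunks of at most batch_size,
    # the maximum number of items the model processes simultaneously.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [input_items[i:i + batch_size]
            for i in range(0, len(input_items), batch_size)]
```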
Explainability
To set a model for explainability, the Status service should return an explanation format (explanation_format) under the ModelFeatures object.
message ModelFeatures {
bool adversarial_defense = 1;
int32 batch_size = 2;
bool retrainable = 3;
string results_format = 4;
string drift_format = 5;
string explanation_format = 6;
}
Run
rpc Run(RunRequest) returns (RunResponse);
Runs the model inference on a given input.
Request
Send the job configuration object encoded with protocol buffers:
message RunRequest {
repeated InputItem inputs = 1;
bool detect_drift = 2;
bool explain = 3;
}
| Parameter | Description |
| --- | --- |
| inputs | An array of input items as described above. |
| detect_drift (optional) | Sets the drift detection feature when a model offers the option. |
| explain (optional) | Sets the explainability feature when a model offers the option. |
Response
message RunResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
repeated OutputItem outputs = 4;
}
| Status code | Description |
| --- | --- |
| 200 | Successful inference. |
| 422 | Unprocessable input file. The model cannot run inference on the provided input files (for example, an input file may be the wrong format, too large, too small, etc.). The response message should contain a detailed validation error that explains why the model cannot process a given input file. |
| 500 | Unexpected error running the model. |
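To return a useful 422, validate each input before inference and explain exactly what failed. A minimal sketch (the `validate_input_item` helper and its limits are illustrative):

```python
def validate_input_item(input_map, expected_names, max_size_bytes):
    """Return (status_code, message) for one input item before running inference."""
    for name in expected_names:
        if name not in input_map:
            return 422, f"missing required input item '{name}'"
        if len(input_map[name]) > max_size_bytes:
            return 422, f"input item '{name}' exceeds the {max_size_bytes}-byte limit"
    return 200, "Successful inference."
```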
Explainability
Output files contain inference results. Models with built-in explainability return output files with this structure:
{
"modelType": "",
"result": {
"classPredictions": []
},
"explanation": {
"maskRLE": []
}
}
{
"modelType": "textClassification",
"result": {
"classPredictions": []
},
"explanation": {
"wordImportances": {},
"explainableText": {}
}
}
Image classification models explainability object
| Parameter | Type | Description |
| --- | --- | --- |
| modelType | string | Defines the explanation format. Possible options: imageClassification, imageSegmentation, objectDetection. |
| result | object | Contains the results in a classPredictions array. |
| explanation | object | Contains a maskRLE array with the explanation and a dimensions object with the height and width in pixels. The maskRLE follows a column-major order (Fortran order). |
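A client can expand the column-major maskRLE back into a 2-D mask. A minimal sketch, assuming a COCO-style counts list of alternating zero/one run lengths (the `decode_mask_rle` helper is illustrative, not part of the specification):

```python
def decode_mask_rle(counts, height, width):
    # Expand alternating run lengths into a flat 0/1 list (zeros run first).
    flat, value = [], 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    # Re-shape in column-major (Fortran) order: flat index = col * height + row.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]
```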
Text classification models explainability object
| Parameter | Type | Description |
| --- | --- | --- |
| modelType | string | Defines the explanation format. Possible options: textClassification. |
| result | object | Contains the results in a classPredictions list that consists of a prediction and score for each class. |
| explanation | object | Contains a wordImportances key/value pair: a list with the word, score, and optional index of the word in the original text for each class (a negative score means the word contributed negatively to that class prediction). May also contain an optional explainableText key/value pair: a list with the word, score, and optional index of the word in the preprocessed text for each class. |
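A client can rank words by how strongly they drove a prediction. A minimal sketch, assuming wordImportances maps each class name to a list of {word, score, index} entries (the `top_words` helper is illustrative):

```python
def top_words(explanation, class_name, k=3):
    # Sort by absolute score: large negative scores matter as much as positive
    # ones, since a negative score means the word argued against the class.
    entries = explanation["wordImportances"][class_name]
    return sorted(entries, key=lambda e: abs(e["score"]), reverse=True)[:k]
```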
Shutdown
rpc Shutdown(ShutdownRequest) returns (ShutdownResponse);
Shuts down the model server. The server process should exit with exit code 0.
Request
message ShutdownRequest {
//Keep empty
}
Response
The model server is not required to send a response and may simply drop the connection; however, a response is encouraged.
message ShutdownResponse {
int32 status_code = 1;
string status = 2;
string message = 3;
}
| Status code | Description |
| --- | --- |
| 202 | Request accepted. The server process exits after returning the response. |
| 500 | Unexpected error shutting down the model. |
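One way to return the 202 response before exiting is to set a flag that the serving loop checks after the reply is sent. A minimal sketch (the `shutdown_event` flag and `handle_shutdown` helper are illustrative):

```python
import threading

shutdown_event = threading.Event()

def handle_shutdown():
    # Signal the serving loop to stop; the loop should then exit the process
    # with code 0 once this response has been delivered.
    shutdown_event.set()
    return {"statusCode": 202, "status": "Accepted",
            "message": "Server shutting down."}
```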