
Tensorflow (Manual)

Manually package and deploy your Tensorflow model

In this guide, we will prepare an example Tensorflow model for deployment to Modzy using the following resources:

The Tensorflow model is pulled directly from this Tensorflow tutorial. Note: if you are following along, replace all actions specific to the tutorial model with your own Tensorflow model.

This containerization process includes three steps:

  1. Construct Model Container
  2. Construct Metadata Configuration File
  3. Test and Validate Model Container

:construction: Construct Model Container

First, migrate your existing model library into the model_lib/src directory. In the example repository, notice the openimages_v4_ssd_mobilenet_v2_1/ directory that contains the saved Tensorflow model file(s). We will reference this directory when we move our model inference code to the Modzy Model Wrapper Class.

Navigate to the model_lib/src/ file within the repository, which contains the Modzy Model Wrapper Class. Fill out the __init__() and handle_discrete_input() methods, which load your model and run inference, respectively.


def __init__(self):
    """
    This constructor should perform all initialization for your model. For example, all one-time tasks such as
    loading your model weights into memory should be performed here.
    This corresponds to the Status remote procedure call.
    """
    self.detector = hub.load(MODEL_DIR).signatures['default']

where MODEL_DIR is defined as follows:

ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
MODEL_DIR = os.path.join(ROOT_DIR, "openimages_v4_ssd_mobilenet_v2_1")
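Because hub.load() is given a local path here, a missing or misplaced model directory will only surface when the server starts. The following is a minimal sketch of a pre-load check, assuming the directory layout described above. The resolve_model_dir helper is hypothetical and is not part of the Modzy template.

```python
import os


def resolve_model_dir(root_dir: str, model_name: str) -> str:
    """Return the path to a saved-model directory, or raise if it is missing.

    Hypothetical helper for illustration; not part of the Modzy template.
    """
    model_dir = os.path.join(root_dir, model_name)
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"Saved model directory not found: {model_dir}")
    return model_dir
```

With this helper, MODEL_DIR could be set with resolve_model_dir(ROOT_DIR, "openimages_v4_ssd_mobilenet_v2_1"), so that a bad path fails with a clear message before the model is loaded.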


This method is the main driver for the inference process. It will read in input data in raw bytes form, perform any required preprocessing steps, execute predictions, and finally return the output in a clean JSON format.

def handle_single_input(self, model_input: Dict[str, bytes], detect_drift: bool, explain: bool) -> Dict[str, bytes]:
    """This corresponds to the Run remote procedure call for single inputs."""
    # `model_input` will have binary contents for each of the input file types specified in your model.yaml file

    # You are responsible for processing these files in a manner that is specific to your model, and producing
    # inference, drift, and explainability results where appropriate.

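    # process image bytes using tf library
    img_bytes = model_input["image"]
    # decode the raw bytes into a 3-channel image tensor (tf.io.decode_image is assumed here)
    img = tf.io.decode_image(img_bytes, channels=3)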
    converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
    results = self.detector(converted_img)

    # format results
    result = {key:value.numpy() for key,value in results.items()}
    inference_result = self.format_detections(result)
    explanation_result = None
    drift_result = None

    # structure outputs correctly
    output = get_success_json_structure(inference_result, explanation_result, drift_result)

    return output
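To try this method outside the gRPC server, you need a model_input dictionary that maps each input name to raw bytes. The sketch below shows one way to build it from files on disk. The load_model_input helper and any file paths are hypothetical; only the "image" key comes from the example above.

```python
from typing import Dict


def load_model_input(paths: Dict[str, str]) -> Dict[str, bytes]:
    """Read each file from disk and return its raw bytes keyed by input name.

    Hypothetical helper for local testing; not part of the Modzy template.
    """
    model_input = {}
    for input_name, file_path in paths.items():
        with open(file_path, "rb") as f:
            model_input[input_name] = f.read()
    return model_input
```

For example, load_model_input({"image": "path/to/test_image.jpg"}) returns a dictionary that can be passed to the method above together with detect_drift=False and explain=False.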


Notice a helper function, format_detections(), referenced in handle_discrete_input(). This function helps format the raw detection predictions into a human-readable JSON format.

def format_detections(self,result_object):
    # parse out what we need from result_object
    class_names = result_object["detection_class_entities"]
    scores = result_object["detection_scores"]
    bboxes = result_object["detection_boxes"]

    # store formatted detections in this list
    formatted_detections = []
    for name, score, bbox in zip(class_names, scores, bboxes):
        ymin, xmin, ymax, xmax = tuple(bbox)
        detection = {}
        detection["class"] = name.decode()
        detection["score"] = round(score.item(), 3)
        detection["xmin"] = xmin.item()
        detection["ymin"] = ymin.item()
        detection["xmax"] = xmax.item()
        detection["ymax"] = ymax.item()
        formatted_detections.append(detection)
    formatted_results = {"detections": formatted_detections}

    return formatted_results
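To make the intended output shape concrete, here is a small sketch that takes a dictionary in the {"detections": [...]} form and keeps only the higher-scoring entries before serializing it. The sample values are made up, and filter_detections is a hypothetical helper rather than part of the example repository.

```python
import json
from typing import Any, Dict


def filter_detections(formatted_results: Dict[str, Any], min_score: float) -> Dict[str, Any]:
    """Keep only detections whose score is at least min_score (hypothetical helper)."""
    kept = [d for d in formatted_results["detections"] if d["score"] >= min_score]
    return {"detections": kept}


# Made-up sample in the shape that format_detections() is meant to produce.
sample = {
    "detections": [
        {"class": "Dog", "score": 0.912, "xmin": 0.1, "ymin": 0.2, "xmax": 0.6, "ymax": 0.9},
        {"class": "Tree", "score": 0.134, "xmin": 0.5, "ymin": 0.0, "xmax": 1.0, "ymax": 0.7},
    ]
}

confident = filter_detections(sample, min_score=0.5)
print(json.dumps(confident, indent=2))
```

Whether to filter inside the wrapper or leave thresholding to the consumer of the JSON is a design choice; the example model returns every detection.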

Next, set up the Dockerfile so that your gRPC model server can be spun up inside your Docker container. Provided your model can run on a vanilla Python base image (i.e., it does not require GPU/CUDA), all you need to do is add the Python libraries your code depends on to the requirements.txt file. For this example, the wrapper code above relies on the tensorflow and tensorflow-hub packages.



  • Complete the handle_discrete_input_batch() method in order to enable custom batch processing for your model.
  • Refactor the ExampleModel class name in order to give your model a custom name.
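As a starting point for batch processing, the sketch below shows the simplest possible fallback: looping over the batch and reusing the single-input method. This guide does not show the batch method's signature, so the signature used here is an assumption, and the class is a stand-in rather than the template's actual code. Check the template before copying it.

```python
from typing import Dict, List


class BatchFallbackSketch:
    """Stand-in class showing a loop-based batch fallback (signatures are assumptions)."""

    def handle_single_input(
        self, model_input: Dict[str, bytes], detect_drift: bool, explain: bool
    ) -> Dict[str, bytes]:
        # Placeholder for real inference: report the byte length of each input.
        return {name: str(len(data)).encode() for name, data in model_input.items()}

    def handle_discrete_input_batch(
        self, model_inputs: List[Dict[str, bytes]], detect_drift: bool, explain: bool
    ) -> List[Dict[str, bytes]]:
        # Assumed contract: one input dict per item in, one output per item out, same order.
        return [self.handle_single_input(item, detect_drift, explain) for item in model_inputs]
```

A loop like this keeps results in input order but gains no throughput. A custom implementation would typically stack the decoded inputs into one tensor and call the model once per batch.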

View the example model_lib/src/ file here.

:page-with-curl: Construct Metadata Configuration File

Create a new version of your model using semantic versioning (x.x.x), and create a new directory for this version under asset_bundle. Fill out a model.yaml file under asset_bundle/x.x.x/ according to the proper specification, then update the __VERSION__ = x.x.x variable located in grpc_model/ before releasing the new version of the model. You must also update the following line in the Dockerfile:

COPY asset_bundle/x.x.x ./asset_bundle/x.x.x/
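Since the version string has to match in several places, it can help to generate the Dockerfile line from a single value. The following is a minimal sketch, assuming plain x.x.x versions with no pre-release suffix. The dockerfile_copy_line helper is hypothetical.

```python
import re


def dockerfile_copy_line(version: str) -> str:
    """Return the Dockerfile COPY line for an x.x.x version, validating the format first."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"Expected a version like 1.0.0, got: {version!r}")
    return f"COPY asset_bundle/{version} ./asset_bundle/{version}/"


print(dockerfile_copy_line("1.0.0"))
# prints: COPY asset_bundle/1.0.0 ./asset_bundle/1.0.0/
```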

In your model.yaml file, complete at least the following portions. This information is required to create a fully functional and compliant container:

inputs Example:

  # The expected media types of this file. For more information
  # on media types, see:
  # The maximum size that this file is expected to be.
  # A human readable description of what this file is expected to
  # be. This value supports content in Markdown format for including
  # rich text, links, images, etc.
  description: Image to pass through Object Detection model

outputs Example:

  # The expected media types of this file. For more information
  # on media types, see:
  mediaType: application/json
  # The maximum size that this file is expected to be.
  # A human readable description of what this file is expected to
  # be. This value supports content in Markdown format for including
  # rich text, links, images, etc.
  description: JSON file with bounding box and class for each prediction

resources Example:

    # The amount of RAM required by your model, e.g. 512M or 1G
    size: 1G
    # CPU count should be specified as the number of fractional CPUs that
    # are needed. For example, 1 == one CPU core.
    # GPU count must be an integer.
# Please specify a timeout value that indicates a time at which
# requests to your model should be canceled. If you are using a
# webserver with built in timeouts within your container such as
# gunicorn make sure to adjust those timeouts accordingly.
  # Status timeout indicates the timeout threshold for calls to your
  # model's `/status` route, e.g. 20s
  status: 60s
  # Run timeout indicates the timeout threshold for files submitted
  # to your model for processing, e.g. 20s
  run: 60s

# Please set the following flags to either true or false.
  recommended: false
  experimental: false
  available: true

    adversarialDefense: false
    maxBatchSize: 8
    retrainable: false
    # The following features should be changed from null to a supported format specification if you want
    # your model to have platform support for results, drift, or explanations.
    resultsFormat: "objectDetection"
    driftFormat: "objectDetection"
    explanationFormat: "objectDetection"
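Before building the container, a quick sanity check of the values you entered can catch simple mistakes. The sketch below works on a flat dictionary of values rather than the real nested model.yaml. The accepted patterns are inferred only from the examples in the comments above (512M, 1G, 20s), so the platform may accept other forms, and the rule that maxBatchSize must be a positive integer is an assumption. The check_metadata_values helper is hypothetical.

```python
import re
from typing import Any, Dict, List


def check_metadata_values(values: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a flat dict of model.yaml values (empty if none)."""
    problems = []
    if not re.fullmatch(r"\d+[MG]", str(values.get("size", ""))):
        problems.append("size should look like 512M or 1G")
    for key in ("status", "run"):
        if not re.fullmatch(r"\d+s", str(values.get(key, ""))):
            problems.append(f"{key} timeout should look like 20s")
    for key in ("recommended", "experimental", "available"):
        if not isinstance(values.get(key), bool):
            problems.append(f"{key} should be true or false")
    batch = values.get("maxBatchSize")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(batch, int) or isinstance(batch, bool) or batch < 1:
        problems.append("maxBatchSize should be a positive integer")
    return problems


# Values taken from the example above.
example = {
    "size": "1G", "status": "60s", "run": "60s",
    "recommended": False, "experimental": False, "available": True,
    "maxBatchSize": 8,
}
print(check_metadata_values(example))
# prints: []
```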

:white-check-mark: Test and Validate Model Container

Now, we will run validation tests on our model container in the same way that the Modzy Platform spins up the model container and runs inference.

First, create a virtual environment (venv, conda, or other preferred virtual environment), activate it, and install any pip-installable packages defined in your requirements.txt file. Using venv:

Linux or macOS:

python -m venv ./grpc-model
source grpc-model/bin/activate
pip install -r requirements.txt

Windows:

python -m venv .\grpc-model
.\grpc-model\Scripts\activate
pip install -r requirements.txt

Next, test your gRPC server and client connection in two separate terminals. In your first terminal, kick off the gRPC server.

python -m grpc_model.src.model_server

You will see your model instantiate and begin running on port 45000. This runs the Status() remote procedure call. After your gRPC server has successfully spun up, configure the grpc_model/src/ file and run the gRPC client in a separate terminal.

python -m grpc_model.src.model_client

This will run the custom gRPC client and execute the Run() remote procedure call with the data you defined in the client script. After a successful inference run, you can move on to testing your model inside a newly built Docker container.

Build your container, and spin up your model inside the container:

docker build -t <container-image-name>:<tag> .
docker run --rm -it -p 45000:45000 <container-image-name>:<tag>
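The container can take a moment to load the model, so it may help to wait until port 45000 accepts connections before running the client. The following is a minimal sketch using only the standard library. The wait_for_port helper is hypothetical, and a successful TCP connection only shows that the port is open, not that the gRPC service itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not accepting connections yet; retry shortly.
            time.sleep(0.5)
    return False
```

For the setup in this guide, calling wait_for_port("localhost", 45000) returns True once the mapped port is reachable.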

Then, in a separate terminal, test the containerized server from a local gRPC model client:

python -m grpc_model.src.model_client

After a successful local client test, you can proceed knowing your model container runs as expected.

Congratulations! You have now successfully containerized your Tensorflow model. To deploy your new model container to Modzy, follow the Import Container guide.
