4. Run Model Inference


Self-Service Tutorial Contents

  1. Package Model
  2. Deploy Model
  3. Scale Model Up
  4. → Run Model Inference
  5. Set Drift Baseline
  6. Deploy Model to Edge Device

Environment setup

In this tutorial, we will explore two different ways to submit inference requests to the model we deployed and scaled up in the previous tutorials. First, you will submit an inference within Modzy's user interface. Then, we will interact with the model via Modzy's Python SDK.

To follow along, you must have completed Tutorial #2 at a minimum, though it is recommended that you complete Tutorials 1-3 before continuing.


What you'll need for this tutorial

  • Your newly deployed and scaled-up model
  • A valid Modzy account
  • Your local Python environment with Modzy's SDK installed (Python >= 3.6)
    • We recommend following this tutorial in a Jupyter notebook, but any IDE will work

First, you will need to set up your Python environment if you have not already done so. Create a virtual environment (venv, conda, or your virtual environment of choice), and install Jupyter Notebooks using the appropriate install instructions.

Then, use pip to install the Modzy SDK Python Package.

pip install modzy-sdk

For now, you are all set. We will open a Jupyter notebook later.

Take your model for a test drive

We pick up right where we left off in our last tutorial, on our model details page. Begin by clicking the "Test Drive" tab from the navigation bar on the left side of the page.

Test Drive Feature

This interface provides a wrapper around Modzy's Job API, which makes it convenient to test how models work. In the input.txt field, type any text you'd like. We will go with "This is my first time ever using Modzy!".

When you are ready to send it to the model, click the "Start Job" button. Your screen should then look something like this:

Test Drive Job Submitted

After a few seconds, view the results to see what you can expect this model's outputs to look like.

Model Outputs

Congratulations! You just successfully ran your first model inference. Continue to the next section to learn how to do the same via API interaction using Modzy's Python client.

Run model in Python

Now, it is time to interact with your model programmatically. To do so, navigate to your Python environment within your terminal and open a Jupyter notebook kernel.

jupyter notebook

Next, navigate back to your model details page in your browser and select the "API" tab within the panel on the left.

The dropdown under "Sample Request" provides ready-to-use copy-pastable code suited to your programming language of choice. Select "Python" from the list. Then, hover over the "Copy" button on the top right of this snippet and click to copy this code to your clipboard.

Python Sample Request

Jump back over to your Jupyter notebook, and paste this code into the first cell. For usability purposes, we split the code into three separate cells, but you can run it all in one if you prefer.

Now, before you execute this code, we will make two changes:

  1. You must insert your own API key
  2. We will change the placeholder "Lorem ipsum ... " text to something more relevant to this model
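Before making those edits, it helps to understand the shape of the payload you will be changing. In the sample request, the text input is a plain Python dictionary: an outer key names each input item (the name is arbitrary, chosen by you), and the inner key must match the model's expected input filename, input.txt. A minimal sketch:

```python
# The text payload for a single-input job. "my-first-input" is an arbitrary
# name we chose for this input item; "input.txt" must match the input
# filename the model expects (shown on the model details page).
sources = {
    "my-first-input": {
        "input.txt": "Modzy's Python API client is super cool!"
    }
}
```

This is the dictionary you will edit in change #2 above; the API key from change #1 is covered next.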

Download API Key

If you have not already done so, download your API key by first navigating to the "API Keys" page within your profile.

Administrative Drop Down Menu

This will take you to your profile page, where you can view your job usage, API keys segmented by teams and projects, and your profile settings. Scroll to the key for the team you are working on and click the "Download Key" button.

API Key Download


Full API Key in text file download

This API key is required to interact with Modzy's APIs (including the Python client we are using). You must locate the text file downloaded to your machine and use the full API key contained within this file when submitting API calls.

Learn more about API keys here.

Now, copy the full API key from the text file downloaded to your machine and paste it into your Jupyter notebook. The full key will look something like this: 5r1gITGRSO5HfvbETreL.3vBC0ASxBmfEaBT8BPLa.
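Note that the key has two dot-separated parts. The part before the dot is the public prefix you can see in the UI; the part after it is the secret half found only in the downloaded file, which is why the full downloaded key is required. Using the sample key from above:

```python
# Sample key from this tutorial; replace it with your own downloaded key.
api_key = "5r1gITGRSO5HfvbETreL.3vBC0ASxBmfEaBT8BPLa"

# Split into the public prefix (visible in the UI) and the secret half
# (only in the downloaded text file).
prefix, secret = api_key.split(".")
print(prefix)  # 5r1gITGRSO5HfvbETreL
```

If your pasted key has no dot, or only the prefix, you most likely copied the truncated key from the UI rather than the full key from the downloaded file.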

API Key Inserted into Code

Change text and run code

Next, change the text within the sources dictionary to something other than the placeholder text. We will insert this sentence: "Modzy's Python API client is super cool!"

Finally, it is time to run our code. Execute these three cells. The output should look similar to this:

Job output

Now, to verify this job ran successfully, navigate to the "Usage" tab in your profile, where you should see a row entry under the "Your Jobs" section representing the job you just submitted.

Model Usage and Personal Job History

Click on the Job ID (first column) to view more detailed results, including latency metrics, timestamps, and the raw results.

Congratulations! You just ran your first job in Python. Move on to the last section of this tutorial to scale up one step further and run a larger job with a batch of inputs.

Run batch job in Python

In the previous example, we ran a job with a single input as defined in the sources dictionary. Now, we will follow a similar process but instead submit a batch of data to our model.

To do so, you will need a dataset to generate a batch input. We will use a modified version of Kaggle's Amazon reviews dataset. Download the formatted text file using wget directly in your notebook:

!python3 -m pip install wget
!python3 -m wget https://raw.githubusercontent.com/modzy/modzy-jupyter-notebook-samples/main/python-sdk-inference/test.ft.txt

With this dataset downloaded, we will follow a similar workflow to that of running a single inference in Python. In the next cell of your notebook, build your batch input source with the following code:

with open("test.ft.txt", "r", encoding="utf-8") as f:
    text = f.readlines()[:50]  # extract first 50 reviews

# strip the fastText "__label__<n> " prefix and trailing newline from each review
text_cleaned = [t.split("__label__")[-1][2:].replace("\n", "") for t in text]

# build a batch payload: one named input item per review
sources = {"review_{}".format(i): {"input.txt": review} for i, review in enumerate(text_cleaned)}
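To sanity-check the cleaning step before submitting, you can run it on a single line. The review text below is made up for illustration; real lines in test.ft.txt follow the same fastText format of a `__label__<n>` token, a space, then the review:

```python
# One illustrative line in the fastText format used by the reviews file.
raw = "__label__2 Great CD: a sample review text\n"

# Same cleaning expression as above: drop everything through the label digit
# and the space after it, then remove the trailing newline.
cleaned = raw.split("__label__")[-1][2:].replace("\n", "")
print(cleaned)  # Great CD: a sample review text

# The per-input payload shape it feeds into:
sources = {"review_0": {"input.txt": cleaned}}
```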

Now, we can simply submit a batch job with our new set of input data:

batch_job = client.jobs.submit_text("g0h96fgwjq", "1.0.0", sources)
print(f"job: {batch_job}")

Similar to before, your output should look something like this:

Batch Job Output

Just as you did before, navigate back to the "Usage" tab in your profile, where you should see a row entry under the "Your Jobs" section representing the batch job you just submitted.

Click into the latest job, where you will notice a few differences from our last inference:

  • First, you will notice the number of completed items is equal to the size of our batch input, or 50 Amazon reviews
  • As you scroll down, you will also notice the result for each input in your batch listed

Batch Inference Completion

Batch Inference Results
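The same per-input results can also be pulled down programmatically rather than viewed in the UI. Here is a hedged sketch, assuming the modzy-sdk results helper client.results.block_until_complete (check the exact method name against your installed SDK version); the SDK calls are commented out since they require a live client:

```python
# With a live client and the batch_job from above, retrieval would look like:
# results = client.results.block_until_complete(batch_job, timeout=None)
# finished = results.results  # per-input outputs, keyed like the sources dict

# The result keys mirror the input names we built ("review_0" ... "review_49"),
# so iterating them pairs each review with its model output:
input_names = ["review_{}".format(i) for i in range(50)]
print(len(input_names), input_names[0], input_names[-1])
```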

Now that you have deployed your model, scaled it up, and run several inferences against it, move on to the next tutorial to learn how to configure drift settings.