3. Scale Model Up

👍

Self-Service Tutorial Contents

  1. Package Model
  2. Deploy Model
  3. ➡️ Scale Model Up
  4. Run Model Inference
  5. Set Drift Baseline
  6. Deploy Model to Edge Device

Tutorial Preparation

In this tutorial, the third in the series, we will learn how to scale a model up and prepare it to run production inferences. To follow along, you must have completed Tutorial #2 and deployed a model to your Modzy model library.

📘

What you'll need for this tutorial

  • A valid Modzy account
  • Your newly-deployed model

We will kick off this tutorial where we wrapped up the last one: on our newly-deployed model page. Your model page should look something like the image below.

Newly-deployed Model Page

Familiarize yourself with the different tabs on the left panel of your model page, and feel free to select the "Edit Model" button under the "Actions" list to add documentation as desired.

📘

More Information

Throughout this tutorial, we will use the term "Processing Engine (PE)" frequently. Learn more about processing engines, what they are, and how they are used, here.

Navigate to model management

First, scroll down on your model page until you can see the "Model Management" option. Click this button.

Model Actions Tab

This will bring you to a page where you can manage how PEs are allocated to your models. In the search bar, type "hugging" to filter for your model.

Model Management Search Page

Set min/max PE values

Now, to spin up our model, hover over the "Engine Autoscaling" column, which should read "0 min / 1 max", and click it. Then set the minimum value to 1.

Set Engine Autoscaling

Click "Save". After a few seconds, you should see the "Engine Status" change from "Stopped" to "1 Spinning up".

Model Spinning Up

Modzy is now provisioning infrastructure that meets your model's hardware configuration and spinning up the model on that hardware. Keeping the model running results in faster inference times, because the model no longer has to be spun up when an inference request arrives.

📘

Autoscaling

It is worth noting that we are spinning up our model so we can run faster inferences. However, this step is not required to run a model. If the minimum is left at 0, the autoscaler will handle spinning up the model, running inference, and shutting the model down. Learn more about the benefits and tradeoffs of this feature here.
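
The tradeoff described in this note can be illustrated with a small Python sketch. It is not Modzy code and does not call any Modzy API: `EngineSetting`, `first_request_latency`, and the 60-second and 2-second timings are hypothetical, chosen only to show why a minimum of 1 avoids the spin-up wait on the first request.

```python
from dataclasses import dataclass


@dataclass
class EngineSetting:
    """Hypothetical stand-in for the "Engine Autoscaling" min/max values."""
    minimum: int
    maximum: int

    def __post_init__(self) -> None:
        # Reject settings such as "2 min / 1 max"
        if self.minimum < 0 or self.maximum < self.minimum:
            raise ValueError("expected 0 <= minimum <= maximum")


def first_request_latency(setting: EngineSetting,
                          spin_up_seconds: float,
                          inference_seconds: float) -> float:
    """Estimate the latency of the first request after an idle period."""
    # With a minimum of 0 no engine is running, so the request waits for spin-up.
    cold_start = spin_up_seconds if setting.minimum == 0 else 0.0
    return cold_start + inference_seconds


# Illustrative timings only; real spin-up and inference times depend on the model.
on_demand = EngineSetting(minimum=0, maximum=1)
always_on = EngineSetting(minimum=1, maximum=1)
print(first_request_latency(on_demand, spin_up_seconds=60.0, inference_seconds=2.0))
print(first_request_latency(always_on, spin_up_seconds=60.0, inference_seconds=2.0))
```

In this toy model, the only difference between the two settings is whether the first request pays the spin-up time. The cost of keeping an engine running while idle is not modeled.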

When the status changes again from "1 Spinning up" to "1 Ready", click on the model name, which will take you back to your model page. This time, however, you will notice that the engine status at the top right of the page reads "Ready".

Model Details Page - "Ready" Engine Status
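
If you wanted to script the wait for "Ready" rather than watch the page, the usual pattern is a polling loop with a timeout. The sketch below is a generic Python illustration, not a Modzy client: `wait_until_ready` and the `get_status` callable are hypothetical, and the status strings are simulated from the labels shown in this tutorial rather than read from Modzy.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     timeout_seconds: float = 300.0,
                     poll_interval_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status until it reports "Ready" or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Matches both "Ready" and "1 Ready"
        if "Ready" in get_status():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_seconds)


# Simulated status sequence; a real script would query the platform instead.
statuses = iter(["Stopped", "1 Spinning up", "1 Spinning up", "1 Ready"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda _: None))
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. In practice you would leave it at its default.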

Your model is now successfully scaled up, which means all model initialization has been taken care of and it is ready to run inferences. Check out how to do so in the next tutorial!