In this tutorial, the next in the series, we will learn how to scale a model up and prepare it to run production inferences. To follow along, you must have completed Tutorial #2 and deployed a model to your Modzy model library.
What you'll need for this tutorial
- A valid Modzy account
- Your newly-deployed model
We will pick up this tutorial where the last one ended: on the page for our newly deployed model. Your model page should look something like the image below.
Familiarize yourself with the tabs on the left panel of your model page, and feel free to select the "Edit Model" button under the "Actions" list to add documentation as desired.
Throughout this tutorial, we will use the term "Processing Engine (PE)" frequently. Learn more about processing engines, what they are, and how they are used, here.
First, scroll down on your model page until you can see the "Model Management" option. Click this button.
This will bring you to a page where you can manage the PE allocation to different models. In the search bar, type "hugging" to filter for your model.
Now, to spin up our model, hover over the "Engine Autoscaling" column, which should read "0 min / 1 max", and click it. Then set the minimum value to 1.
Click "Save". After a few seconds, you should see the "Engine Status" change from "Stopped" to "1 Spinning up".
Modzy is now pulling a piece of infrastructure that meets your model hardware configuration and spinning up the model on that hardware. Doing so will result in faster inference times.
It is worth noting that we are spinning up our model ahead of time so we can run faster inferences. However, doing so is not required to run a model. If the minimum is left at 0, the autoscaler will handle spinning up the model when a request arrives, running the inference, and shutting the model down afterward. Learn more about the benefits and tradeoffs of this feature here.
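To make the tradeoff concrete, here is a minimal sketch in Python. It is an illustration only and does not use the Modzy API: the `EngineConfig` class, the `estimated_latency` function, and the timing numbers are all hypothetical, chosen to show why a minimum of 1 avoids paying the spin-up cost on each request.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Hypothetical stand-in for the "Engine Autoscaling" setting."""
    min_engines: int
    max_engines: int


def estimated_latency(config: EngineConfig,
                      spin_up_seconds: float,
                      inference_seconds: float) -> float:
    """Estimate the time to get one result back from a single request.

    With a minimum of at least 1, an engine is already running, so only
    the inference time is paid. With a minimum of 0, the autoscaler has
    to spin the model up before the inference can run.
    """
    if config.min_engines < 0 or config.max_engines < 1:
        raise ValueError("need min_engines >= 0 and max_engines >= 1")
    if config.min_engines > config.max_engines:
        raise ValueError("min_engines cannot exceed max_engines")

    already_running = config.min_engines >= 1
    if already_running:
        return inference_seconds
    return spin_up_seconds + inference_seconds


# Made-up timings: 60 s to spin up, 2 s per inference.
cold = estimated_latency(EngineConfig(0, 1), 60.0, 2.0)
warm = estimated_latency(EngineConfig(1, 1), 60.0, 2.0)
print(f"min=0: {cold} s, min=1: {warm} s")
```

The flip side, which this sketch does not model, is that an engine kept running occupies infrastructure even when no inferences are being submitted.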
When the status changes again from "1 Spinning up" to "1 Ready", click on the model name again, which will take you back to your model home page. This time, however, you will notice the Engine status on the top right of the page reads "Ready".
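If you would rather script this wait than watch the page, the pattern is a simple polling loop. The sketch below is generic Python and makes no claim about the Modzy API: `wait_until_ready` and the `get_status` callable are hypothetical, and you would need to supply your own function that returns the current engine status string. Here a canned sequence stands in for it.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     timeout_seconds: float = 300.0,
                     poll_seconds: float = 5.0) -> str:
    """Poll get_status until it reports a ready engine or time runs out.

    get_status is a hypothetical callable supplied by the caller; it
    should return strings such as "Stopped", "1 Spinning up", "1 Ready".
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status.endswith("Ready"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"engine still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Canned statuses that mimic the transitions described in this tutorial.
fake_statuses = iter(["Stopped", "1 Spinning up", "1 Spinning up", "1 Ready"])
result = wait_until_ready(lambda: next(fake_statuses), poll_seconds=0.0)
print(result)
```

Matching on the "Ready" suffix lets the same check work for both the "1 Ready" text in the engine table and the "Ready" text on the model page.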
Your model is now successfully scaled up, which means all model initialization has been taken care of and it is ready to run inferences. Check out how to do so in the next tutorial!