3. Scale Model Up
Tutorial Preparation
In this tutorial, the third in the series, we will learn how to scale a model up and prepare it to run production inferences. To follow along, you must have completed Tutorial #2 and deployed a model to your Modzy model library.
What you'll need for this tutorial
- A valid Modzy account
- Your newly-deployed model
We will start this tutorial where the last one ended: on the page for our newly-deployed model. Your model page should look something like the image below.
Newly-deployed Model Page
Familiarize yourself with the tabs on the left panel of your model page. If you want to add documentation, select the "Edit Model" button under the "Actions" list.
More Information
Throughout this tutorial, we will use the term "Processing Engine (PE)" frequently. Learn more about processing engines, what they are, and how they are used, here.
Navigate to model management
First, scroll down on your model page until you can see the "Model Management" option. Click this button.
Model Actions Tab
This will bring you to a page where you can manage the PE allocation to different models. In the search bar, type "hugging" to filter for your model.
Model Management Search Page
Set min/max PE values
Now, to spin up our model, hover over the "Engine Autoscaling" column, which should read "0 min / 1 max", and click the value. Then set the minimum value to 1.
Set Engine Autoscaling
Click "Save". After a few seconds, you should see the "Engine Status" change from "Stopped" to "1 Spinning up".
Model Spinning Up
Modzy is now provisioning infrastructure that matches your model's hardware configuration and spinning up the model on that hardware. Keeping the model running this way results in faster inference times.
Autoscaling
We are spinning up our model so that it can run inferences faster, but this step is not required to run a model. If the minimum is left at 0, the autoscaler spins up the model when an inference is requested, runs the inference, and then shuts the model down. Learn more about the benefits and tradeoffs of this feature here.
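The tradeoff described above can be sketched in a few lines of Python. This is only an illustration, not Modzy code: the `EngineAutoscaling` class, the `first_request_latency` function, and the timing numbers are all hypothetical, and the sketch assumes that with a minimum of 0 the first request waits for the full spin-up before inference starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineAutoscaling:
    """Hypothetical stand-in for the "min / max" engine setting."""
    minimum: int
    maximum: int

    def __post_init__(self):
        # Reject settings that would not make sense in the UI.
        if self.minimum < 0 or self.maximum < 1 or self.minimum > self.maximum:
            raise ValueError("expected 0 <= minimum <= maximum and maximum >= 1")


def first_request_latency(setting, spin_up_seconds, inference_seconds):
    """Estimate the latency of the first request under a given setting.

    With minimum == 0 no engine is kept warm, so the request also pays
    the spin-up time. With minimum >= 1 an engine is already running.
    """
    cold_start = spin_up_seconds if setting.minimum == 0 else 0.0
    return cold_start + inference_seconds


# Illustrative numbers only; real spin-up and inference times vary.
on_demand = EngineAutoscaling(minimum=0, maximum=1)
always_on = EngineAutoscaling(minimum=1, maximum=1)

print(first_request_latency(on_demand, spin_up_seconds=60.0, inference_seconds=0.5))
print(first_request_latency(always_on, spin_up_seconds=60.0, inference_seconds=0.5))
```

In this toy model, the "0 min" setting trades a slower first inference for not keeping hardware running, while the "1 min" setting does the opposite.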
When the status changes again from "1 Spinning up" to "1 Ready", click on the model name again, which will take you back to your model home page. This time, however, you will notice the Engine status on the top right of the page reads "Ready".
Model Details Page - "Ready" Engine Status
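If you would rather script the wait than watch the page, the status check above follows a simple polling pattern. The sketch below is a generic illustration and does not call any Modzy API: `get_status` is a hypothetical, caller-supplied function that returns the engine status text, and the demo uses a fake status source that mimics the transitions seen in this tutorial.

```python
import time


def wait_until_ready(get_status, timeout_seconds=300.0, poll_seconds=5.0, sleep=time.sleep):
    """Poll get_status() until it returns "Ready" or the timeout elapses.

    get_status is a hypothetical callable supplied by the caller; it is
    not part of any Modzy library. Returns the seconds spent waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "Ready":
            return waited
        if waited >= timeout_seconds:
            raise TimeoutError(f"engine still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake status source; sleep is replaced so the demo runs instantly.
fake_statuses = iter(["Stopped", "Spinning up", "Spinning up", "Ready"])
elapsed = wait_until_ready(lambda: next(fake_statuses), poll_seconds=5.0, sleep=lambda s: None)
print(elapsed)
```

Passing `sleep` as a parameter keeps the function easy to test; in real use the default `time.sleep` applies.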
Your model is now successfully scaled up, which means all model initialization has been taken care of and it is ready to run inferences. Check out how to do so in the next tutorial!