Modzy's platform makes it easy to test new model versions, compare production and challenger models, try models on different target processors or hardware profiles, and run shadow, canary, and many other kinds of deployments.
For example, a quick API call to Model versions lets you compare the hardware requirements and statistics of two versions of the same model. This data can be combined with Telemetry to reveal the average latency of a prediction, the number of jobs completed, and how many users ran them:
Because Modzy can run multiple model versions concurrently, with full version control and change management restrictions, models can run simultaneously on identical data for more in-depth comparisons. These comparisons can cover accuracy, precision, recall, latency, CPU/GPU and memory requirements, and many more statistical and technical performance metrics.
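As a rough illustration of this kind of side-by-side comparison, the sketch below runs the same inputs through several model versions concurrently and reports the mean latency for each. It deliberately does not use the Modzy SDK or API: `timed_call` and `compare_versions` are hypothetical helper names, and each "version" is assumed to be any Python callable that takes one input and returns a prediction.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Any, Callable, Dict, Sequence


def timed_call(predict: Callable[[Any], Any], item: Any) -> float:
    """Return the wall-clock seconds a single prediction takes."""
    start = time.perf_counter()
    predict(item)
    return time.perf_counter() - start


def compare_versions(
    versions: Dict[str, Callable[[Any], Any]],
    inputs: Sequence[Any],
    max_workers: int = 8,
) -> Dict[str, float]:
    """Run every version on the same inputs and return mean latency per version.

    `versions` maps a label (hypothetical, e.g. "v1") to a callable that
    stands in for one deployed model version.
    """
    averages: Dict[str, float] = {}
    for name, predict in versions.items():
        # Submit the whole batch at once so the inferences run concurrently.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            latencies = list(
                pool.map(lambda item: timed_call(predict, item), inputs)
            )
        averages[name] = mean(latencies)
    return averages
```

In practice each callable would wrap whatever client code submits an input to a specific model version and waits for the result. Passing the identical `inputs` sequence to every version is what keeps the comparison fair.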
In another example, we run a batch of 50 inferences simultaneously through two versions of a single model and graph the average time required to generate a prediction:
You can generate this graph by following the simple steps in this recipe:
Here, a batch of 50 inferences is run simultaneously through three identical versions of a single model, each running on a different-sized AWS instance: a "Small" instance (2 CPU cores, 4 GB RAM), a "Large" instance (8 CPU cores, 16 GB RAM), and a GPU instance (1 NVIDIA T4 GPU, 4 cores, 16 GB RAM):
Again, all the steps required to generate this graph can be found in this recipe:
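Independently of that recipe, per-profile latencies from a run like this can be reduced to a few summary numbers before graphing. The following is a minimal sketch under stated assumptions: `summarize_latencies` and `fastest_profile` are hypothetical helpers, and the input is assumed to be a plain dictionary mapping a hardware profile label to the list of latencies (in seconds) measured on it. It does not reflect any Modzy data format.

```python
from statistics import mean
from typing import Dict, List


def summarize_latencies(
    samples: Dict[str, List[float]],
) -> Dict[str, Dict[str, float]]:
    """Compute count, mean, min, and max latency for each hardware profile."""
    summary: Dict[str, Dict[str, float]] = {}
    for profile, latencies in samples.items():
        if not latencies:
            raise ValueError(f"no latency samples for profile {profile!r}")
        summary[profile] = {
            "count": len(latencies),
            "mean": mean(latencies),
            "min": min(latencies),
            "max": max(latencies),
        }
    return summary


def fastest_profile(summary: Dict[str, Dict[str, float]]) -> str:
    """Return the profile label with the lowest mean latency."""
    return min(summary, key=lambda profile: summary[profile]["mean"])
```

With 50 measurements collected per instance type, `summarize_latencies` gives the averages to plot, and `fastest_profile` names the profile with the lowest mean. Whether the fastest profile is also the most cost-effective is a separate question that this sketch does not address.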
Updated 7 months ago