
GitHub Actions

Deploy your model in a completely automated CI/CD pipeline

This page demonstrates how to set up a Continuous Integration (CI) and Continuous Delivery (CD) pipeline with GitHub Actions to automate the deployment of machine learning models from a GitHub repository to Modzy.

The objective of this integration is to give data scientists a mechanism to train their model(s) in their preferred workspace, configure a single JSON file, and simply commit their changes to the main branch of their GitHub repository. Doing so triggers the CI/CD pipeline, which leverages artifacts from the repository along with Chassis and Modzy SDK Python code to automatically containerize and deploy the trained model to Modzy. Note that the GitHub Actions workflow in this guide can be modified and set up in several different ways, so it is a great place to start if you are interested in creating your own CI/CD pipeline!

📘

What you will need

  • Valid Modzy Credentials (instance URL and API Key, e.g., https://trial.app.modzy.com and q4jp1pOZyFTddkFsOYwI.flHw34veJgfKu2MNzAa7)
  • GitHub account

If you follow this guide step by step, reference this repository to see a full working example.

Training model and committing code (Data Scientist)

The main goal of setting up a CI/CD model deployment pipeline is to make data scientists' lives easier, so as a data scientist there is no need to change your normal model development workflow! In this example, we simply expect the data scientist to complete a model_info.json file that defines the few pieces of information needed to execute the CI/CD deployment to Modzy.

Train your model

In this example, we will train a basic digits classification model using logistic regression from the Scikit-learn machine learning framework.

import json
import os
import pickle

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Import and normalize data
X_digits, y_digits = datasets.load_digits(return_X_y=True)
X_digits = X_digits / X_digits.max()

n_samples = len(X_digits)

# Split data into training and test sets
X_train = X_digits[: int(0.9 * n_samples)]
y_train = y_digits[: int(0.9 * n_samples)]
X_test = X_digits[int(0.9 * n_samples) :]
y_test = y_digits[int(0.9 * n_samples) :]

# Train Model
logistic = LogisticRegression(max_iter=1000)
print(
    "LogisticRegression mean accuracy score: %f"
    % logistic.fit(X_train, y_train).score(X_test, y_test)
)

# Save trained model
os.makedirs("weights", exist_ok=True)
with open("weights/model_latest.pkl", "wb") as file:
    pickle.dump(logistic, file)

# Save a small sample input to use for testing later
os.makedirs("data", exist_ok=True)
sample = X_test[:5].tolist()
with open("data/digits_sample.json", "w") as out:
    json.dump(sample, out)

Notice we save our model as a pickled weights file in the weights/ directory. Additionally, we save a sample piece of input data for testing in the data/ directory. The file paths to these artifacts are important for the next step.

Define model information

Next, document a few pieces of information about your model in the model_info.json file:

  • name: Desired name for your model when it is deployed to Modzy
  • version: Version of your model to deploy. Note: you can deploy as many versions of the same model to Modzy as you wish
  • weightsFilePath: File path that should point to your most up-to-date weights file
  • sampleDataFilePath: File path that should point to a sample data file that can be used to test your model during the CI/CD process.

Example model_info.json:

{
    "name": "GitHub Action Example",
    "version": "0.0.1",
    "weightsFilePath": "weights/model_latest.pkl",
    "sampleDataFilePath": "data/digits_sample.json"
}

It is important that the weightsFilePath and sampleDataFilePath keys accurately reflect the artifacts you wish to use to deploy your model. Once this looks the way you would like it to, simply commit your changes to the main branch in your repository, and the CI/CD process will do the rest.
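
For context, the downstream pipeline scripts can read this file directly. Here is a minimal sketch of how that might look; the variable names are illustrative assumptions, not the exact code in package.py or deploy.py.

import json

# Load the deployment configuration committed by the data scientist
with open("model_info.json", "r") as f:
    model_info = json.load(f)

model_name = model_info["name"]            # e.g., "GitHub Action Example"
model_version = model_info["version"]      # e.g., "0.0.1"
weights_path = model_info["weightsFilePath"]
sample_data_path = model_info["sampleDataFilePath"]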

Setting up the GitHub Actions workflow (DevOps, MLE)

To set up the GitHub Actions workflow in the data scientist's repository, you will need to define the workflow in a .yaml file and set a few GitHub encrypted secrets that are accessed in the workflow script.

Define workflow

First, create a YAML file to configure the GitHub workflow: .github/workflows/ci.yml.

Next, paste the following into your YAML file as a template starter:

name: Build

on:
  push:
    paths:
      - 'model_info.json'

jobs:
  package:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      - 
        name: Python setup 
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          cache: 'pip'
      - 
        name: Install python dependencies 
        run: pip install -r requirements.txt
      -
        name: Run Chassis code to prepare context
        run: python package.py
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      -
        name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}      
      -
        name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: ./build
          push: true
          tags: ghcr.io/${{ github.repository }}:latest

  deploy:
    needs: [package]
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      - 
        name: Python setup 
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          cache: 'pip'
      - 
        name: Install python dependencies 
        run: pip install "modzy-sdk>=0.11.6"
      -
        name: Deploy model container to Modzy 
        env: 
          CONTAINER: ghcr.io/${{ github.repository }}:latest
          MODZY_URL: ${{ secrets.MODZY_URL }}
          MODZY_API_KEY: ${{ secrets.MODZY_API_KEY }}
        run: python deploy.py

If you are following the example template, you do not need to make any changes to this script. If your data scientist is building a different model, however, the only file in the repository you will need to change is the package.py file.

This Python file provides the Chassis code required to containerize your Python model. View the guides on the Chassis docs site to learn more. In short, you will need to modify the predict function based on the model(s) you are managing and building in your code repository.
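
For reference, here is a minimal sketch of the model-specific portion of package.py for this example. It only illustrates loading the pickled weights and defining a predict function; the Chassis packaging calls that wrap this function (and write the Docker build context to ./build) should follow the Chassis docs, and the names below are illustrative assumptions rather than the exact contents of the reference repository.

import json
import pickle

import numpy as np

# Load the trained weights produced during training (path comes from model_info.json)
with open("weights/model_latest.pkl", "rb") as f:
    model = pickle.load(f)

def predict(input_bytes):
    # Hypothetical predict function: parse the JSON sample saved during training,
    # run inference with the unpickled scikit-learn model, and return JSON-serializable results
    inputs = np.array(json.loads(input_bytes))
    predictions = model.predict(inputs)
    return {"digits": predictions.tolist()}

If you swap in a different model, this predict function (and any pre/post-processing it needs) is the piece to rewrite.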

Set up GitHub secrets

The last thing you need to configure to ensure the ci.yml file will run when triggered is GitHub secrets. To do so, navigate to the Settings tab within your repository, and click on Secrets → Actions.


Figure 1. GitHub secrets


On the top right of the page, select "New Repository Secret". In the "Name" field, paste the value from the "Name" column of the table below. The "Value Description" column describes what the secret's value should look like, with corresponding examples. Create a new repository secret for each row in the table.

| Name | Value Description | Example |
| --- | --- | --- |
| MODZY_URL | Valid Modzy instance URL | https://trial.app.modzy.com |
| MODZY_API_KEY | Valid Modzy API key associated with the MODZY_URL instance. Note: this API key must be associated with a user that has the "Data Scientist" role | q4jp1pOZyFTddkFsOYwI.flHw34veJgfKu2MNzAa7 |

In your ci.yml file, you will notice the following snippet, which accesses these secrets and exposes them (along with the container image tag) as environment variables. These environment variables are referenced by the deploy.py script that executes the automatic deployment of the containerized model to Modzy.

env:
  CONTAINER: ghcr.io/${{ github.repository }}:latest
  MODZY_URL: ${{ secrets.MODZY_URL }}
  MODZY_API_KEY: ${{ secrets.MODZY_API_KEY }}
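
To make the relationship concrete, here is a minimal sketch of what deploy.py might do with these environment variables using the Modzy Python SDK. The ApiClient construction follows the SDK's documented usage; the models.deploy call and its exact parameter names are an assumption based on the SDK's deployment helper, so check the reference repository and SDK docs for the authoritative version.

import json
import os

from modzy import ApiClient

# Read the model metadata committed by the data scientist
with open("model_info.json", "r") as f:
    model_info = json.load(f)

# Environment variables injected by the GitHub Actions deploy job
client = ApiClient(
    base_url=os.environ["MODZY_URL"],
    api_key=os.environ["MODZY_API_KEY"],
)

# Hypothetical deployment call: register the freshly pushed container image
# as a new model version in Modzy (parameter names are assumptions)
model_data = client.models.deploy(
    container_image=os.environ["CONTAINER"],
    model_name=model_info["name"],
    model_version=model_info["version"],
    sample_input_file=model_info["sampleDataFilePath"],
)
print(model_data)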

Congratulations! You can now deploy models to Modzy automatically through your CI/CD pipeline. Integrating CI/CD into ModelOps improves process efficiencies, leverages DevOps best practices, and most importantly, gives data scientists the mechanisms required to push machine learning into production, without having to write a single line of production code.


What’s Next

Check out more CI/CD examples to improve your MLOps pipeline!