GitHub Actions

Deploy your model in a completely automated CI/CD pipeline

This page demonstrates how to set up a Continuous Integration (CI) and Continuous Delivery (CD) pipeline with GitHub Actions to automate the deployment of machine learning models from a GitHub repository to Modzy.

The objective of this integration is to give data scientists a mechanism to train their model(s) in their preferred workspace, configure a single JSON file, and simply commit their changes to the main branch of their GitHub repository. Doing so triggers the CI/CD pipeline, which takes resources from the repository and executes Chassis Python code to automatically containerize and deploy the trained model to Modzy. Please note that the GitHub Actions workflow in this guide can be modified and set up in several different ways, so it is a great starting point if you are interested in creating your own CI/CD pipeline!

📘

What you will need

  • Valid Modzy Credentials (instance URL and API Key, e.g., https://app.modzy.com and q4jp1pOZyFTddkFsOYwI.flHw34veJgfKu2MNzAa7)
  • Dockerhub account
  • Connection to running Chassis.ml service (either from a local deployment or via publicly-hosted service)
  • GitHub account

If you follow this guide step by step, reference this repository to see a full working example.

Training model and committing code (Data Scientist)

The main goal of setting up a CI/CD model deployment pipeline is to make data scientists' lives easier, so as a data scientist, you do not need to change your normal model development workflow! In this example, we simply expect the data scientist to complete a model_info.json file that defines the few pieces of information needed to execute the CI/CD deployment to Modzy.

Train your model

In this example, we will train a basic digits classification model using logistic regression from the Scikit-learn machine learning framework.

import json
import os
import pickle

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Import and normalize data
X_digits, y_digits = datasets.load_digits(return_X_y=True)
X_digits = X_digits / X_digits.max()

n_samples = len(X_digits)

# Split data into training (90%) and test (10%) sets
X_train = X_digits[: int(0.9 * n_samples)]
y_train = y_digits[: int(0.9 * n_samples)]
X_test = X_digits[int(0.9 * n_samples) :]
y_test = y_digits[int(0.9 * n_samples) :]

# Train model
logistic = LogisticRegression(max_iter=1000)
print(
    "LogisticRegression mean accuracy score: %f"
    % logistic.fit(X_train, y_train).score(X_test, y_test)
)

# Save trained model (create the output directories if they do not exist)
os.makedirs("weights", exist_ok=True)
os.makedirs("data", exist_ok=True)
with open("weights/model_latest.pkl", "wb") as file:
    pickle.dump(logistic, file)

# Save a small sample input to use for testing later
sample = X_test[:5].tolist()
with open("data/digits_sample.json", "w") as out:
    json.dump(sample, out)

Notice we save our model as a pickled weights file in the weights/ directory. Additionally, we save a sample piece of input data for testing in the data/ directory. The file paths to these artifacts are important for the next step.

Define model information

Next, document a few pieces of information about your model in the model_info.json file:

  • name: Desired name for your model when it is deployed to Modzy
  • version: Version of your model to deploy. Note: you can deploy as many versions of the same model to Modzy as you wish
  • weightsFilePath: File path that should point to your most up-to-date weights file
  • sampleDataFilePath: File path that should point to a sample data file that can be used to test your model during the CI/CD process.

Example model_info.json:

{
    "name": "GitHub Action Example",
    "version": "0.0.1",
    "weightsFilePath": "weights/model_latest.pkl",
    "sampleDataFilePath": "data/digits_sample.json"
}

It is important that the weightsFilePath and sampleDataFilePath keys accurately reflect the artifacts you wish to use to deploy your model. Once this looks the way you would like it to, simply commit your changes to the main branch in your repository, and the CI/CD process will do the rest.
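Before committing, it can help to sanity-check model_info.json locally so a mistyped path fails on your machine instead of burning a CI run. A minimal sketch using only the standard library (the validate_model_info helper and REQUIRED_KEYS set are illustrative, not part of the pipeline):

```python
import json
import os

# Keys the CI/CD workflow reads from model_info.json
REQUIRED_KEYS = {"name", "version", "weightsFilePath", "sampleDataFilePath"}

def validate_model_info(path="model_info.json"):
    """Raise if model_info.json is missing keys or points at absent files."""
    with open(path) as f:
        info = json.load(f)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"model_info.json is missing keys: {sorted(missing)}")
    for key in ("weightsFilePath", "sampleDataFilePath"):
        if not os.path.isfile(info[key]):
            raise FileNotFoundError(f"{key} points at a missing file: {info[key]}")
    return info
```

Running this from the repository root before each commit confirms that both artifact paths resolve to real files.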

Setting up GitHub Action Workflow (DevOps, MLE)

To set up the GitHub Action workflow in the data scientist's repository, you will need to define your workflow in a .yaml file and also set a few GitHub encrypted secrets that are accessed in the workflow script.

Define workflow

First, create a YAML file to configure the GitHub workflow: .github/workflows/ci.yml.

Next, paste the following into your yaml file as a template starter:

name: Build

on:
  push:
    paths:
      - 'model_info.json'

jobs:
  build:
    name: Build Container and Publish to Modzy
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
        cache: 'pip'
    - run: pip install -r requirements.txt
    - name: Invoke Chassis Service
      env: 
        DOCKER_USER: ${{ secrets.DOCKER_USER }}
        DOCKER_PASS: ${{ secrets.DOCKER_PASS }}
        MODZY_URL: ${{ secrets.MODZY_URL }}
        MODZY_API_KEY: ${{ secrets.MODZY_API_KEY }}
        CHASSIS_SERVICE: ${{ secrets.CHASSIS_SERVICE }}
      uses: jannekem/run-python-script-action@v1
      with:
        script: |
          import os, json, pickle, chassisml
          import numpy as np
          chassis_creds = {
              "dockerhub_user": os.getenv("DOCKER_USER"),
              "dockerhub_pass": os.getenv("DOCKER_PASS"),
              "modzy_url": os.getenv("MODZY_URL"),
              "modzy_api_key": os.getenv("MODZY_API_KEY"),
          }
          with open("model_info.json", "r") as model_file:
              model_info = json.load(model_file)
          
          model = pickle.load(open(model_info["weightsFilePath"], "rb"))
          def process(input_bytes):
              '''
              This method takes raw bytes as input and runs inference on the data with the loaded model object
              '''
              inputs = np.array(json.loads(input_bytes))
              inference_results = model.predict(inputs)
              structured_results = []
              for inference_result in inference_results:
                  structured_output = {
                      "data": {
                          "result": {"classPredictions": [{"class": str(inference_result), "score": str(1)}]}
                      }
                  }
                  structured_results.append(structured_output)
              return structured_results 
          chassis_client = chassisml.ChassisClient(os.getenv("CHASSIS_SERVICE"))
          chassis_model = chassis_client.create_model(process_fn=process)
          try:
            results = chassis_model.test(model_info["sampleDataFilePath"])
          except Exception as e:
            raise ValueError("Error testing model: {}".format(e))
          response = chassis_model.publish(
            model_name=model_info["name"],
            model_version=model_info["version"],
            registry_user=chassis_creds["dockerhub_user"],
            registry_pass=chassis_creds["dockerhub_pass"],
            modzy_url=chassis_creds["modzy_url"],
            modzy_api_key=chassis_creds["modzy_api_key"],
            modzy_sample_input_path=model_info["sampleDataFilePath"]
          )
          print(response.get('job_id'))
          final_status = chassis_client.block_until_complete(response.get('job_id'))
          print(final_status)
          if not (final_status["status"]["failed"] is None and final_status["status"]["conditions"][0]["type"] == "Complete"):
            raise ValueError("Error publishing model (See details above)")

If you are following the example template, then you do not need to make any changes to this script. However, if your data scientist is building a different model, the only part that will need to change is the Python code under the script section.

This Chassis Python code is configured to load the saved logistic regression model from the path defined in the weightsFilePath key within model_info.json, and then run inference on incoming data, as defined in the process method. Work with your data scientist to make any changes to this Python code as needed, and reference the different Chassis framework guides for assistance.
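If you do adapt the process method, the part Modzy cares about is the structured-output shape it returns. Here is that result-shaping logic isolated so you can experiment with it outside the workflow (shape_results is a hypothetical helper name; scores are hard-coded to "1" exactly as in the script above):

```python
import json

def shape_results(predicted_classes):
    """Wrap raw class predictions in the structured-output format
    returned by the process method in ci.yml."""
    structured_results = []
    for cls in predicted_classes:
        structured_results.append({
            "data": {
                "result": {
                    "classPredictions": [{"class": str(cls), "score": str(1)}]
                }
            }
        })
    return structured_results

# Three placeholder digit predictions, shaped for Modzy
print(json.dumps(shape_results([3, 8, 1]), indent=2))
```

Whatever model you swap in, keeping this output contract intact means the rest of the workflow needs no changes.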

Set up GitHub secrets

The last thing you need to configure before the ci.yml file can run is the set of GitHub secrets it references. To do so, navigate to the Settings tab within your repository, and click Secrets → Actions.

Figure 1. GitHub secrets
On the top right of the page, select "New Repository Secret". In the "Name" field, paste the value from the "Name" column in the table below; the "Value Description" column describes what each value should look like, with a corresponding example. Create a new repository secret for each row in this table.

| Name | Value Description | Example |
| --- | --- | --- |
| CHASSIS_SERVICE | URL to publicly-hosted Chassis service | https://chassis-xxxxxxxxxx.modzy.com |
| DOCKER_USER | Valid Dockerhub username | my-docker-username |
| DOCKER_PASS | Valid Dockerhub password | my-docker-password |
| MODZY_URL | Valid Modzy instance URL | https://app.modzy.com |
| MODZY_API_KEY | Valid Modzy API key associated with the MODZY_URL instance. Note: this API key must belong to a user with the "Data Scientist" role | q4jp1pOZyFTddkFsOYwI.flHw34veJgfKu2MNzAa7 |

In your ci.yml file, you will notice the following snippet that accesses these secrets and exports them to environment variables. These environment variables are referenced in the Chassis code that executes the automatic containerization and deployment of this model to Modzy.

      env: 
        DOCKER_USER: ${{ secrets.DOCKER_USER }}
        DOCKER_PASS: ${{ secrets.DOCKER_PASS }}
        MODZY_URL: ${{ secrets.MODZY_URL }}
        MODZY_API_KEY: ${{ secrets.MODZY_API_KEY }}
        CHASSIS_SERVICE: ${{ secrets.CHASSIS_SERVICE }}
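If one of these secrets is unset, the failure only surfaces later as an opaque authentication or connection error. One option is a fail-fast check at the top of the script section; a sketch, where check_required_env is an illustrative helper rather than part of the Chassis SDK:

```python
import os

# The five secrets the workflow exports as environment variables
REQUIRED_VARS = ("DOCKER_USER", "DOCKER_PASS", "MODZY_URL",
                 "MODZY_API_KEY", "CHASSIS_SERVICE")

def check_required_env(names=REQUIRED_VARS):
    """Return the resolved values, raising early if any secret is unset or empty."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing GitHub secrets / environment variables: {missing}")
    return {n: os.getenv(n) for n in names}
```

Calling this before building the chassis_creds dictionary turns a cryptic downstream failure into an immediate, named error in the workflow logs.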

Congratulations! You can now deploy models to Modzy automatically through your CI/CD pipeline. Integrating CI/CD into ModelOps improves process efficiencies, leverages DevOps best practices, and most importantly, gives data scientists the mechanisms required to push machine learning into production, without having to write a single line of production code.


What’s Next

Check out more CI/CD examples to improve your MLOps pipeline!
