This guide demonstrates how to automatically containerize your XGBoost model.

📘

What you will need

  • Dockerhub account
  • Connection to running Chassis.ml service (either from a local deployment or via publicly-hosted service)
  • Trained XGBoost model that can be loaded into memory, or code to train an XGBoost model from scratch
  • Python environment

NOTE: To follow along, you can reference the Jupyter notebook example and data files here.

Set Up Environment

👍

We recommend you follow this guide using a Jupyter Notebook. Follow the appropriate install instructions based on your environment.

Create a Python virtual environment and install the Python packages required to load and run your model. At a minimum, pip install the following packages:

pip install chassisml modzy-sdk

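If you would like to follow this guide directly, pip install the following additional packages. The example uses load_boston, which was removed in scikit-learn 1.2, so scikit-learn must be older than that release:

scikit-learn<1.2
numpy>=1.22.3
xgboost>=1.6.0
pandas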

Load Model into Memory

If you plan to use the Chassis service, you must first load your model into memory. If you have your trained model file saved locally (.pth, .pkl, .h5, .joblib, or other file format), you can load your model from the weights file directly, or alternatively train and use the model object.
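
If the model is already trained and saved, loading it might look like the sketch below. It assumes the model was serialized with Python's pickle module; the file name model.pkl and the helper load_pickled_model are hypothetical, and a dictionary stands in for the model so the snippet runs without XGBoost installed. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_model(path):
    """Load a pickled model object from disk (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained model; in practice this would be your XGBRegressor.
stand_in_model = {"name": "stand-in", "n_estimators": 100}

# Round trip: save to a temporary .pkl file, then load it back into memory.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(stand_in_model, f)

loaded_model = load_pickled_model(model_path)
print(loaded_model)
```

XGBoost estimators also offer save_model and load_model methods for their native format, which may be preferable when the file has to be read by a different XGBoost version.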

import os
import chassisml
from io import StringIO
import numpy as np
import pandas as pd
import json
import xgboost as xgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


# load data
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# save sample data for testing later (create the data directory if needed)
os.makedirs("data", exist_ok=True)
X_test[:10].to_csv("data/sample_house_data.csv", index=False)

# build XGBoost regressor
regressor = xgb.XGBRegressor(
    n_estimators=100,
    reg_lambda=1,
    gamma=0,
    max_depth=3
)

# train model
regressor.fit(X_train, y_train)

In a notebook, the fit() call displays the fitted estimator:

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=12,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

Run inference on your X_test subset and evaluate model performance.

# run inference
y_pred = regressor.predict(X_test)
# evaluate model
mean_squared_error(y_test, y_pred)
# >>> 7.658965947061991 (your value will vary because train_test_split is not seeded)
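
Mean squared error is expressed in squared target units, so its square root is often easier to interpret. The sketch below recomputes both metrics in plain Python on a few made-up values (not the Boston data), then takes the square root of the score reported above. The Boston target is the median home value in thousands of dollars, so an RMSE near 2.77 corresponds to a typical error of roughly $2,770.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only (in thousands of dollars).
y_true_demo = [24.0, 21.5, 30.0, 18.0]
y_pred_demo = [22.0, 23.5, 29.0, 18.0]

demo_mse = mse(y_true_demo, y_pred_demo)  # (4 + 4 + 1 + 0) / 4
demo_rmse = math.sqrt(demo_mse)

# RMSE for the score reported in this guide.
guide_rmse = math.sqrt(7.658965947061991)

print(demo_mse, demo_rmse, round(guide_rmse, 2))
```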

Define process Function

You can think of this function as your "inference" function: it takes input data as raw bytes, processes the inputs, makes predictions, and returns the results. This function is the only argument required to create a ChassisModel object.

def process(input_bytes):
    # load data
    inputs = pd.read_csv(StringIO(str(input_bytes, "utf-8")))    

    # run inference
    preds = regressor.predict(inputs)

    # structure results
    inference_result = {
        "housePricePredictions": [
            # cast NumPy floats to plain Python floats so the result is JSON-serializable
            {"row": i+1, "price": float(preds[i].round(0)*1000)} for i in range(len(preds))
        ]
    }

    structured_output = {
        "data": {
            "result": inference_result,
            "explanation": None,
            "drift": None,
        }
    }
    return structured_output
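
Because process receives raw bytes, it can be exercised locally before involving the Chassis service. The sketch below mirrors the structure of process using only the standard library. StubRegressor and process_sketch are hypothetical stand-ins so that the snippet runs without a trained model, and the final json.dumps call checks that the result is serializable, which is assumed here to be what a serving layer would need.

```python
import csv
import io
import json


class StubRegressor:
    """Hypothetical stand-in for the trained model: predicts the row mean."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


stub = StubRegressor()


def process_sketch(input_bytes):
    # Decode the bytes and parse the CSV, skipping the header row.
    reader = csv.reader(io.StringIO(input_bytes.decode("utf-8")))
    next(reader)
    rows = [[float(value) for value in row] for row in reader]

    preds = stub.predict(rows)

    # Same output layout as process(): one entry per input row.
    predictions = [
        {"row": i + 1, "price": round(pred) * 1000} for i, pred in enumerate(preds)
    ]
    return {
        "data": {
            "result": {"housePricePredictions": predictions},
            "explanation": None,
            "drift": None,
        }
    }


sample_bytes = b"a,b\n20.0,24.0\n30.0,32.0\n"
output = process_sketch(sample_bytes)
print(json.dumps(output))
```

Calling the real process function the same way, with the bytes of data/sample_house_data.csv, is a quick sanity check before running the Chassis tests.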

Create ChassisModel Object and Publish Model

First, connect to a running instance of the Chassis service, either by deploying it on your machine or by connecting to the publicly hosted version of the service. Then use the process function you defined to create a ChassisModel object, run a few tests to confirm that your model object returns the expected results, and finally publish your model.

chassis_client = chassisml.ChassisClient("http://localhost:5000")
chassis_model = chassis_client.create_model(process_fn=process)

Define a sample file from a local filepath and run a series of tests.

NOTE: The test_env method is not available on the publicly hosted service.

sample_filepath = './data/sample_house_data.csv'
results = chassis_model.test(sample_filepath)
print(results)

test_env_result = chassis_model.test_env(sample_filepath)
print(test_env_result)

Define your Dockerhub credentials and publish your model.

dockerhub_user = "<my.username>"
dockerhub_pass = "<my.password>"

response = chassis_model.publish(
   model_name="XGBoost Boston Housing Price Predictions",
   model_version="0.0.1",
   registry_user=dockerhub_user,
   registry_pass=dockerhub_pass
)

job_id = response.get('job_id')
final_status = chassis_client.block_until_complete(job_id)
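
Hardcoding a password in a notebook makes it easy to leak. One option, sketched below, is to read the credentials from environment variables. The variable names DOCKERHUB_USER and DOCKERHUB_PASS and the helper get_credentials are assumptions made for this example, not something Chassis requires.

```python
import os


def get_credentials(env=os.environ):
    """Read Dockerhub credentials from the environment (hypothetical names)."""
    user = env.get("DOCKERHUB_USER")
    password = env.get("DOCKERHUB_PASS")
    if not user or not password:
        raise RuntimeError("Set DOCKERHUB_USER and DOCKERHUB_PASS before publishing.")
    return user, password


# Demonstration with an explicit mapping instead of the real environment.
demo_user, demo_pass = get_credentials(
    {"DOCKERHUB_USER": "example-user", "DOCKERHUB_PASS": "example-pass"}
)
print(demo_user)
```

The returned values can then be passed as registry_user and registry_pass in the publish call.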

You have successfully completed the packaging of your XGBoost model. In your Dockerhub account, you should see your new container listed in the "Repositories" tab.

Figure 1. Example Chassis-built Container

Congratulations! In a matter of minutes, you automatically created a Docker container with only a few lines of code. To deploy your new model container to Modzy, follow the Import Container guide.






