Overview

Dataiku is a leading AI and machine learning platform that provides a wide array of data science functionality for organizations, including data preparation, data visualization, model training, AutoML, and many other analytics capabilities. If your organization uses Dataiku to support your data science teams in model building, you can export those models from Dataiku and import them directly into Modzy. This pipeline yields two key benefits:

  • Keep your data science processes unchanged and leverage your Dataiku platform to build powerful AI models with collaboration across the enterprise
  • Take advantage of Modzy's platform to deploy models built anywhere, centrally manage and govern them, and connect them to your business applications with Modzy's robust production APIs

This guide will show you how to export a model from Dataiku and convert it into an OMI-compliant container image that can be imported into Modzy.

📘

What you will need to get started

  • Dataiku Enterprise account (Note: only Dataiku Enterprise supports model exports)
  • Docker Hub account
  • Connection to a running Chassis.ml service (either a local deployment or the publicly-hosted service)

Integration Workflow

Export Dataiku Model

Dataiku offers several model export options:

  • Python
  • MLflow
  • Java class/JAR
  • PMML
  • Jupyter Notebook

Modzy supports any of these export options, but this guide walks through the process of importing models exported in the Python and MLflow formats.

To export a model to Python or MLflow format, first navigate to the trained model you wish to export (either a model trained in the Lab or a version of a saved model deployed in the Flow), and:

  • Click the Actions drop down on the top right corner of the screen
  • Select Export model as <>, where <> is either Python or MLflow, and download the export file
  • For the MLflow export option, unzip the file to your machine (unzip not required for Python option)

Please view the documentation for exporting models on Dataiku's docs site for more details.

Next, make sure you have at least one piece of sample data to test your exported model during the process of converting it into an OMI-compliant container.

Load model and define inference function

We now need to load the exported model into memory and convert it into a container image that can be deployed to Modzy. The first step is to load the model and define an inference function around it.

The following example code makes two assumptions:

  1. We are working with a model trained in DSS on the iris dataset
  2. There is a JSON file named sample_data.json that contains sample iris data in the following format:

[[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
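
If you don't already have a sample file on hand, a small snippet like the following can create one (the file name and feature values here simply mirror the example above):

import json

# Two example iris feature rows: sepal length, sepal width, petal length, petal width
sample_rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]

with open("sample_data.json", "w") as f:
    json.dump(sample_rows, f)

With the model export and sample file in place, load the model and define the process function: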

import json
import chassisml
import numpy as np
import dataikuscoring

# Load model
model = dataikuscoring.load_model("path/to/model.zip") # Python format
'''
Use this alternative for the MLflow format:
model = dataikuscoring.mlflow.load_model("path/to/model_unzipped") # MLflow format
'''

# Define process function
def process(input_bytes):
    # Parse the raw input bytes (JSON-encoded list of feature rows) into a NumPy array
    inputs = np.array(json.loads(input_bytes))
    # Run inference; dataikuscoring returns a dictionary of probabilities keyed by class label
    inference_results = model.predict_proba(inputs)
    # Reformat the results into Modzy's classPredictions output structure
    structured_results = []
    for key in list(inference_results.keys()):
        structured_output = {
            "data": {
                "result": {"classPredictions": [{"class": key, "score": inference_results[key][0]}]}
            }
        }
        structured_results.append(structured_output)
    return structured_results
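
Before containerizing, it's worth sanity-checking the process function locally against your sample file (this assumes the sample_data.json file described above):

# Quick local test of the process function using the sample data file
with open("sample_data.json", "rb") as f:
    sample_bytes = f.read()

print(process(sample_bytes))

If the printed output looks right, you're ready to hand the function off to Chassis.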

Create and save a model container image with chassis.ml

First, connect to a running instance of the Chassis service (the example code connects to the publicly-hosted service at https://chassis.app.modzy.com). Then use the process function you defined to create a ChassisModel object and publish your model. Note: you will need to provide credentials for your Docker Hub account so that your container image can be saved.

chassis_client = chassisml.ChassisClient("https://chassis.app.modzy.com")
chassis_model = chassis_client.create_model(process_fn=process)

dockerhub_user = "<my.username>"
dockerhub_pass = "<my.password>"

response = chassis_model.publish(
    model_name="Dataiku Iris Classification",
    model_version="0.0.1",
    registry_user=dockerhub_user,
    registry_pass=dockerhub_pass
)

job_id = response.get('job_id')
final_status = chassis_client.block_until_complete(job_id)

Once you've updated this sample code with your Docker Hub credentials, you can run it (for example, in a Jupyter notebook) and Chassis will start converting your model into a container image. After a few minutes, the resulting image will be pushed automatically to your Docker Hub account.
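
To confirm the build completed before heading to Docker Hub, you can inspect the status object returned by block_until_complete. A minimal check looks like this:

# Print the final job status so you can confirm the image build and push completed;
# the exact fields in this dictionary vary by Chassis version
print(final_status)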

Import container image into Modzy

Finally, you can deploy your Dataiku model container into Modzy in just a few steps and be ready to run, monitor, and integrate your model at scale!
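
Once the container is deployed, you can submit inference jobs to it from your applications. The sketch below assumes the Modzy Python SDK (modzy-sdk); the API key, model identifier, and input key are placeholders, and method signatures may differ slightly depending on your SDK version, so treat it as a starting point rather than a drop-in example.

from modzy import ApiClient

# Placeholder values: replace with your Modzy instance URL, API key, and the
# identifier/version Modzy assigns when you deploy the container
client = ApiClient(base_url="https://app.modzy.com/api", api_key="<your-api-key>")

# Submit a job with one sample iris row; the input key must match the input
# name your deployed model expects
job = client.jobs.submit_text("<model-id>", "0.0.1", {"input": "[[5.1, 3.5, 1.4, 0.2]]"})

# Wait for the job to finish and retrieve the results
result = client.results.block_until_complete(job, timeout=None)
print(result)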