
Overview

Dataiku is a leading AI and machine learning platform that provides a wide array of data science functionality for organizations, including data preparation, data visualization, model training, AutoML, and many other analytics capabilities. If your organization uses Dataiku to support your data science teams in model building, you can export those models from Dataiku and import them directly into Modzy. This pipeline yields two key benefits:

  • Keep your data science processes unchanged and leverage your Dataiku platform to build powerful AI models with collaboration across the enterprise
  • Take advantage of Modzy's platform to deploy models built anywhere, centrally manage and govern them, and connect them to your business applications with Modzy's robust production APIs

This guide will show you how to export a model from Dataiku and convert it into an OMI-compliant container image that can be imported into Modzy.

📘

What you will need to get started

  • Dataiku Enterprise account (Note: only Dataiku Enterprise supports model exports)
  • Python (v3.8 or greater supported)
  • Docker (Installation instructions here)

Environment Setup

You will first need to set up a Python virtual environment and install the Chassis library (chassisml), the dataiku-scoring library, and numpy.

pip install chassisml numpy dataiku-scoring
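If you want to keep these dependencies isolated from your system Python, a minimal setup might look like the following (the environment name chassis-env is arbitrary; bash syntax assumed):

```shell
# Create and activate an isolated virtual environment
python3 -m venv chassis-env
source chassis-env/bin/activate

# Install the Chassis SDK, numpy, and the Dataiku scoring library
pip install chassisml numpy dataiku-scoring
```

Deactivate with `deactivate` when you are done working in the environment.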

Integration Workflow

Export Dataiku Model

Dataiku offers several model export options:

  • Python
  • MLflow
  • Java class/JAR
  • PMML
  • Jupyter Notebook

Modzy supports any of these export options, but this guide walks through the process of importing models exported in the Python and MLflow formats.

To export a model to Python or MLflow format, first navigate to the trained model you wish to export (either a model trained in the Lab or a version of a saved model deployed in the Flow), and:

  • Click the Actions drop down on the top right corner of the screen
  • Select Export model as <>, where <> is either Python or MLflow, and download the export file
  • For the MLflow export option, unzip the file to your machine (unzip not required for Python option)

Please view the documentation for exporting models on Dataiku's docs site for more details.

Next, make sure you have at least one piece of sample data on hand to test your exported model while converting it into an OMI-compliant container.

Load model and define inference function

We now need to load our model into memory from the exported file and convert it into a container image that can be deployed to Modzy. To start, you will need to load in your model.

The following example code makes two assumptions:

  1. We are working with a model trained in DSS on the iris dataset
  2. There is a JSON file named sample_data.json that contains sample iris data in the following format:

[[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
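If you don't already have sample data on disk, you can write the two rows above to sample_data.json yourself; the file name simply needs to match what the test step later reads (a minimal sketch, not part of the Dataiku export):

```python
import json

import numpy as np

# Two sample iris rows (sepal length, sepal width, petal length, petal width)
sample = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]

# Write them to sample_data.json for use as test input later
with open("sample_data.json", "w") as f:
    json.dump(sample, f)

# Round-trip check: the predict function will parse these bytes into a 2x4 array
with open("sample_data.json", "rb") as f:
    inputs = np.array(json.loads(f.read()))
print(inputs.shape)
```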

import json
import numpy as np
import dataikuscoring
from chassisml import ChassisModel
from chassis.builder import DockerBuilder
from typing import Mapping, Dict

# Load model
model = dataikuscoring.load_model("path/to/model.zip") # Python format
'''
Use this alternative for the MLflow format
model = dataikuscoring.mlflow.load_model("path/to/model_unzipped") # MLflow format
'''

# define predict function
def predict(input_bytes: Mapping[str, bytes]) -> Dict[str, bytes]:
    inputs = np.array(json.loads(input_bytes['input']))
    inference_results = model.predict_proba(inputs)
    structured_results = []
    for inference_result in inference_results:
        structured_output = {
            "data": {
                "result": {"classPredictions": [{"class": np.argmax(inference_result).item(), "score": round(np.max(inference_result).item(), 5)}]}
            }
        }
        structured_results.append(structured_output)
    return {'results.json': json.dumps(structured_results).encode()}
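To sanity-check the output contract before wiring in a real export, you can swap a stub in for the Dataiku model; _StubModel and its fixed probabilities are invented here purely for illustration:

```python
import json

import numpy as np


class _StubModel:
    # Hypothetical stand-in for the Dataiku model: returns fixed class probabilities
    def predict_proba(self, inputs):
        return np.array([[0.9, 0.05, 0.05]] * len(inputs))


model = _StubModel()


def predict(input_bytes):
    # Same logic as the guide's predict function, run against the stub
    inputs = np.array(json.loads(input_bytes["input"]))
    inference_results = model.predict_proba(inputs)
    structured_results = []
    for inference_result in inference_results:
        structured_results.append({
            "data": {
                "result": {"classPredictions": [{
                    "class": np.argmax(inference_result).item(),
                    "score": round(np.max(inference_result).item(), 5),
                }]}
            }
        })
    return {"results.json": json.dumps(structured_results).encode()}


out = predict({"input": json.dumps([[5.1, 3.5, 1.4, 0.2]]).encode()})
print(json.loads(out["results.json"]))
```

The printed structure is exactly what the container will return for each input row, so any downstream parsing code can be developed against it before the real model is in place.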

Create model container locally

First, use the predict function you defined to create a ChassisModel object. Next, define some metadata (pip requirements, inputs, outputs, etc.). Finally, you can test your Chassis model and build a container using the DockerBuilder option.

import time

# create chassis model object, add required dependencies, and define metadata
chassis_model = ChassisModel(process_fn=predict)
chassis_model.add_requirements(["scikit-learn", "numpy"])
chassis_model.metadata.model_name = "Iris Classifier"
chassis_model.metadata.model_version = "0.0.1"
chassis_model.metadata.add_input(
    key="input",
    accepted_media_types=["application/json"],
    max_size="10M",
    description="Numpy array representation of iris measurements"
)
chassis_model.metadata.add_output(
    key="results.json",
    media_type="application/json",
    max_size="1M",
    description="Top iris class prediction and confidence score"
)

# test model with the sample data file created earlier
with open("sample_data.json", "rb") as f:
    results = chassis_model.test({"input": f.read()})
print(results)

# build container
builder = DockerBuilder(chassis_model)
start_time = time.time()
res = builder.build_image(name="dataiku-model", tag="0.0.1", show_logs=True)
end_time = time.time()
print(res)
print(f"Container image built in {round((end_time-start_time)/60, 5)} minutes")

This code should take just under a minute to run. The output of a successful build will display the details of your new container image (note: the "Image ID" digest will be different for each build):

Generating Dockerfile...Done!
Copying libraries...Done!
Writing metadata...Done!
Compiling pip requirements...Done!
Copying files...Done!
Starting Docker build...Done!
Image ID: sha256:d222014ffe7bacd27382fb00cb8686321e738d7c80d65f0290f4c303459d3d65
Image Tags: ['dataiku-model:0.0.1']
Cleaning local context
Completed:       True
Success:         True
Image Tag:       dataiku-model:0.0.1

Import container image into Modzy

Finally, you can deploy your Dataiku model container to Modzy in just a few simple steps and be ready to run, monitor, and integrate your model at scale!