On-Premise

Guide for deploying Modzy using Ubuntu virtual machines

Overview

This guide provides a suggested set of step-by-step instructions for deploying Modzy on bare metal servers or virtual machines. There are many ways to meet Modzy's prerequisites, but the following approach is the most self-contained and has the fewest additional dependencies. In particular, we will use Rook to provide our storage layer, since it supports both Kubernetes dynamic persistent volumes and an S3-compatible object storage gateway using the same pool of attached disks.

Virtual Machine setup

Like any Modzy installation, there are two sets of machines that need to be deployed. The first set we'll call the "Platform" servers; these run Modzy's application services and dependencies. The second set we'll call the "Inference" servers; these run the models. The hardware requirements and number of Inference servers depend heavily on the size and quantity of models that you want to be able to run at any given time.

The Platform servers are pretty straightforward. A minimal on-prem Modzy deployment requires 3 servers with the following configuration (high availability setups may require more):

| Operating System | vCPUs | RAM (GiB) | Count       |
|------------------|-------|-----------|-------------|
| Ubuntu 20.04 LTS | 8     | 32        | 3 (or more) |

In addition to the above requirements, we are going to set these servers up to use hyper-converged storage so they should have 2 or more disks attached. The first disk will be the typical one for the operating system. The rest of the disks should be unformatted and will be used in the Rook storage pool.

We typically recommend that you have 1TB of object storage available to store models, results, and other critical Modzy data. Since our storage pool can be enabled with replication, the total amount of storage should take this into account. For example, if you use a basic replication factor of 2, then you should have 2TB total spread across the number of additional hard drives you mount to these virtual machines.
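As a rough sizing aid, the arithmetic above can be sketched in shell. This is only an illustration, not a Modzy tool: the function names are made up, 1 TB is treated as 1024 GiB, and the erasure-coding formula (usable × (k + m) / k) is included because the example object store later in this guide uses erasure coding for its data pool.

```shell
#!/bin/sh
# Raw capacity (GiB) needed for a replicated pool: usable * replica count.
raw_for_replication() {
  usable_gib="$1"; replicas="$2"
  echo $(( usable_gib * replicas ))
}

# Raw capacity (GiB) needed for an erasure-coded pool with k data chunks and
# m coding chunks: usable * (k + m) / k, rounded up to a whole GiB.
raw_for_erasure_coding() {
  usable_gib="$1"; k="$2"; m="$3"
  echo $(( (usable_gib * (k + m) + k - 1) / k ))
}

# Raw capacity (GiB) each disk must contribute when spread evenly, rounded up.
per_disk() {
  raw_gib="$1"; disks="$2"
  echo $(( (raw_gib + disks - 1) / disks ))
}

raw_for_replication 1024 2       # prints 2048
raw_for_erasure_coding 1024 2 1  # prints 1536
per_disk 2048 3                  # prints 683
```

With a replication factor of 2 and one extra disk on each of the three Platform servers, each disk would need to hold roughly 683 GiB.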

Install Kubernetes

To install Kubernetes, we're going to use a production-grade, lightweight version of Kubernetes called K3s. The installation will take place in two phases.

Set up primary control-plane node

For simplicity's sake, we're not going to set up dedicated nodes for the Kubernetes control plane. For small to medium sized Modzy installations, we can run all three capabilities (the Kubernetes control plane, the storage services, and Modzy's application services) on the same 3 servers. For larger installations or for maximum resilience, these three capabilities should each be separated out into their own dedicated set of nodes: one for the Kubernetes control plane, one for the Rook services, and one for Modzy's application services.

First, we're going to generate a secret token that will be used by new nodes when they join the cluster. You can do this however you want but this command will generate a unique alphanumeric value you can use:

$ mktemp -u XXXXXXXXXX
zOI6s3cFgJ

Now that you have your token, store it somewhere safe.
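If you would prefer a longer token than the 10-character example above, one option is to derive it from /dev/urandom. The sketch below assumes a Linux shell with the usual core utilities; the function names, the file path in the usage comment, and the length of 32 characters are arbitrary choices, not K3s requirements.

```shell
#!/bin/sh
# Print a 32-character alphanumeric token built from random bytes.
# 1024 random bytes yield far more than 32 alphanumeric characters
# after filtering, so the cut always has enough input.
generate_token() {
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-32
}

# Write a fresh token to a file that only the current user can read.
save_token() {
  token_file="$1"
  ( umask 077 && generate_token > "$token_file" )
}

# Example (hypothetical path):
#   save_token "$HOME/k3s-cluster-token"
```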

Open an SSH session to your first node and run the following command to initialize the K3s cluster:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - server --disable traefik --disable local-storage --cluster-init --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>

The above command will download all the necessary components, create a systemd service, start the service, and initialize etcd for a high-availability Kubernetes cluster.

📘

NOTE

Before moving on, you can find a KUBECONFIG file at /etc/rancher/k3s/k3s.yaml. You can copy this file to your local machine (you'll need to edit the URL to point at the DNS name or IP address of the node rather than https://127.0.0.1:6443). You can also create a new service account for yourself, whichever you prefer.
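A minimal sketch of that URL edit, assuming you have already copied the file to your local machine. The function name, the host name, and the output path in the usage comment are placeholders.

```shell
#!/bin/sh
# Print a copy of a K3s kubeconfig with the loopback API address replaced
# by the DNS name or IP address of the node.
rewrite_kubeconfig_server() {
  src="$1"; host="$2"
  sed "s#https://127\.0\.0\.1:6443#https://${host}:6443#" "$src"
}

# Example (hypothetical host name and output path):
#   rewrite_kubeconfig_server k3s.yaml node1.example.com > "$HOME/.kube/modzy.yaml"
#   export KUBECONFIG="$HOME/.kube/modzy.yaml"
```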

Set up remaining control-plane nodes

Open an SSH session to each of your remaining 2 Platform nodes and run the following command to join them to your first server as control-plane nodes. This completes the high availability setup.

📘

NOTE

You can refer to your primary server by either its DNS name or its IP address during this step.

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - server --disable traefik --disable local-storage --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>

Install additional Platform servers

Now that we have a highly available Kubernetes control plane, any additional Platform servers you need can be installed as "agents" (as K3s calls them) that participate in the cluster but don't run any of the Kubernetes control plane services.

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - agent --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>

Install Inference servers

If you have Inference nodes with attached GPUs, please follow the instructions for configuring a GPU node.

To enforce the separation between Platform and Inference servers, we apply a Kubernetes taint to our Inference servers. You can join them to the cluster with this taint using the following command:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - agent --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=inference" --node-taint "modzy.com/inference-node=true:NoSchedule" --token <INSERT YOUR TOKEN HERE>
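Once the nodes have joined, you may want to confirm that each one is Ready and carries the expected label. The helper below is a sketch with a made-up function name: it only parses text, and it assumes the six-column layout that `kubectl get nodes --no-headers -L modzy.com/node-type` prints (NAME, STATUS, ROLES, AGE, VERSION, then the label value).

```shell
#!/bin/sh
# Count Ready nodes per modzy.com/node-type label value.
# Reads the output of
#   kubectl get nodes --no-headers -L modzy.com/node-type
# on stdin. Nodes without the label have fewer columns and are skipped.
count_ready_by_type() {
  awk 'NF == 6 && $2 == "Ready" { n[$6]++ }
       END { for (t in n) print t, n[t] }' | sort
}

# Example:
#   kubectl get nodes --no-headers -L modzy.com/node-type | count_ready_by_type
```

To see the taints on the Inference nodes, `kubectl describe node <node name>` lists them under the Taints field.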

Using self-signed certificates

If you're using a self-signed certificate for your Modzy installation, there is one additional step you need to perform when creating your Inference nodes. By default, the container runtime (containerd in this case) requires proper TLS validation before it will pull images from a registry. With a self-signed certificate, Modzy's internal registry will fail these checks, so we need to tell the runtime to skip them for Modzy's registry.

If we assume that you're installing Modzy under the domain modzy.example.com, then the registry will be exposed as registry.modzy.example.com. So we need to disable TLS validation for this domain.

On each of your Inference nodes, open an SSH session and create a file /etc/rancher/k3s/registries.yaml with the following content:

configs:
  "registry.modzy.example.com":
    tls:
      insecure_skip_verify: true

Replace registry.modzy.example.com with the domain that you're going to use.

After writing this file, restart the k3s-agent service for the change to take effect:

sudo systemctl restart k3s-agent
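If you have several Inference nodes, the two steps above can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and the optional second argument exists so that you can write to a scratch path first and inspect the result.

```shell
#!/bin/sh
# Write a registries.yaml that disables TLS verification for one registry.
# $1: registry host name; $2: target path (defaults to the K3s location).
write_registries_yaml() {
  registry="$1"
  target="${2:-/etc/rancher/k3s/registries.yaml}"
  cat > "$target" <<EOF
configs:
  "${registry}":
    tls:
      insecure_skip_verify: true
EOF
}

# Example, run as root on an Inference node:
#   write_registries_yaml registry.modzy.example.com
#   systemctl restart k3s-agent
```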

Install Rook

Now that we have all our Kubernetes nodes provisioned and online, the next step is to set up our storage layer. We'll do this using Rook.

Make sure you have Helm installed. If you're using one of your Kubernetes nodes for this step, you can install it with the following command:

sudo snap install helm --classic

📘

NOTE

The KUBECONFIG file on each Kubernetes node is owned by root so the easiest way to use it is to start a bash shell as root with sudo bash and then set the active KUBECONFIG with export KUBECONFIG=/etc/rancher/k3s/k3s.yaml before running any Helm commands.

Now that Helm is available, start by adding the Rook Helm repository:

helm repo add rook-release https://charts.rook.io/release

Next we need to create a Kubernetes namespace for it:

kubectl create ns rook-ceph

📘

NOTE

You can use any namespace you want but if you change it, you'll need to change it in all subsequent example YAML files and URLs.

Next we will install the Rook Operator:

helm install --namespace rook-ceph rook-ceph rook-release/rook-ceph

📘

NOTE

You can use any Helm release name here as well, but like before, if you change it, you'll need to update all subsequent example YAML files and URLs.

Configuring Ceph

Rook makes it easy to install Ceph, an open source software-defined storage solution that will allow us to pool all those extra hard disks we attached to our VMs into a single fault-tolerant storage pool.

The first step is to configure the Ceph cluster. We will do this with a Kubernetes Custom Resource Definition (CRD) provided by Rook.

First, we need to gather 2 bits of information. The first is the node name in Kubernetes for each of the nodes. The second is the name of the additional block device as reported by Ubuntu.

To collect the node names, you can simply run kubectl get nodes; the values you need are in the NAME column.

To collect the block device name, run lsblk on each of the Ubuntu servers and look for an entry that doesn't have any partitions assigned. It will probably be something like sdb, nvme1n1 or something along those lines.
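On servers with many disks, the lsblk check can be narrowed down with a small filter. The sketch below uses a made-up function name and assumes the output of `lsblk -nr -o NAME,TYPE,PKNAME`, where PKNAME is the parent device of a partition or other child device. It does not detect a disk that was formatted directly without a partition table, so confirm each candidate by hand.

```shell
#!/bin/sh
# List whole disks that have no partitions or other child devices.
# Reads the output of:  lsblk -nr -o NAME,TYPE,PKNAME   on stdin.
unpartitioned_disks() {
  awk 'NF == 2 && $2 == "disk" { disks[$1] = 1 }   # top-level disk, no parent
       NF >= 3                 { used[$3] = 1 }    # child device: mark its parent
       END { for (d in disks) if (!(d in used)) print d }' | sort
}

# Example:
#   lsblk -nr -o NAME,TYPE,PKNAME | unpartitioned_disks
```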

Finally, we can construct our Custom Resource using the information we just gathered:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v16.2
  dataDirHostPath: /var/lib/rook
  dashboard:
    enabled: true
    ssl: false
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    modules:
      - name: pg_autoscaler
        enabled: true
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: ip-10-10-4-225   # Use the Kubernetes node name here
        devices:
          - name: nvme1n1      # Use the block device name here
      - name: ip-10-10-4-96    # Add entries for each server that has extra disks
        devices:
          - name: nvme1n1
      - name: ip-10-10-5-38
        devices:
          - name: nvme1n1
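If you have more than a handful of servers, the nodes list can be generated rather than typed. The helper below is an illustrative sketch with a hypothetical function name: it reads one "node-name device-name" pair per line and prints entries indented to fit under storage.nodes in the resource above.

```shell
#!/bin/sh
# Read "node-name device-name" pairs from stdin and print matching entries
# for the storage.nodes list of the CephCluster resource.
ceph_storage_nodes() {
  while read -r node device; do
    [ -n "$node" ] || continue   # skip blank lines
    printf '      - name: %s\n        devices:\n          - name: %s\n' "$node" "$device"
  done
}

# Example:
#   printf '%s\n' 'ip-10-10-4-225 nvme1n1' 'ip-10-10-4-96 nvme1n1' | ceph_storage_nodes
```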

📘

NOTE

Configuring Rook and Ceph is a whole topic on its own, and there are many options you can change to suit your needs. See the Rook documentation and the Ceph documentation for details.

Save this file with your changes and then apply it to Kubernetes:

kubectl apply -f ceph-cluster.yaml

You can watch the progress using a tool like Lens or Infra. You should see a bunch of pods spin up.

Once they've all started and report healthy, you can open the Ceph dashboard to view the status of the Ceph cluster. For now we'll just port-forward to the Kubernetes service. In a new Terminal window, you can run:

kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000

Now you can open a browser and go to http://localhost:7000. The username is admin and the password can be found in a Kubernetes secret. You can retrieve the password with the following command:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d

Once the Ceph dashboard reports that everything is HEALTH_OK, we can proceed to the next step.
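If you would rather wait for this state from a terminal, a small polling loop is one option. The sketch below is generic: the function name is made up, and the kubectl command in the usage comment assumes that the CephCluster resource exposes its health at .status.ceph.health, which you should confirm against your Rook version.

```shell
#!/bin/sh
# Run a status command repeatedly until it prints HEALTH_OK.
# Usage: wait_for_health_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_health_ok() {
  attempts="$1"; delay="$2"; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$("$@" 2>/dev/null)" || status=""
    if [ "$status" = "HEALTH_OK" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Ceph did not report HEALTH_OK after ${attempts} attempts" >&2
  return 1
}

# Example: poll every 10 seconds for up to 10 minutes.
#   wait_for_health_ok 60 10 kubectl -n rook-ceph get cephcluster rook-ceph \
#     -o jsonpath='{.status.ceph.health}'
```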

Configuring Block and Object storage

Next we're going to provision some block storage (for Kubernetes dynamic persistent volumes) and object storage (for S3-compatible buckets).

The following two files (ceph-block-pool.yaml and ceph-object-store.yaml) are examples that should work for a basic installation. The replicated pools use a replication factor of 2, so that each piece of data is stored on two separate disks to prevent data loss; the object store's data pool uses erasure coding instead. In a typical production situation you may want to increase the replication factor and add rules about failure domains, but that is beyond the scope of this guide. See the Rook documentation for more information.

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 2                           # Optionally change the replication factor here
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete

The object store and its bucket storage class go in the second file, ceph-object-store.yaml:

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: modzy-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 2                        # Optionally change the replication factor here
  dataPool:
    failureDomain: host
    # This is configured for erasure coding but can also use straight replication
    erasureCoded:
      dataChunks: 2                  # Optionally change erasure coding settings here
      codingChunks: 1
  preservePoolsOnDelete: true
  gateway:
    type: s3
    port: 80
    instances: 2
  healthCheck:
    bucket:
      disabled: false
      interval: 60s
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-bucket
provisioner: rook-ceph.ceph.rook.io/bucket
reclaimPolicy: Delete
parameters:
  objectStoreName: modzy-store
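  objectStoreNamespace: rook-ceph

Save the two files as ceph-block-pool.yaml and ceph-object-store.yaml, then apply them to Kubernetes:

kubectl apply -f ceph-block-pool.yaml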
kubectl apply -f ceph-object-store.yaml

Prepare for Modzy Installation

Before launching the Modzy installer, there are several bits of information that we need to collect and/or configure.

Create S3 buckets

Now that we have our object storage available, let's launch the Ceph Dashboard by port-forwarding to the Kubernetes service (see above), log in, and create a user and the three buckets that Modzy needs.

Set aside the access key, secret key and bucket names you created. We'll need them in the next step.

Configure an Identity Provider

If you already have a SAML 2.0-compatible identity provider, you can add an application/client to it and download the metadata.xml. We'll need that in the next step.

If you don't already have an identity provider, please see this guide to install and configure Keycloak for use with Modzy.

Collect credentials to a mail server

Modzy sends several emails to users when they're added as a user, join a new team, etc. so it needs to be configured with a mail server that is allowed to send email to your users.

Set up DNS

Modzy requires two DNS entries to function: the domain you'd like to use for Modzy itself, and a wildcard entry underneath that. For example, if your primary domain is example.com, then you might create the following two entries:

modzy.example.com
*.modzy.example.com

Both of those entries should have A records that point to all three servers we set up at the beginning so that it will do round-robin load balancing via DNS.

Create TLS certificates

Next, create a TLS certificate that includes the two DNS entries you just created as Subject Alternative Name (SAN) entries. Save the certificate and the private key. We'll need them in the next step.
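If you are going the self-signed route for a test installation, one way to produce such a certificate is with openssl. This is a sketch under assumptions: it needs OpenSSL 1.1.1 or newer for the -addext option, and the function name, file names, key size, and one-year validity are arbitrary choices. For production, use a certificate issued by your own CA or a public one.

```shell
#!/bin/sh
# Create a self-signed certificate and unencrypted private key whose SAN
# entries cover both the Modzy domain and the wildcard underneath it.
# $1: Modzy domain (for example modzy.example.com); $2: output directory.
make_self_signed_cert() {
  domain="$1"; out_dir="$2"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "${out_dir}/modzy.key" -out "${out_dir}/modzy.crt" \
    -subj "/CN=${domain}" \
    -addext "subjectAltName=DNS:${domain},DNS:*.${domain}"
}

# Example:
#   make_self_signed_cert modzy.example.com .
#   openssl x509 -in modzy.crt -noout -text   # inspect the SAN entries
```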

Install Modzy

Modzy is installed, configured, and upgraded using KOTS. To begin, download the kubectl plugin:

curl https://kots.io/install | bash

Next, let's start the Modzy installation process!

kubectl kots install modzy -n modzy

After a minute or so, your Terminal should say that the Admin Console is available at http://localhost:8800. Open it in your browser so that it can walk you through the installation process.

First, upload your license file.

After the pre-flight checks pass, you're presented with the Modzy configuration screen. Choose your installation type and upload your TLS certificate. If you're using a self-signed certificate, be sure to un-check the "Verify TLS" box.

Set your domain name and upload the TLS certificate and private key you generated earlier.

Under storage settings, choose "Generic S3" as the provider and enter the following as the endpoint (assuming you did not edit any of the example files above):

http://rook-ceph-rgw-modzy-store.rook-ceph.svc:80

Add the access key, secret key, and bucket names in the appropriate fields.

Enter your mail server information.

To create the first administrator user, enter the email address of the person in your identity provider who will automatically be created as the first admin user. That person can then add all further users to Modzy in Modzy's UI.

Finally, check the "Bypass DNS Setup" box, because automatic DNS setup is not supported in on-premise installations.

With all the configuration set, click Continue. If you still have Lens or Infra up and are monitoring the state of Kubernetes, you can watch as Modzy gets installed and configured. Once all the pods have finished launching and report healthy, you can now log in!