On-Premise
Guide for deploying Modzy using Ubuntu virtual machines
Overview
This guide provides a suggested set of step-by-step instructions for deploying Modzy on bare metal servers or virtual machines. There are many ways to meet Modzy's prerequisites, but the following is the most self-contained approach with the fewest additional dependencies. In particular, we will use Rook to provide our storage layer, since it supports both Kubernetes dynamic persistent volumes and an S3-compatible object storage gateway backed by the same pool of attached disks.
Virtual Machine setup
Like any Modzy installation, there are two sets of machines that need to be deployed. The first set, which we'll call the "Platform" servers, runs Modzy's application services and dependencies. The second set, which we'll call the "Inference" servers, runs the models. The hardware requirements and count of the Inference servers are highly dependent on the size and quantity of models that you want to be able to run at any given time.
The Platform servers are pretty straightforward. A minimal on-prem Modzy deployment requires 3 servers with the following configuration (high availability setups may require more):
| Operating System | vCPUs | RAM (GiB) | Count |
|---|---|---|---|
| Ubuntu 20.04 LTS | 8 | 32 | 3 (or more) |
In addition to the above requirements, we are going to set these servers up to use hyper-converged storage so they should have 2 or more disks attached. The first disk will be the typical one for the operating system. The rest of the disks should be unformatted and will be used in the Rook storage pool.
We typically recommend that you have 1TB of object storage available to store models, results, and other critical Modzy data. Since our storage pool will be configured with replication, the total raw capacity should account for the replication factor. For example, with a basic replication factor of 2 you need 2TB of raw capacity, spread across the additional hard drives you mount to these virtual machines.
Install Kubernetes
To install Kubernetes, we're going to use a production-grade, lightweight version of Kubernetes called K3s. The installation will take place in two phases.
Set up primary control-plane node
For simplicity's sake, we're not going to set up dedicated nodes for the Kubernetes control plane. For small to medium-sized Modzy installations, the Kubernetes control plane, the storage services, and Modzy's application services can all run on the same 3 servers. For larger installations, or for maximum resilience, these three capabilities should be separated out onto dedicated sets of nodes: one for the Kubernetes control plane, one for the Rook storage services, and one for Modzy's application services.
First, we're going to generate a secret token that new nodes will use when they join the cluster. You can generate it however you like, but the following command produces a random alphanumeric value you can use:
$ mktemp -u XXXXXXXXXX
zOI6s3cFgJ
Now that you have your token, store it somewhere safe.
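As an optional convenience (the variable name below is just an example, not something the installer requires), you can export the token in each SSH session and pass it to the install commands via a shell variable instead of pasting the raw value each time:

export MODZY_K3S_TOKEN="zOI6s3cFgJ"   # replace with the token you generated above
# ...then use --token "$MODZY_K3S_TOKEN" in place of --token <INSERT YOUR TOKEN HERE>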
Open an SSH session to your first node and run the following command to initialize the K3s cluster:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - server --disable traefik --disable local-storage --cluster-init --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>
The above command will download all the necessary components, create a systemd service, start that service, and initialize etcd for a high-availability Kubernetes cluster.
NOTE
Before moving on, you can find a KUBECONFIG file at /etc/rancher/k3s/k3s.yaml. You can copy this file to your local machine (you'll need to edit the server URL to point at the DNS name or IP address of the node rather than https://127.0.0.1:6443). You can also create a new service account for yourself, whichever you prefer.
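For example, a minimal sketch of copying the file down, assuming an ubuntu user with passwordless sudo on the node (adjust the user, host, and target path to your environment):

ssh ubuntu@<IP or DNS of primary node> "sudo cat /etc/rancher/k3s/k3s.yaml" \
  | sed "s/127.0.0.1/<IP or DNS of primary node>/" > ~/.kube/modzy-k3s.yaml
export KUBECONFIG=~/.kube/modzy-k3s.yaml
kubectl get nodes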
Set up remaining control-plane nodes
Open an SSH session to each of your remaining 2 Platform nodes and run the following command to join them as control-plane nodes to your first server, completing the high availability setup.
NOTE
You can refer to your primary server by either its DNS name or its IP address during this step.
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - server --disable traefik --disable local-storage --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>
Install additional Platform servers
Now that we have a highly available Kubernetes control plane, any additional Platform servers you need can be installed as "agents" (as K3s calls them) that participate in the cluster but don't run any of the Kubernetes control plane services.
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - agent --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=platform" --token <INSERT YOUR TOKEN HERE>
Install Inference servers
If you have Inference nodes with attached GPUs, please follow the instructions for configuring a GPU node.
To enforce the separation between Platform and Inference servers, we apply a Kubernetes taint to our Inference servers. You can join them to the cluster with this taint using the following command:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.12+k3s1 sh -s - agent --server https://<IP or DNS of primary node>:6443 --node-label "modzy.com/node-type=inference" --node-taint "modzy.com/inference-node=true:NoSchedule" --token <INSERT YOUR TOKEN HERE>
Using self-signed certificates
If you're using a self-signed certificate for your Modzy installation, there is one additional step you need to perform when creating your Inference nodes. By default, the container runtime (containerd in this case) requires valid TLS before it will pull images from a registry. If you're using a self-signed certificate, Modzy's internal registry will fail these checks, so we need to tell the runtime to skip TLS verification for Modzy's registry.
If we assume that you're installing Modzy under the domain modzy.example.com, then the registry will be exposed as registry.modzy.example.com, so we need to disable TLS validation for that domain.
On each of your Inference nodes, open an SSH session and create a file /etc/rancher/k3s/registries.yaml with the following content:
configs:
  "registry.modzy.example.com":
    tls:
      insecure_skip_verify: true
Replace registry.modzy.example.com with the domain that you're going to use.
After writing this file, you need to restart the k3s service for the change to take effect:
sudo systemctl restart k3s-agent
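If you want to confirm the setting was picked up, k3s merges registries.yaml into the containerd configuration it generates; the path below is k3s's default location:

sudo grep -A 3 "registry.modzy.example.com" /var/lib/rancher/k3s/agent/etc/containerd/config.toml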
Install Rook
Now that we have all our Kubernetes nodes provisioned and online, the next step is to set up our storage layer. We'll do this using Rook.
Make sure you have Helm installed. If you're using one of your Kubernetes nodes for this step, you can install it with the following command:
sudo snap install helm --classic
NOTE
The KUBECONFIG file on each Kubernetes node is owned by root, so the easiest way to use it is to start a bash shell as root with sudo bash and then set the active KUBECONFIG with export KUBECONFIG=/etc/rancher/k3s/k3s.yaml before running any Helm commands.
Now that Helm is available, start by adding the Rook Helm repository:
helm repo add rook-release https://charts.rook.io/release
Next we need to create a Kubernetes namespace for it:
kubectl create ns rook-ceph
NOTE
You can use any namespace you want but if you change it, you'll need to change it in all subsequent example YAML files and URLs.
Next we will install the Rook Operator:
helm install --namespace rook-ceph rook-ceph rook-release/rook-ceph
NOTE
You can use any Helm release name here as well, but like before, if you change it, you'll need to update all subsequent example YAML files and URLs.
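Before moving on, it's worth confirming that the operator started successfully; you should see a rook-ceph-operator pod in the Running state:

kubectl -n rook-ceph get pods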
Configuring Ceph
Rook makes it easy to install Ceph, an open source software-defined storage solution that will allow us to pool all those extra hard disks we attached to our VMs into a single fault-tolerant storage pool.
The first step is to configure the Ceph cluster. We will do this with a Kubernetes Custom Resource Definition (CRD) provided by Rook.
First, we need to gather 2 bits of information. The first is the node name in Kubernetes for each of the nodes. The second is the name of the additional block device as reported by Ubuntu.
To collect the node names, you can simply run kubectl get nodes; the values you need are in the NAME column.
To collect the block device name, run lsblk on each of the Ubuntu servers and look for an entry that doesn't have any partitions assigned. It will probably be something like sdb or nvme1n1.
Finally, we can construct our Custom Resource using the information we just gathered:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v16.2
  dataDirHostPath: /var/lib/rook
  dashboard:
    enabled: true
    ssl: false
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    modules:
      - name: pg_autoscaler
        enabled: true
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: ip-10-10-4-225 # Use the Kubernetes node name here
        devices:
          - name: nvme1n1 # Use the block device name here
      - name: ip-10-10-4-96 # Add entries for each server that has extra disks
        devices:
          - name: nvme1n1
      - name: ip-10-10-5-38
        devices:
          - name: nvme1n1
NOTE
Configuring Rook and Ceph is a whole topic on its own and there are many options you can change to suit your needs. See the Rook documentation (https://rook.io/docs/) and the Ceph documentation (https://docs.ceph.com/) for more information.
Save this file with your changes (for example as ceph-cluster.yaml) and then apply it to Kubernetes:
kubectl apply -f ceph-cluster.yaml
You can watch the progress using a tool like Lens or Infra. You should see a bunch of pods spin up.
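If you prefer the command line, you can watch the same progress with kubectl; the CephCluster resource also reports an overall phase and health once the cluster is up:

kubectl -n rook-ceph get pods --watch
kubectl -n rook-ceph get cephcluster rook-ceph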
Once they've all started and report healthy, you can open the Ceph dashboard to view the status of the Ceph cluster. For now we'll just port-forward to the Kubernetes service. In a new Terminal window, you can run:
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000
Now you can open a browser and go to http://localhost:7000. The username is admin and the password can be found in a Kubernetes secret. You can retrieve the password with the following command:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d
Once the Ceph dashboard reports that everything is HEALTH_OK, we can proceed to the next step.
Configuring Block and Object storage
Next we're going to provision some block storage (for Kubernetes dynamic persistent volumes) and object storage (for S3-compatible buckets).
The following two files are examples that should work for a basic installation. They're configured to use a replication factor of 2 so that each bit of data is stored on two separate disks to prevent data loss. In a typical production situation you may want to increase this and add additional rules about failure domains but that is beyond the scope of this guide. See the Rook documentation for more information.
Save the block pool and its StorageClass as ceph-block-pool.yaml:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 2 # Optionally change the replication factor here
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
Then save the object store and its StorageClass as ceph-object-store.yaml:

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: modzy-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 2 # Optionally change the replication factor here
  dataPool:
    failureDomain: host
    # This is configured for erasure coding but can also use straight replication
    erasureCoded:
      dataChunks: 2 # Optionally change erasure coding settings here
      codingChunks: 1
  preservePoolsOnDelete: true
  gateway:
    type: s3
    port: 80
    instances: 2
  healthCheck:
    bucket:
      disabled: false
      interval: 60s
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-bucket
provisioner: rook-ceph.ceph.rook.io/bucket
reclaimPolicy: Delete
parameters:
  objectStoreName: modzy-store
  objectStoreNamespace: rook-ceph
Apply both files to Kubernetes:
kubectl apply -f ceph-block-pool.yaml
kubectl apply -f ceph-object-store.yaml
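You can verify that the new resources are in place before moving on; the block StorageClass should now be the cluster default, and the object store gateway (RGW) pods and service should be running. The label selector below follows the convention Rook uses for its RGW pods:

kubectl get storageclass
kubectl -n rook-ceph get cephobjectstore modzy-store
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw
kubectl -n rook-ceph get svc rook-ceph-rgw-modzy-store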
Prepare for Modzy Installation
Before launching the Modzy installer, there are several bits of information that we need to collect and/or configure.
Create S3 buckets
Now that we have our object storage available, let's launch the Ceph Dashboard by port-forwarding to the Kubernetes service (see above), log in, and create a user and the three buckets that Modzy needs.
Set aside the access key, secret key and bucket names you created. We'll need them in the next step.
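If you'd like a quick sanity check of the credentials before continuing, and you have the AWS CLI installed locally, you can port-forward the object store gateway and list your buckets (the local port 8080 is arbitrary):

kubectl -n rook-ceph port-forward svc/rook-ceph-rgw-modzy-store 8080:80
# In a second terminal, using the access key and secret key you just created:
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>
aws --endpoint-url http://localhost:8080 s3 ls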
Configure an Identity Provider
If you already have a SAML 2.0-compatible identity provider, you can add an application/client to it and download the metadata.xml file. We'll need that in the next step.
If you don't already have an identity provider, please see this guide to install and configure Keycloak for use with Modzy.
Collect credentials to a mail server
Modzy sends several emails to users when they're added as a user, join a new team, and so on, so it needs to be configured with a mail server that is allowed to send email to your users.
Set up DNS
Modzy requires two DNS entries to function: the domain you'd like to use for Modzy itself, and a wildcard entry underneath it. For example, if your primary domain is example.com, then you might create the following two entries:
modzy.example.com
*.modzy.example.com
Both of those entries should have A records that point to all three of the Platform servers we set up at the beginning, so that DNS provides round-robin load balancing.
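You can verify the records with dig; each name (including any host under the wildcard, where api.modzy.example.com is just an example) should return the IP addresses of all three Platform servers:

dig +short modzy.example.com
dig +short api.modzy.example.com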
Create TLS certificates
Next, create a TLS certificate that includes the two DNS entries you just created as Subject Alternative Name (SAN) entries. Save the certificate and the private key. We'll need them in the next step.
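If you don't have an internal CA and plan to use a self-signed certificate, a minimal sketch with openssl looks like the following (requires OpenSSL 1.1.1 or newer for -addext; adjust the domain and validity period to your needs):

openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout modzy.key -out modzy.crt \
  -subj "/CN=modzy.example.com" \
  -addext "subjectAltName=DNS:modzy.example.com,DNS:*.modzy.example.com"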
Install Modzy
Modzy is installed, configured, and upgraded using KOTS. To begin, download the kubectl plugin:
curl https://kots.io/install | bash
Next, let's start the Modzy installation process!
kubectl kots install modzy -n modzy
After a minute or so, your Terminal should say that the Admin Console is available at http://localhost:8800. Open it in your browser so that it can walk you through the installation process.
First, upload your license file.
After the pre-flight checks pass, you're presented with the Modzy configuration screen. Choose your installation type and upload your TLS certificate. If you're using a self-signed certificate, be sure to uncheck the "Verify TLS" box.
Set your domain name and upload the TLS certificate and private key you generated earlier.
Under storage settings, choose "Generic S3" as the provider and enter the following as the endpoint (assuming you did not edit any of the example files above):
http://rook-ceph-rgw-modzy-store.rook-ceph.svc:80
Add the access key, secret key, and bucket names in the appropriate fields.
Enter your mail server information.
To create the first administrator user, enter the email address of the person in your identity provider who will automatically be created as the first admin user. That person can then add all further users to Modzy in Modzy's UI.
Finally, check the "Bypass DNS Setup" box; automatic DNS setup is not supported in on-premise installations.
With all the configuration set, click Continue. If you still have Lens or Infra up and are monitoring the state of Kubernetes, you can watch as Modzy gets installed and configured. Once all the pods have finished launching and report healthy, you can now log in!