Overview
Processing engines
Every account has a running capacity, determined by the number of processing engines available to run jobs. Each engine processes one input item at a time, so the number of available engines sets the maximum number of input items a job can process in parallel. Inputs that are not being processed wait in the input queue until an engine picks them up.
Set a model version’s processing capacity to control how many of the account’s processing engines that version can use. If all the processing engines are busy running models, new job requests wait in the queue until engines become available.
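The relationship between engines, parallelism, and the input queue can be sketched with a little arithmetic. This is only an illustration, not part of the API: the function names are hypothetical, and the round count assumes every input takes the same time to process.

```python
import math


def queued_inputs(num_inputs: int, engines: int) -> int:
    """Inputs left waiting in the queue once every engine has picked one up."""
    if engines < 1:
        raise ValueError("at least one engine is required")
    return max(0, num_inputs - engines)


def processing_rounds(num_inputs: int, engines: int) -> int:
    """Sequential rounds needed when each engine handles one input at a time.

    Assumes all inputs take equally long, which real jobs rarely do.
    """
    if engines < 1:
        raise ValueError("at least one engine is required")
    return math.ceil(num_inputs / engines)


waiting = queued_inputs(10, 3)
rounds = processing_rounds(10, 3)
```

With 10 inputs and 3 engines, 3 inputs run in parallel while the other 7 wait, and the job needs 4 rounds under the equal-duration assumption.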
Nodes
Nodes are virtual or physical machines that provide the resources needed to run processing engines. These resources include CPU, GPU, memory, and a maximum number of processing engines that can be scheduled onto the node. See the Kubernetes documentation for more details.
The processing object
{
"minimumParallelCapacity": 1,
"maximumParallelCapacity": 3
}
Parameter | Type | Description |
---|---|---|
minimumParallelCapacity | number | The minimum number of processing engines a model version can run. Must be a positive integer. |
maximumParallelCapacity | number | The maximum number of processing engines a model version can run. Must be a positive integer. |
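A client might validate these values before sending them. The sketch below builds a processing object and checks that both fields are positive integers. The helper name `make_processing` is hypothetical, and the check that the minimum does not exceed the maximum is an assumption that the text above does not state.

```python
import json


def make_processing(minimum: int, maximum: int) -> dict:
    """Build a processing object after validating both capacities."""
    fields = {
        "minimumParallelCapacity": minimum,
        "maximumParallelCapacity": maximum,
    }
    for name, value in fields.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # Assumption: the minimum should not be larger than the maximum.
    if minimum > maximum:
        raise ValueError("minimumParallelCapacity exceeds maximumParallelCapacity")
    return fields


processing = make_processing(1, 3)
payload = json.dumps(processing)
```

Validating on the client side keeps malformed capacity values from ever reaching the request body.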
The engines object
{
"name" : "...",
"createdAt" : "...",
"ready" : true
}
Parameter | Type | Description |
---|---|---|
name | string | The engine’s name. |
createdAt | string | The engine’s creation date in ISO 8601 (YYYY-MM-DDThh:mm:ss.sTZD) format. |
ready | boolean | When true, the engine is ready. |
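As an illustration of working with a list of engines objects, the sketch below keeps only the ready engines and orders them by creation time. The engine names and timestamps are made-up sample data, and `ready_engines` is a hypothetical helper. The timestamps use an explicit `+00:00` offset and no fractional seconds because `datetime.fromisoformat` only accepts a wider range of ISO 8601 variants from Python 3.11 onward.

```python
from datetime import datetime


def ready_engines(engines: list) -> list:
    """Return the engines whose `ready` flag is true, oldest first."""
    ready = [engine for engine in engines if engine.get("ready") is True]
    return sorted(ready, key=lambda e: datetime.fromisoformat(e["createdAt"]))


# Hypothetical sample data shaped like the engines object above.
engines = [
    {"name": "engine-b", "createdAt": "2021-03-04T05:06:07+00:00", "ready": True},
    {"name": "engine-a", "createdAt": "2021-03-01T00:00:00+00:00", "ready": True},
    {"name": "engine-c", "createdAt": "2021-03-02T00:00:00+00:00", "ready": False},
]

names = [engine["name"] for engine in ready_engines(engines)]
```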
The model deployment state object
{
"hasError": false,
"ready": true,
"beingMonitored": true
}
Parameter | Type | Description |
---|---|---|
hasError | boolean | When true, an error is preventing the engine from starting. Modzy keeps trying to start it. |
ready | boolean | When true, the engine is ready to process inputs. |
beingMonitored | boolean | When true, the API is continuously checking the engine’s status. |
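One way to read the three flags together is to collapse them into a single status label. The sketch below is only an interpretation: the labels and the precedence of `hasError` over `ready` are assumptions, not behavior documented here.

```python
def describe_deployment(state: dict) -> str:
    """Summarize a model deployment state object as a short label.

    The labels are hypothetical; the API itself only returns the booleans.
    """
    if state.get("hasError"):
        label = "error (start is being retried)"
    elif state.get("ready"):
        label = "ready"
    else:
        label = "starting"
    if not state.get("beingMonitored"):
        # Without monitoring, the flags may be stale.
        label += ", not monitored"
    return label


summary = describe_deployment(
    {"hasError": False, "ready": True, "beingMonitored": True}
)
```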
The nodes object
{
"name": "...",
"creationTimestamp": "...",
"annotation": [
{
"key": "...",
"value": "..."
}
],
"labels": [
{
"key": "...",
"value": "..."
}
],
"status": {
"allocatable": {
"memory": "...",
"cpu": "...",
"gpu": "...",
"pods": 10
},
"capacity": {
"memory": "...",
"cpu": "...",
"gpu": "...",
"pods": 10
},
"conditions": [],
"images": []
}
}
Parameter | Type | Description |
---|---|---|
name | string | The node’s name. Nodes may be virtual or physical machines. |
creationTimestamp | string | The time when the node was added. |
annotation | array | A list of key-value pairs with node metadata. |
labels | array | A list of key-value pairs that tag the node. |
status | object | An object that contains the node’s status. |
The node status object
{
"allocatable": {
"memory": "...",
"cpu": "...",
"gpu": "...",
"pods": 10
},
"capacity": {
"memory": "...",
"cpu": "...",
"gpu": "...",
"pods": 10
},
"conditions": [],
"images": []
}
Parameter | Type | Description |
---|---|---|
allocatable | object | The amount of the node’s resources that can be allocated to processing engines. |
capacity | object | The node’s total amount of resources. |
conditions | array | Describes the node’s status. Conditions include Ready, DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable. |
images | array | The names and sizes of the container images required to run an application. |
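The sketch below shows one way a client might inspect a node status object: it compares `capacity` with `allocatable` for pods and scans `conditions` for problems. The sample above shows `conditions` as an empty array, so the entry shape used here (`type` and `status` keys with string values "True" or "False") is an assumption borrowed from Kubernetes node conditions. The helper names and sample values are hypothetical.

```python
PRESSURE_TYPES = {
    "DiskPressure",
    "MemoryPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def reserved_pods(status: dict) -> int:
    """Pod slots in total capacity that are not allocatable to engines."""
    return status["capacity"]["pods"] - status["allocatable"]["pods"]


def node_problems(status: dict) -> list:
    """List the condition types that indicate an unhealthy node."""
    problems = []
    for condition in status.get("conditions", []):
        ctype = condition.get("type")
        cstatus = condition.get("status")
        if ctype == "Ready" and cstatus != "True":
            problems.append("Ready")  # the node is not ready
        elif ctype in PRESSURE_TYPES and cstatus == "True":
            problems.append(ctype)  # a pressure condition is active
    return problems


# Hypothetical sample; only the fields used above are filled in.
status = {
    "allocatable": {"pods": 8},
    "capacity": {"pods": 10},
    "conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "MemoryPressure", "status": "True"},
    ],
    "images": [],
}

reserved = reserved_pods(status)
problems = node_problems(status)
```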