Heterogeneous Cluster Support
A llama2-7B model can run on a single A100 GPU, on a single A10, or even on a single 4090, among a variety of other GPU types; this is what we call resource fungibility. In practice, clusters are often heterogeneous, containing several GPU types, and high-end GPUs are frequently out of stock. To meet the service SLOs while controlling cost, we need to be able to schedule workloads across different GPU types. With the ResourceFungibility feature in the InftyAI scheduler, we can achieve this by declaring up to 8 alternative GPU types.
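For illustration, below is a minimal sketch of how alternative GPU types might be declared on a model. The `OpenModel` kind and the `inferenceConfig.flavors` field names are assumptions for this sketch; refer to the llmaz API reference for the exact schema.

```yaml
# Hypothetical sketch: declaring alternative GPU flavors for a llama2-7B model.
# Field names (inferenceConfig, flavors) are assumptions; check the llmaz API reference.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama2-7b
spec:
  familyName: llama2
  source:
    modelHub:
      modelID: meta-llama/Llama-2-7b-hf
  inferenceConfig:
    flavors:            # up to 8 alternatives, listed in order of preference
      - name: a100      # preferred: 1x A100
        requests:
          nvidia.com/gpu: 1
      - name: a10       # fallback: 1x A10
        requests:
          nvidia.com/gpu: 1
```

With a list like this, the scheduler can fall back to the next flavor when the preferred GPU type is unavailable.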
How to use
Enable the InftyAI scheduler
Edit the `values.global.yaml` file to modify the following values:
```yaml
kube-scheduler:
  enabled: true

globalConfig:
  configData: |-
    scheduler-name: inftyai-scheduler
```
Run `make helm-upgrade` to install or upgrade llmaz.
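As a minimal sketch of this step, you can run the upgrade and then check that the scheduler pod is running. The `llmaz-system` namespace is an assumption here; adjust it to your installation namespace.

```sh
# Install or upgrade llmaz with the new scheduler settings.
make helm-upgrade

# Optional check: confirm the scheduler pod is up
# (llmaz-system is an assumed namespace; adjust to your install).
kubectl get pods -n llmaz-system | grep scheduler
```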