In this short tutorial I will explain how to run a Docker container with GPU support. This becomes important if you want to run AI applications such as the Imixs-AI project within a local Docker environment or in a Kubernetes cluster. This tutorial is based on Debian 12 (Bookworm) and the use of an NVIDIA graphics card. I assume that you have already installed Debian and the Docker engine. Read the official install guide about how to install the Docker engine on Debian.
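If you want to make sure the Docker engine is working before you start, a quick check looks like this:
# print the installed engine version
$ docker --version
# run a minimal test container
$ sudo docker run --rm hello-world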
Install NVIDIA Driver
To run containers with GPU support you of course first need a graphics card installed in your server. In the following section I show how to install the proprietary NVIDIA drivers together with the CUDA toolkit on Debian Bookworm. The CUDA toolkit includes GPU-accelerated libraries to run high-performance, GPU-accelerated applications. The CUDA framework is widely used, especially in combination with AI projects like llama.cpp. There are also open source drivers available, but we have not tested them.
To install the NVIDIA driver you first need to update your repository sources:
$ sudo apt update
$ sudo apt upgrade
# Remove previously installed NVIDIA drivers (optional)
$ sudo apt autoremove nvidia* --purge
# Enable Contrib and Non-Free Repositories on Debian
$ sudo apt install software-properties-common -y
$ sudo add-apt-repository contrib non-free-firmware
Next, import the NVIDIA APT repository. This repo provides access to additional NVIDIA tools like nvidia-smi.
$ sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https dkms curl -y
$ sudo curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub | sudo gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1
$ echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list
$ sudo apt update
Finally, install the NVIDIA drivers on Debian via the default APT repository. Here I assume you are running a 64-bit system.
$ sudo apt update
$ sudo apt install linux-headers-amd64 nvidia-detect
Now you can check the NVIDIA support on your server with the nvidia-detect command. This will give you hints about the hardware status and how to install the driver:
$ nvidia-detect
Detected NVIDIA GPUs:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
Checking card: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
Your card is supported by all driver versions.
Your card is also supported by the Tesla drivers series.
Your card is also supported by the Tesla 470 drivers series.
It is recommended to install the
nvidia-driver
package.
The output reveals that the machine features a GeForce GTX 1080 card and recommends installing the nvidia-driver package. Now you can finally install the recommended package together with the CUDA toolkit.
$ sudo apt install nvidia-driver nvidia-smi linux-image-amd64 cuda
In addition, I install the NVIDIA System Management Interface (nvidia-smi), which allows you to verify your installation. Finally, reboot your system.
Note: it may happen that you need a hard reset of your machine. This is typically caused by conflicts with existing drivers.
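A typical source of such conflicts is the open source nouveau driver. As an optional troubleshooting step (not needed on a clean installation), you can check which modules are loaded and blacklist nouveau:
# check which NVIDIA related kernel modules are loaded
$ lsmod | grep -i -E 'nvidia|nouveau'
# blacklist the nouveau driver and rebuild the initramfs
$ echo -e 'blacklist nouveau\noptions nouveau modeset=0' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
$ sudo update-initramfs -u
$ sudo reboot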
Verify Installation
To verify your installation run nvidia-smi, which shows you some details about your GPU environment:
$ nvidia-smi
Sun Mar 31 10:46:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: N/A |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
| 36% 42C P0 39W / 180W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
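If you have installed the cuda package as shown above, you can also check the toolkit itself. Note that the CUDA binaries are installed under /usr/local/cuda and may not be on your PATH by default:
$ /usr/local/cuda/bin/nvcc --version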
Configuring Docker with GPU Support
In addition, it is necessary to install the ‘NVIDIA Container Toolkit’, which adds GPU support to your Docker engine. (This step is not necessary for Kubernetes – see the section below.)
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey |sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
&& sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
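The nvidia-ctk command registers the NVIDIA runtime in the Docker daemon configuration. After this step the file /etc/docker/daemon.json should contain an entry similar to the following (the exact content may vary depending on the toolkit version):
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}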
Test your setup with
$ nvidia-container-cli -k -d /dev/tty info
This should not show any errors. Now you can start a test container to verify your new Docker GPU support:
$ docker run --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
This test container should just show the nvidia-smi output from above.
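If you prefer to manage your containers with Docker Compose, the GPU can be requested via a device reservation. The following is a minimal sketch using the Imixs-AI image from the Kubernetes example below (adjust the image name and the model volume to your own setup):
services:
  imixs-ai-llama-cpp:
    image: imixs/imixs-ai-llama-cpp-gpu:latest
    ports:
      - "8000:8000"
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            # request one NVIDIA GPU for this service
            - driver: nvidia
              count: 1
              capabilities: [gpu]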
That’s it! Now you have a local Docker environment with GPU support.
Kubernetes Integration
If you plan to run your containers in a Kubernetes cluster, things become even easier. This is surprising, since Kubernetes typically has a steep learning curve. But NVIDIA did a great job with its so-called NVIDIA GPU Operator.
To install the operator into your Kubernetes cluster, you first need to install the CLI tool Helm. If it is not yet installed on your Kubernetes master node, run:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
&& chmod 700 get_helm.sh \
&& ./get_helm.sh
Next, add the NVIDIA Helm repository:
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
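You can verify that the GPU Operator chart is now available in the repository:
$ helm search repo nvidia/gpu-operator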
This operator will add the GPU support to all worker nodes in your cluster on which an NVIDIA GPU is detected (label feature.node.kubernetes.io/pci-10de.present). To exclude a worker node from the deployment, the node has to be labeled with nvidia.com/gpu.deploy.operands=false. You can check the current labels of your nodes with:
$ kubectl get nodes --show-labels=true
and add the exclusion label for a specific node with:
$ kubectl label nodes $NODE nvidia.com/gpu.deploy.operands=false
Where `$NODE` is the name of the worker node to be labeled.
Install the GPU Operator
Now you can install the GPU Operator into your cluster. There are different scenarios for installing the NVIDIA GPU Operator. To install the operator including the installation of the NVIDIA driver, run:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
The installation of all components in your cluster may take some time, so don’t be too impatient…
If you already have the NVIDIA driver installed on your worker nodes, you can run the Helm chart without having the operator install the driver:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false
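Once the operator pods are up and running, you can verify that the GPU is advertised as a schedulable resource on your worker nodes (replace $NODE with the name of a GPU node):
$ kubectl get pods -n gpu-operator
$ kubectl describe node $NODE | grep nvidia.com/gpu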
If something goes wrong with the installation, you can delete the operator with:
$ helm delete -n gpu-operator $(helm list -n gpu-operator | grep gpu-operator | awk '{print $1}')
That’s it.
To define a deployment using GPU support, you simply need to add the resource limit nvidia.com/gpu: 1.
This is an example of a deployment of the Imixs-AI llama-cpp container with GPU support in a Kubernetes cluster:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: imixs-ai-llama-cpp-gpu
  labels:
    app: imixs-ai-llama-cpp-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: imixs-ai-llama-cpp-gpu
  template:
    metadata:
      labels:
        app: imixs-ai-llama-cpp-gpu
    spec:
      containers:
        - name: imixs-ai-llama-cpp-gpu
          image: imixs/imixs-ai-llama-cpp-gpu:latest
          imagePullPolicy: Always
          env:
            - name: PYTHONUNBUFFERED
              value: "1"
            - name: USE_MLOCK
              value: "true"
          ports:
            - name: web
              containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - mountPath: /models
              name: imixs-ai-models
      restartPolicy: Always
      volumes:
        - name: imixs-ai-models
          persistentVolumeClaim:
            claimName: imixs-ai-models
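To double-check that the pod really has access to the GPU, you can run nvidia-smi inside the running container (assuming the image ships the tool, which CUDA based images usually do):
$ kubectl exec deploy/imixs-ai-llama-cpp-gpu -- nvidia-smi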