GPU Support

Kubernetes includes experimental support for managing NVIDIA GPUs spread across nodes. This page describes how users can consume GPUs and the current limitations.

Prerequisites

  1. Kubernetes nodes have to be pre-installed with NVIDIA drivers; the kubelet will not detect NVIDIA GPUs otherwise.
  2. The alpha feature gate Accelerators has to be set to true across the system: --feature-gates="Accelerators=true" (see the sketch after this list).
  3. Nodes must use Docker Engine as the container runtime.
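How the feature gate is passed depends on how the cluster was deployed. A minimal sketch, assuming the kubelet reads its flags from /etc/default/kubelet as in the node-labelling example further down; the same flag also has to be set on the other components:

# Append the alpha feature gate to the kubelet's startup flags, then restart the kubelet.
source /etc/default/kubelet
KUBELET_OPTS="$KUBELET_OPTS --feature-gates=Accelerators=true"
echo "KUBELET_OPTS=$KUBELET_OPTS" > /etc/default/kubelet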

Once these prerequisites are met, nodes automatically discover their NVIDIA GPUs and expose them as a schedulable resource.
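To confirm that the GPUs were picked up, check the node capacity. A minimal sketch using kubectl; the resource shows up under Capacity and Allocatable, and the exact output format varies by version:

# GPUs are reported as alpha.kubernetes.io/nvidia-gpu in each node's capacity.
kubectl describe nodes | grep -i nvidia-gpu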

API

NVIDIA GPUs can be consumed via container-level resource requirements using the resource name alpha.kubernetes.io/nvidia-gpu. For example:

kind: Pod
apiVersion: v1
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container-1
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 2
  - name: gpu-container-2
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 3
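Once created, the scheduler only places the pod on a node with at least five unallocated GPUs (the sum of the two containers' limits). A minimal sketch of creating and inspecting the pod, assuming an image has been added to each container and the manifest above is saved as gpu-pod.yaml:

kubectl create -f gpu-pod.yaml
# The requested GPUs appear under each container's Limits.
kubectl describe pod gpu-pod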

If your nodes contain different GPU models, then use Node Labels and Node Selectors to schedule pods onto nodes with the appropriate GPUs. The following illustrates this workflow:

As part of your Node bootstrapping, identify the GPU hardware type on your nodes and expose it as a node label.

# Query the model name of the first GPU on the node.
NVIDIA_GPU_NAME=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0)

# Append a node label carrying the GPU model to the kubelet's startup flags.
source /etc/default/kubelet
KUBELET_OPTS="$KUBELET_OPTS --node-labels='alpha.kubernetes.io/nvidia-gpu-name=$NVIDIA_GPU_NAME'"
echo "KUBELET_OPTS=$KUBELET_OPTS" > /etc/default/kubelet

Specify the GPU types a pod can use via Node Affinity rules.

kind: Pod
apiVersion: v1
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "nodeAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
              {
                "matchExpressions": [
                  {
                    "key": "alpha.kubernetes.io/nvidia-gpu-name",
                    "operator": "In",
                    "values": ["Tesla K80", "Tesla P100"]
                  }
                ]
              }
            ]
          }
        }
      }
spec:
  containers:
  - name: gpu-container-1
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 2

This ensures that the pod is scheduled onto a node that has a Tesla K80 or a Tesla P100 NVIDIA GPU.
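If a single GPU model is acceptable, the same constraint can be written with a plain node selector instead of the affinity annotation. A minimal sketch; the pod name is illustrative and the label value must match one actually registered on your nodes:

kind: Pod
apiVersion: v1
metadata:
  name: gpu-pod-k80
spec:
  nodeSelector:
    alpha.kubernetes.io/nvidia-gpu-name: "Tesla K80"
  containers:
  - name: gpu-container-1
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1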

Warning

The API presented here will change in an upcoming release to better support GPUs, and hardware accelerators in general, in Kubernetes.

Access to CUDA libraries

As of now, CUDA libraries are expected to be pre-installed on the nodes.

Pods can access the libraries using hostPath volumes.

kind: Pod
apiVersion: v1
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container-1
    securityContext:
      privileged: true
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - mountPath: /usr/local/nvidia/bin
      name: bin
    - mountPath: /usr/lib/nvidia
      name: lib
  volumes:
  - hostPath:
      path: /usr/lib/nvidia-367/bin
    name: bin
  - hostPath:
      path: /usr/lib/nvidia-367
    name: lib
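Depending on the base image, the mounted binaries and libraries may not be on the container's default search paths. A hedged sketch of environment variables that can be added under the container entry above; whether they are needed depends on the image and on how the application locates CUDA:

    env:
    - name: LD_LIBRARY_PATH  # let the dynamic loader find the mounted NVIDIA libraries
      value: /usr/lib/nvidia
    - name: PATH             # make the mounted NVIDIA binaries (such as nvidia-smi) available
      value: /usr/local/nvidia/bin:/usr/local/bin:/usr/bin:/bin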
