Introduction
The NVMesh CSI Driver Topology feature allows a single CSI driver to manage multiple NVMesh clusters within a single Kubernetes environment.
The topology feature ensures that each pod using an NVMesh-based PVC is scheduled only on nodes where the volume is accessible through the NVMesh client.
When the topology feature is configured, each NVMesh cluster will be represented as an NVMesh CSI zone.
The driver automatically adds a label in the format nvmesh-csi.excelero.com/zone=<zone name> to each node, so that Kubernetes associates every node with a cluster (zone).
The administrator configures the zones in the nvmesh-csi-driver-config ConfigMap. The driver discovers all nodes for a given zone by querying the NVMesh management servers configured for that zone, and saves this topology in a new ConfigMap named nvmesh-csi-topology. This ConfigMap should not be modified by the user. When a volume is created, the driver adds nodeAffinity with the zone label to the PersistentVolume. This tells the Kubernetes scheduler to place all future pods using this PVC only on nodes in the same zone as the NVMesh cluster where the volume was provisioned.
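To make the label and affinity pairing concrete, the two partial objects below sketch what one would expect to see for a volume provisioned in zone_A. They are illustrative fragments, not complete manifests: the first shows the label the driver places on a node, the second the nodeAffinity section of the resulting PersistentVolume (standard Kubernetes PV fields). The exact objects the driver creates may carry additional fields; `kubectl get pv <name> -o yaml` shows the real one.

```yaml
# Fragment of a Node object in zone_A (label added by the driver).
metadata:
  labels:
    nvmesh-csi.excelero.com/zone: zone_A
---
# Fragment of a PersistentVolume spec provisioned in zone_A (illustrative).
# Pods using this volume may only run on nodes carrying the matching label.
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvmesh-csi.excelero.com/zone
              operator: In
              values:
                - zone_A
```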
Configuration
To inform the CSI driver of the available zones, add the topology field to the nvmesh-csi-driver-config ConfigMap.
The following example defines two zones, each with its own management server:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nvmesh-csi-driver-config
data:
  management.protocol: https
  management.servers: 10.0.1.117:4000
  attachIOEnabledTimeout: "30"
  topology: |-
    {
      "zones": {
        "zone_A": {
          "management": {
            "servers": "worker1.domain.com:4000"
          }
        },
        "zone_B": {
          "management": {
            "servers": "worker4.domain.com:4000"
          }
        }
      }
    }
```
The topology field is a JSON object with a single key, zones, which contains the configuration for each zone.
Each key in the zones object is a zone name, and its value holds that zone's configuration parameters.
For each zone configuration, the following fields are available:
| Field | Description |
|---|---|
| management | Configuration for the management server in this specific zone |
| management.servers | A comma-separated list of management server addresses in the format address:port, for instance management-1:4000,management-2:4000 |
| management.protocol | The management server protocol, either "http" or "https" |
| management.user | The management user to log in with, for instance "admin@excelero.com" |
| management.password | The management password, for instance "admin" |
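The earlier example only sets management.servers for each zone. A zone entry that uses every field from the table might look like the sketch below. The zone name, server addresses, user, and password are placeholders taken from the table's own examples; substitute your cluster's values.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nvmesh-csi-driver-config
data:
  # Other top-level keys (management.servers, etc.) omitted for brevity.
  topology: |-
    {
      "zones": {
        "zone_A": {
          "management": {
            "servers": "management-1:4000,management-2:4000",
            "protocol": "https",
            "user": "admin@excelero.com",
            "password": "admin"
          }
        }
      }
    }
```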
Creating Volumes and Pods
Create a PVC and a Pod
Create a StorageClass with volumeBindingMode: WaitForFirstConsumer.
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nvmesh-with-topology
provisioner: nvmesh-csi.excelero.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  vpg: DEFAULT_CONCATENATED_VPG
```
Create a PVC using this StorageClass
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topology-volume0
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: nvmesh-with-topology
```
Create a Pod that uses the PVC
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topology-pod0
spec:
  serviceAccountName: topology-aware
  containers:
    - name: nginx
      image: gcr.io/google_containers/nginx-slim:0.8
      ports:
        - containerPort: 80
          name: web
      volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: topology-volume0
```
Assign the PVC / Pod to a zone using a StorageClass with the allowedTopologies field
To create volumes on a specific NVMesh cluster, create a StorageClass with the allowedTopologies field.
When a PVC is created from a StorageClass with this field, the CSI driver creates the volume in one of the zones listed in allowedTopologies.
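A minimal sketch of the single-zone case is shown below: every PVC created from this StorageClass is provisioned on the NVMesh cluster configured as zone_A. It reuses the provisioner and vpg parameter from the earlier StorageClass; the name nvmesh-zone-a is hypothetical.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nvmesh-zone-a   # hypothetical name
provisioner: nvmesh-csi.excelero.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  vpg: DEFAULT_CONCATENATED_VPG
# Restrict provisioning to a single zone
allowedTopologies:
  - matchLabelExpressions:
      - key: nvmesh-csi.excelero.com/zone
        values:
          - zone_A
```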
Multiple allowedTopologies
If multiple zones are allowed, as in the example below, the CSI driver randomly picks one of the zones and creates the volume in that zone.
The PersistentVolume is then accessible only in the selected zone, and every pod using the same PVC is scheduled only to that zone.
Different PVCs created from the same StorageClass may therefore end up in different zones.
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nvmesh-with-topology
provisioner: nvmesh-csi.excelero.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  vpg: DEFAULT_CONCATENATED_VPG
allowedTopologies:
  - matchLabelExpressions:
      - key: nvmesh-csi.excelero.com/zone
        values:
          - zone_A
          - zone_B
```
Assign a PVC or Pod to a zone using the Pod’s nodeAffinity
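It is possible to set the nodeAffinity directly on the pod. The volume is then provisioned, and the pod scheduled, in one of the zones the affinity allows. In this case, the PVC should use a StorageClass with volumeBindingMode: WaitForFirstConsumer, so that the volume is not provisioned until the pod has been scheduled to a node.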
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topology-pod0
spec:
  serviceAccountName: topology-aware
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvmesh-csi.excelero.com/zone
                operator: In
                values:
                  - zone_A
                  - zone_B
  containers:
    - name: nginx
      image: gcr.io/google_containers/nginx-slim:0.8
      ports:
        - containerPort: 80
          name: web
      volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: topology-volume0
```
For a more complex example using a StatefulSet, multiple zones, and pod anti-affinity across zones, see Topology-Aware Volume Provisioning in Kubernetes.
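As a rough sketch of that pattern (not a reproduction of the referenced article), the StatefulSet below spreads two replicas across zone_A and zone_B. It assumes the nvmesh-with-topology StorageClass (WaitForFirstConsumer) from the examples above and a headless Service named nginx, which is not shown; the StatefulSet name web is hypothetical. The podAntiAffinity rule uses the zone label as its topologyKey, so no two replicas land in the same zone, and each replica's volume is provisioned only after that replica has been scheduled.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                  # hypothetical name
spec:
  serviceName: nginx         # assumes a headless Service named "nginx" exists
  replicas: 2                # one replica per zone (zone_A, zone_B)
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Never place two replicas in the same NVMesh zone
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: nginx
              topologyKey: nvmesh-csi.excelero.com/zone
      containers:
        - name: nginx
          image: gcr.io/google_containers/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  # Each replica gets its own PVC, provisioned in the replica's zone
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: nvmesh-with-topology
        resources:
          requests:
            storage: 1Gi
```

Because the anti-affinity rule is required, any replicas beyond the number of zones remain Pending; using preferredDuringSchedulingIgnoredDuringExecution instead turns the rule into a soft preference.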
PVC with volumeBindingMode: Immediate
When a PVC is created from a StorageClass with volumeBindingMode: Immediate, the NVMesh CSI Driver randomly picks a zone and provisions the volume in that zone, without waiting for a pod to use the PVC.
All subsequent pods using this PVC will be scheduled to this zone.
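For comparison with the WaitForFirstConsumer examples above, a StorageClass that triggers this behavior differs only in its volumeBindingMode. The sketch below assumes the same provisioner and vpg parameter as the earlier examples; the name nvmesh-immediate is hypothetical.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nvmesh-immediate   # hypothetical name
provisioner: nvmesh-csi.excelero.com
allowVolumeExpansion: true
# Provision as soon as the PVC is created; the driver picks the zone
volumeBindingMode: Immediate
parameters:
  vpg: DEFAULT_CONCATENATED_VPG
```

Immediate is also the Kubernetes default, so a StorageClass that omits volumeBindingMode behaves the same way.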
References
For additional details on VolumeBindingMode, see k8s Documentation – VolumeBindingMode
For additional details on AllowedTopologies, see k8s Documentation – AllowedTopologies