Local Burst Buffer
A burst buffer is a fast, intermediate storage layer between the non-persistent memory of the compute nodes and persistent storage – the parallel file system. This layer is configured to absorb a burst of write IO at a very high rate. Once the burst (checkpoint) is complete, the written data is “drained” to the parallel file system using the GPFS policy engine. This allows checkpoints to finish rapidly so that systems meet their availability SLAs.
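The write-then-drain pattern can be illustrated with a toy sketch. Two temporary directories stand in for the flash tier and the parallel file system, and a background thread doing a plain file copy stands in for the GPFS policy engine. This is an assumption for illustration only: the real policy engine migrates files according to policy rules rather than per-file copies, and the function names here are hypothetical.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def write_checkpoint(fast_tier: Path, name: str, data: bytes) -> Path:
    """Absorb the burst: write the checkpoint to the fast (burst buffer) tier."""
    path = fast_tier / name
    path.write_bytes(data)
    return path


def drain(src: Path, pfs: Path) -> Path:
    """Copy a completed checkpoint to the persistent tier, then free the fast tier."""
    dest = pfs / src.name
    shutil.copy2(src, dest)
    src.unlink()  # reclaim burst buffer capacity once the data is persistent
    return dest


def checkpoint_then_drain(fast_tier: Path, pfs: Path, name: str, data: bytes) -> threading.Thread:
    """Return as soon as the burst write finishes; draining continues in the background."""
    staged = write_checkpoint(fast_tier, name, data)
    drainer = threading.Thread(target=drain, args=(staged, pfs))
    drainer.start()
    return drainer  # the application can resume computing here


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        t = checkpoint_then_drain(Path(fast), Path(slow), "ckpt_0001.bin", b"\x00" * 1024)
        t.join()  # wait only for the demo; a real job would keep running
        print((Path(slow) / "ckpt_0001.bin").stat().st_size)  # 1024
```

The key design point is that `checkpoint_then_drain` returns as soon as the fast-tier write completes, so checkpoint time is governed by the burst buffer's bandwidth rather than the parallel file system's.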
When flash storage is used as the burst buffer pool, it has the added advantage of enabling faster restarts (when needed), because checkpoint restarts often impose a very large random read load on the underlying storage. NVMesh provides an extremely cost-effective way to achieve unheard-of burst buffer bandwidth: commodity flash drives and NVMesh software are added to the compute nodes, and the storage is shared across the existing low-latency network fabric.
NVMesh provides redundancy without impacting the CPUs of the target nodes (the nodes hosting the drives). No additional dedicated hardware or proprietary file system integration is needed, because storage is provisioned as a simple block device.
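Why adding drives to compute nodes raises burst bandwidth can be seen with a rough sizing model. This sketch assumes write bandwidth scales linearly with drive count, the network fabric is not the bottleneck, and redundancy is implemented by writing each byte `copies` times. All of these are simplifying assumptions, and the cluster size and per-drive bandwidth below are hypothetical values, not NVMesh measurements.

```python
def checkpoint_seconds(checkpoint_bytes: float, nodes: int, drives_per_node: int,
                       write_bw_per_drive: float, copies: int = 1) -> float:
    """Estimate how long a checkpoint burst takes to land on a pooled flash tier.

    Assumes write bandwidth scales linearly with drive count, the network fabric
    is not the bottleneck, and each byte is written `copies` times for redundancy.
    """
    aggregate_bw = nodes * drives_per_node * write_bw_per_drive  # bytes/s across the pool
    return checkpoint_bytes * copies / aggregate_bw


if __name__ == "__main__":
    TB, GB = 1e12, 1e9
    # Hypothetical cluster: 1,000 nodes, one drive each, 2 GB/s sustained write per drive.
    print(checkpoint_seconds(200 * TB, 1000, 1, 2 * GB))            # 100.0
    print(checkpoint_seconds(200 * TB, 1000, 1, 2 * GB, copies=2))  # 200.0
```

Under these assumptions, doubling the node count halves the checkpoint time, while mirroring doubles the bytes written; that trade-off is what the redundancy setting costs in burst bandwidth.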
NVMesh for Burst Buffer
With NVMesh for burst buffer, you can use standard NVMe drives and completely eliminate the need for proprietary hardware, and even for dedicated storage appliances. It builds on the local burst buffer methodology with a unique advantage: it adds redundancy and centralized management while preserving all compute resources for the applications themselves.
NVMesh and its patented Remote Direct Drive Access (RDDA) technology allow you to logically disaggregate the NVMe drives in the compute nodes from those nodes' CPU resources.
That is, although a node's local NVMe drive may be used by remote compute nodes, that usage does not consume the local CPU.
Thus, every compute node can have a local NVMe SSD (or multiple drives) and all the drives are pooled for use by the cluster.
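The pooling idea can be illustrated by a hypothetical placement sketch: every compute node contributes its drives to one pool, and each logical chunk of a volume is mirrored onto drives in distinct nodes so that losing one node does not lose data. This is not NVMesh's actual placement algorithm, and it does not model the RDDA data path; the `Drive` type, `mirrored_layout` function, and device names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    node: str    # compute node hosting the drive
    device: str  # illustrative device name, e.g. "nvme0n1"


def mirrored_layout(drives, num_chunks, copies=2):
    """Assign each logical chunk to `copies` drives on distinct nodes (round-robin)."""
    nodes = sorted({d.node for d in drives})
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    by_node = {n: [d for d in drives if d.node == n] for n in nodes}
    layout = []
    for chunk in range(num_chunks):
        replicas = []
        for i in range(copies):
            # Consecutive offsets from the chunk index keep replicas on different nodes.
            node = nodes[(chunk + i) % len(nodes)]
            node_drives = by_node[node]
            replicas.append(node_drives[chunk % len(node_drives)])
        layout.append(replicas)
    return layout


if __name__ == "__main__":
    pool = [Drive("node1", "nvme0n1"), Drive("node2", "nvme0n1"),
            Drive("node3", "nvme0n1"), Drive("node3", "nvme1n1")]
    for i, reps in enumerate(mirrored_layout(pool, 3)):
        print(i, [f"{d.node}:{d.device}" for d in reps])
    # 0 ['node1:nvme0n1', 'node2:nvme0n1']
    # 1 ['node2:nvme0n1', 'node3:nvme1n1']
    # 2 ['node3:nvme0n1', 'node1:nvme0n1']
```

Placing replicas on different nodes, rather than merely different drives, is what lets a pool built from compute-node drives tolerate a node failure.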