
Designing Server SAN infrastructures with NVMesh & OCP

By Yaniv Romem | March 8, 2017 (updated August 31, 2018)

Ever since its inception, Excelero has been a big supporter of the Open Compute Project, an initiative conceptualized to design the world’s most energy-efficient data center. Today, the Open Compute Project Foundation is a rapidly growing global community whose mission is to design, use, and enable mainstream delivery of the most efficient designs for scalable computing.

We were recently invited to test NVMesh on OCP hardware at Facebook’s Disaggregate Lab, to see whether OCP users could get the performance of NVMe drives without the data locality problem. They set up a lab for us with:

  • Four OCP Leopard servers, each configured with a dual-port Mellanox ConnectX-4 25Gb Ethernet OCP Mezzanine Adapter and a single Facebook AVA storage adapter holding 4 x 800GB Seagate Nytro M.2 NVMe drives.
  • An RDMA over Converged Ethernet (RoCE) network built on a single Mellanox 32-port 100Gb switch with a 4x25Gb-to-100Gb breakout cable; only a single connection per server was cabled for this test.
  • A minimal install of CentOS 7.3 (3.10.0-514.2.2.el7.x86_64) and Mellanox OFED 3.4-2.0.0.0 on each server.
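Before installing anything on a setup like this, it is worth confirming that each sled actually exposes its RDMA ports and NVMe drives. The sketch below is a hypothetical pre-flight check, not part of NVMesh; it shells out to standard OFED and Linux utilities and assumes ofed_info, ibv_devinfo and lsblk are on the path, as they are with a stock MLNX_OFED install.

```python
# Hypothetical pre-flight check (not part of NVMesh): confirm each server
# exposes its ConnectX-4 RDMA ports and NVMe drives before installing anything.
import subprocess

def run(cmd):
    """Run a command and return its stdout, or an empty string if it fails."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return ""

ofed  = run(["ofed_info", "-s"])                       # OFED version string
rdma  = run(["ibv_devinfo"])                           # RDMA-capable devices/ports (RoCE)
disks = run(["lsblk", "-d", "-o", "NAME,SIZE,MODEL"])  # block devices; NVMe drives show up as nvme*

print("OFED:", ofed.strip() or "not found")
print("RDMA ports reported:", rdma.count("port:"))
print("NVMe drives:", [line.split()[0] for line in disks.splitlines() if line.startswith("nvme")])
```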

An important evolution in the design of scale-out data centers is being driven by rapid innovation in flash storage. As Storagemojo wrote in his latest article, “With flash storage, individual SSDs are as fast as most all flash arrays.” All-flash arrays are, by definition, not very open. On the contrary, they are black boxes of hardware and software for which customers pay a hefty premium precisely because they are black boxes. Not very open at all.

Until now, customers have had limited options for building high-performance primary storage tiers on OCP hardware. The introduction of the Facebook-designed OCP AVA card, which holds 4 x M.2 NVMe SSDs, exemplifies what Storagemojo was discussing: it can provide nearly 1M random read IOPS with common M.2 SSDs in an OCP server sled. As Storagemojo points out in his article, there is one remaining problem: “what if your servers need to share data?” The caveat of NVMe is that it loses its high-performance edge when it is shared across a network. And here is where it gets cool: Excelero’s NVMesh shares NVMe across the network without giving up that edge.

Our team started the installation day with a six-hour delay: flights out of LAX were cancelled and they had to drive all the way up to Menlo Park. Fortunately, there was still time for a coffee break while installing the software. I’m sure they had ice cream as well, because installing NVMesh took them less than 30 minutes.

On day two, things would get more serious. The mission was to demonstrate:

  • Interoperability: does NVMesh run smoothly on OCP hardware?
  • Performance: what is the basic performance capability of the test environment and how much latency does NVMesh introduce by sharing NVMe over the network?
  • Pooling NVMe: can NVMe be pooled effectively to avoid data locality?

We scored A’s across all tests:
The Seagate Nytro M.2 SSDs on the AVA cards were capable of about 960K random read IOPS (240K per SSD). Because NVMesh has such low overhead, it easily saturated the 25Gb/s link and maxed out at 899K IOPS, about 27Gb/s worth of data. How did we exceed the 25Gb/s link? Because we striped the volume across all 16 SSDs, so some of the IO was actually local to the server running the benchmark. What this demonstrates is that you can get the same IOPS and latency with remote SSDs as you would with local ones. If you increased the NIC speed, or used a dual-port NIC, you could double the consumable IOPS.
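As a sanity check on those numbers, here is a rough back-of-the-envelope conversion from IOPS to bandwidth. The IO size and local share are assumptions (4 KiB reads, an even stripe across the four servers), so the exact figures differ slightly from the ~27Gb/s quoted above, but the point stands: the aggregate data rate exceeds what a single 25Gb/s port could carry, because roughly a quarter of the IO never touches the wire.

```python
# Back-of-the-envelope conversion of the measured IOPS into bandwidth.
# Assumptions: 4 KiB random reads, volume striped evenly across 4 servers,
# so roughly 1/4 of the IO is served locally and never crosses the network.
iops        = 899_000          # measured aggregate random-read IOPS
block_bytes = 4 * 1024         # assumed IO size
local_share = 1 / 4            # 4 servers, one of which runs the benchmark

total_gbps   = iops * block_bytes * 8 / 1e9
network_gbps = total_gbps * (1 - local_share)

print(f"aggregate data rate: {total_gbps:.1f} Gb/s")   # more than a 25Gb/s port can carry
print(f"over the wire      : {network_gbps:.1f} Gb/s") # fits within the single cabled 25Gb/s port
```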

Why is this cool?

Excelero’s NVMesh® enables OCP users to design Server SAN infrastructures for the most demanding enterprise and cloud-scale applications, leveraging OCP hardware and multiple tiers of flash by pooling the flash over the network at local speeds. This means you can create logical volumes that are larger than the amount of flash that fits in a single server sled, yet treat each volume as if it were local flash. The primary benefit of NVMesh is that it enables true converged infrastructure by logically disaggregating storage from compute. It bypasses the target CPU and avoids noisy neighbors, which is ideal for scale-out applications. This approach provides deterministic performance for applications and lets customers maximize the utilization of their flash drives.
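To make the pooling idea concrete, here is a toy model of how a striped logical volume maps blocks onto drives spread across several sleds. It is purely illustrative and says nothing about NVMesh’s actual internal layout; the names and the round-robin stripe function are invented for this example.

```python
# Toy model of a striped logical volume spanning drives in several OCP sleds.
# Purely illustrative: this is NOT NVMesh's internal layout.
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    server: str   # sled holding the drive
    drive: int    # drive index within that sled
    offset: int   # block offset on that drive

def locate(lba, servers, drives_per_server, stripe_blocks=1):
    """Map a logical block address of a striped volume to a physical extent."""
    stripe = lba // stripe_blocks
    drive_count = len(servers) * drives_per_server
    target = stripe % drive_count                      # round-robin across the pool
    return Extent(
        server=servers[target // drives_per_server],
        drive=target % drives_per_server,
        offset=(stripe // drive_count) * stripe_blocks + lba % stripe_blocks,
    )

# A volume striped over 4 sleds x 4 SSDs looks like one local device to the
# client, but consecutive blocks land on different drives across the pool:
sleds = ["sled-1", "sled-2", "sled-3", "sled-4"]
for lba in range(6):
    print(lba, locate(lba, sleds, drives_per_server=4))
```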

NVMesh was designed to leverage any underlying storage medium, so applications can be provisioned with volumes that meet all of their requirements (scale, performance, availability, reliability, efficiency and cost) and guarantee internal or external SLAs. This can be done from a central interface (or API) that is transparent and easy to use. So in essence, OCP is cool, but with NVMesh on top it opens up a whole new range of possibilities.


Supporting Resources

Author: Yaniv Romem