An introduction to Excelero’s Engineering Headquarters in Tel Aviv

My name is Daniel and I am heir to a long dynasty of geeks. I also like to write, so I have volunteered to provide engineering-sourced content for the Excelero blog. I will be keeping you up to date with news from Excelero’s engineering HQ in Tel Aviv: interesting projects we are working on, new developments, or just things we geeks like to talk about.

Before we dig into the technology, let me introduce you to the Excelero engineering teams. At Excelero, we develop high-speed, low-latency distributed storage, and we have a team for each component of the system.

The Data Services Team develops the module that presents block devices (volumes) to the kernel and executes IO requests. All the fancy services, such as RAID protection, erasure coding and thin provisioning, are developed here. By the way, we call a machine that issues IO a client and the machines that hold the physical NVMe disks targets. Naturally, most of our data services are developed on the client side (to avoid target bottlenecks). We write in plain C and mostly deal with algorithmic and speed optimization challenges.

Somehow IO has to travel from multiple clients to targets. The Transport / Core Team, or core engineers, are in charge of exactly that. They support different fabrics (RoCE, InfiniBand), NICs, disk drivers, etc. There is a clear API between Data Services and the Transport Layer. Examples of API functions include take_lock(), send_write_cmd(), send_read_cmd() and release_lock(). The Core Team focuses on architectural and system design issues and consists mainly of seasoned kernel developers.
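To make the split between Data Services and Transport a bit more concrete, here is a minimal sketch of how a client-side service might drive that API. Only the four function names come from the text above; their signatures, the in-memory "targets", the constants and the mirrored_write() / mirrored_read() helpers are assumptions made up for illustration, not our actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_TARGETS       2   /* hypothetical: two mirror copies */
#define BLOCKS_PER_TARGET 8   /* hypothetical tiny "disk" */
#define BLOCK_SIZE        16

/* In-memory stand-in for remote NVMe targets (illustration only). */
static uint8_t target_disk[NUM_TARGETS][BLOCKS_PER_TARGET][BLOCK_SIZE];
static int block_locked[BLOCKS_PER_TARGET];

/* Transport-layer API: names from the post, signatures assumed. */
static int take_lock(uint32_t block)
{
    if (block >= BLOCKS_PER_TARGET || block_locked[block])
        return -1;
    block_locked[block] = 1;
    return 0;
}

static void release_lock(uint32_t block)
{
    assert(block < BLOCKS_PER_TARGET);
    block_locked[block] = 0;
}

static int send_write_cmd(int target, uint32_t block, const uint8_t *buf)
{
    if (target < 0 || target >= NUM_TARGETS || block >= BLOCKS_PER_TARGET)
        return -1;
    memcpy(target_disk[target][block], buf, BLOCK_SIZE);
    return 0;
}

static int send_read_cmd(int target, uint32_t block, uint8_t *buf)
{
    if (target < 0 || target >= NUM_TARGETS || block >= BLOCKS_PER_TARGET)
        return -1;
    memcpy(buf, target_disk[target][block], BLOCK_SIZE);
    return 0;
}

/* Hypothetical client-side data service: write one block to every mirror
 * while holding the block lock, so the copies cannot diverge. */
int mirrored_write(uint32_t block, const uint8_t *buf)
{
    int rc = 0;

    if (take_lock(block) != 0)
        return -1;
    for (int t = 0; t < NUM_TARGETS; t++) {
        if (send_write_cmd(t, block, buf) != 0) {
            rc = -1;
            break;
        }
    }
    release_lock(block);
    return rc;
}

/* Hypothetical helper: read one block back from a chosen mirror copy. */
int mirrored_read(int target, uint32_t block, uint8_t *buf)
{
    int rc;

    if (take_lock(block) != 0)
        return -1;
    rc = send_read_cmd(target, block, buf);
    release_lock(block);
    return rc;
}
```

The point of the sketch is the layering: the mirroring logic lives entirely on the client and talks to the targets only through the four transport calls.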

Clients and targets have a many-to-many relationship. The job of Toma, our topology manager (the cluster manager), is to coordinate them all. Toma is the responsibility of the Topology Manager Team. I dare say that Excelero's cluster manager is far more complicated than those typically described in the literature, and perhaps than those in other solutions. That’s because our Toma does not control the data path. When a client issues IO to a target, the client’s CPU is the only one that ever sees and handles the IO. Toma cannot filter, stop or redirect the IO flow. Topology manager developers deal with both algorithmic and system issues.
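A toy sketch may help show what "not controlling the data path" means in practice. It assumes the client keeps a local copy of the topology and routes every IO from that copy alone. The struct, the function names and the round-robin fallback rule are all hypothetical and are not a description of how Toma actually works.

```c
#include <stdint.h>

#define MAX_TARGETS 4   /* hypothetical upper bound for the sketch */

/* Hypothetical topology snapshot handed to clients by the cluster manager. */
struct topology {
    uint32_t generation;                /* bumped on every topology change */
    int      num_targets;
    int      target_alive[MAX_TARGETS]; /* 1 = usable, 0 = down */
};

/* The client's local copy; zero-initialized until the first publish. */
static struct topology client_view;

/* Control path: the manager pushes a new view to the client. */
void publish_topology(const struct topology *t)
{
    client_view = *t;
}

/* Data path: choose a target for a block using only the local view.
 * The manager is never consulted here, so it cannot filter, stop or
 * redirect an IO that is already on its way.
 * Returns a target index, or -1 if no live target is known. */
int route_io(uint64_t block)
{
    int n = client_view.num_targets;

    if (n <= 0 || n > MAX_TARGETS)
        return -1;

    int start = (int)(block % (uint64_t)n);
    for (int i = 0; i < n; i++) {
        int t = (start + i) % n;
        if (client_view.target_alive[t])
            return t;
    }
    return -1;
}
```

In this toy model the manager can influence IO only by publishing a new view, and a client keeps using its old view until the update arrives. Dealing with that kind of gap is one reason a cluster manager that sits outside the data path is harder to build.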

The Management/UI Team is in charge of the user interface and configuration. This includes creating volume configurations (how each volume is allocated across groups of physical disks), supplying reliable statistics and graphical monitoring of the system, and handling installation and version upgrades.

The Automation and QA Teams are our last line of defense against bugs, glitches and evil data corruption. They work hard to test the system under various conditions (load, scale, stress) and to create scenarios as close as possible to our customers’ use cases.

By Daniel Herman Shmulyan, Data Services Team leader