My friends and family often ask me what it is we do at Excelero. Most of them are completely unfamiliar with storage. Therefore, I decided to write a blog post that explains the magic of Excelero in simple terminology. Seasoned storage professionals may find this article oversimplified, but there are a lot of folks out there who are interested in learning about new storage technologies without caring too much about the difference between IOPS and throughput. So this one is for them!
There are two main approaches to storage: local storage and remote storage. Local storage is simple: think of the disk in your computer. Great examples of remote storage are cloud storage services like Google Drive, Apple iCloud or Dropbox.
Both approaches have benefits and drawbacks:
- Local storage: Reading and writing files is blazing fast. But when the disk is full, you have to replace it with a bigger one – if that is even possible. If your device gets lost or is stolen, you lose your data unless you made backup copies. Also, it is difficult to share data with friends.
- Remote storage: Cloud storage services typically let you consume as much storage as you need (scale as you grow), come with a complete package of data services that ensure your data will never get lost, and enable you to easily share data with friends or family – think pictures. But you don’t really have your data with you: you are dependent on your network connection, which can result in data access times that are up to 100 times slower.
These two storage approaches are not only relevant for home consumers, but also for organizations and businesses: enterprises, research labs, film production companies etc. Local storage puts the required disks directly into a computer that runs an application (e.g. a database); remote storage keeps the computers that run applications (clients) separate from the ones providing storage and data services (targets). These can sit in the same data center or be truly remote. Again, local storage is faster, while remote storage is more scalable, allows data sharing and so on. What many of today’s organizations really want, however, is a combination of the two: the performance of local disks together with the scalability and data sharing capabilities that come with remote storage.
Why is this so difficult to achieve? Why does introducing data services inevitably cause a massive slowdown?
The network used to be the main reason remote storage was slower than local storage, but these days we have networks capable of communicating at a hundred gigabits per second on a single cable, and computers can have multiple, parallel network connections. The real challenge is a little bit more complicated – but I’ll try to keep things simple.
Files that are stored locally on a disk have a 1-to-1 relationship: each file is stored on one computer and only one user is using that file. When you edit a PowerPoint presentation, you are the only one working on that file. Things are different for remote files: a spreadsheet on Google Drive or a picture on Facebook is stored on many computers, for backup and redundancy, and can be edited simultaneously by many users. This creates a many-to-many relationship: files are stored on many disks, in different computers, and can be accessed by many users/clients.
Such complex many-to-many topologies require what we call a central manager, like a traffic controller. When users attempt to access a file, the traffic controller receives read (open) or write (save) requests and then manages how these requests are processed using the available disks. When, in the middle of all of this, disks break, the traffic controller will also restore the lost data to new disks, run backups, and make sure that the entire storage infrastructure runs smoothly and reliably. Needless to say, the traffic controller or central manager can become a major bottleneck.
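To make the bottleneck concrete, here is a toy sketch (my illustration, not any vendor's real design) of such a centralized traffic controller: every read and write from every client has to pass through this single object before it reaches a disk.

```python
# Toy centralized "traffic controller": a single chokepoint that
# decides where each file lives and counts every request it handles.

class TrafficController:
    def __init__(self, num_disks: int):
        self.num_disks = num_disks
        self.requests_handled = 0

    def place(self, filename: str) -> int:
        # Decide which disk stores the file -- here a trivial rule:
        # spread files over disks by filename length.
        return len(filename) % self.num_disks

    def write(self, filename: str, data: bytes, disks: list) -> None:
        self.requests_handled += 1
        disks[self.place(filename)][filename] = data

    def read(self, filename: str, disks: list) -> bytes:
        self.requests_handled += 1
        return disks[self.place(filename)][filename]

disks = [{} for _ in range(4)]          # four empty "disks"
ctrl = TrafficController(num_disks=4)
ctrl.write("slides.pptx", b"deck", disks)
print(ctrl.read("slides.pptx", disks))   # prints b'deck'
print(ctrl.requests_handled)             # prints 2 -- every request funnels here
```

The key point is the counter: no matter how many clients and disks you add, every single request increments the one shared `requests_handled`, which is exactly why the controller becomes the bottleneck.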
Imagine a scenario of 1,000 clients writing to 1,000 disks. This yields 1 million (!) different conversations, all converging to a single point. This creates massive challenges:
- The amount of data is huge: just processing all those terabytes of data at high speed and dispatching them to disks requires very powerful computers.
- The algorithmic burden of data services calculations is monstrous: think of replicating the data for durability, calculating differences between new and old data, compressing data to save storage space. This is a huge task.
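The "1 million conversations" figure above is simple multiplication, and a quick back-of-the-envelope check shows how fast it grows:

```python
# Back-of-the-envelope: if every client may talk to every disk,
# the number of possible conversations is the product of the two.
clients = 1_000
disks = 1_000
conversations = clients * disks
print(conversations)  # prints 1000000 -- all converging on one controller

# Doubling both sides quadruples the load on the central manager:
print(2 * clients * 2 * disks)  # prints 4000000
```

This quadratic growth is why "just buy a bigger controller" stops working: the controller's load grows much faster than the hardware you can reasonably throw at it.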
How do storage companies typically handle these challenges?
Managing environments with small numbers of files and few users is not that much of an issue for any storage solution. But when environments need to scale virtually without limits, things get more interesting and very powerful traffic controllers are required. For decades, computer problems have been solved by building and using more powerful hardware. Storage was no different. So when storage companies needed more powerful traffic controllers, to manage more storage, accessible to more clients, they would just build more powerful hardware that could manage more requests and process more data. The problem, though, is that we have come to a point where “throwing more hardware at the problem” is no longer a viable solution, let alone an affordable one.
What is Excelero’s approach?
Excelero is a software company. The storage solutions we help our customers build use standard components (servers, disks and networking), so expensive hardware controllers do not fit in our vision. However, rather than “solving the controller problem”, we wanted to simply not have the problem, by architecting our storage solution in such a way that no controller is needed. To continue the analogy, we essentially transform clients into law-abiding citizens: by running intelligent software on the clients, there is no need for a central controller – traffic management happens in a distributed way. This approach is a lot more efficient and more powerful than the most expensive hardware-based controllers, and it is much more scalable: while each new client increases the stress on the system by requesting more data reads and writes, it also brings new resources (compute, memory and networking) that can be used to manage the entire infrastructure.
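One way to get an intuition for controller-less storage is deterministic placement. The sketch below is my own simplified illustration, not Excelero's actual algorithm: every client runs the same placement function, so any client can work out where a file lives on its own, without ever asking a central manager.

```python
# Toy controller-less placement: clients share a deterministic hash
# function, so they all independently agree on which disk holds a file.
import hashlib

def place(filename: str, num_disks: int) -> int:
    # SHA-256 is deterministic: same filename -> same disk,
    # on every client, on every run, with no coordination.
    digest = hashlib.sha256(filename.encode()).hexdigest()
    return int(digest, 16) % num_disks

# Two independent "clients" compute the location separately and agree.
client_a = place("slides.pptx", 24)
client_b = place("slides.pptx", 24)
print(client_a == client_b)  # prints True -- no traffic controller consulted
```

In a real system the placement logic is far richer (replication, failure handling, rebalancing), but the principle is the same: move the decision-making into software on every client, and the central chokepoint disappears.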
This distributed architecture is used by many scale-out applications, such as navigation apps like Google Maps or Waze: each driver increases the load on the system by requesting routes, but the driver’s smartphone adds resources to the overall pool. It can measure average speed and location, resolve ambiguity with other drivers, and even help map new roads.
So unlike competitors that use expensive hardware to run centralized controllers, Excelero avoids the entire bottleneck problem by using distributed algorithms. There is no need for a traffic controller, and clients can access storage over the network with local performance. As a result, Excelero exceeds the performance capabilities of the fastest storage systems on the market, at a fraction of their cost.