IDA
Dec 28, 2007
Information Dispersal Performance
Our focus for Dispersed Storage performance has been overall system performance, which in a Dispersed Storage Network is driven by how quickly bytes travel up and down the client stack. This stack performance determines how fast data gets on or off the dsNet, i.e. the speed of reads and writes. The sequence of events when going down the stack (“writing” data) is:
1. Source data integrity check
2. Compression
3. Encryption
4. Dispersal
5. Slice integrity check
6. Packetizing
7. Network transmission
When “reading”, the data flow is the opposite: you just go up the stack. Also, note that compression and encryption are optional.
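The write and read paths above can be sketched roughly as follows. This is only an illustration, not the actual Dispersed Storage code: the function names `write_path` and `read_path` are hypothetical, SHA-256 and zlib are assumed stand-ins for the integrity check and compression layers, encryption, packetizing, and network transmission are left out, and simple round-robin striping is used in place of a real information dispersal algorithm (so it provides no redundancy).

```python
import hashlib
import zlib


def write_path(data: bytes, n_slices: int = 4, compress: bool = True):
    """Go down the stack: integrity check, compression, 'dispersal', slice checks."""
    # 1. Source data integrity check (SHA-256 is an assumption, not the dsNet choice).
    digest = hashlib.sha256(data).digest()
    # 2. Compression is optional.
    payload = zlib.compress(data) if compress else data
    # 3. Encryption is optional and omitted from this sketch.
    # 4. Dispersal stand-in: round-robin striping, NOT a real IDA.
    slices = [payload[i::n_slices] for i in range(n_slices)]
    # 5. Slice integrity check.
    slice_digests = [hashlib.sha256(s).digest() for s in slices]
    return digest, slices, slice_digests, compress


def read_path(digest, slices, slice_digests, compressed):
    """Go up the stack: the same layers in the opposite order."""
    for s, d in zip(slices, slice_digests):
        if hashlib.sha256(s).digest() != d:
            raise ValueError("slice integrity check failed")
    # Undo the striping by writing each slice back into its original positions.
    n = len(slices)
    payload = bytearray(sum(len(s) for s in slices))
    for i, s in enumerate(slices):
        payload[i::n] = s
    data = zlib.decompress(bytes(payload)) if compressed else bytes(payload)
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("source integrity check failed")
    return data
```

Calling `read_path(*write_path(data))` should hand back the original bytes, which is the round trip that a read after a write performs.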
So, our thinking about performance revolves around overall throughput, which means that the slowest layer has the greatest effect. Our goal for the dispersal algorithms is to make sure they are not the slow link in the chain required to get data through the stack.
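As a rough way to picture the "slowest layer" idea, here is a small sketch. It assumes the layers run concurrently as a pipeline, so sustained throughput is capped by the slowest one; the per-layer rates are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
def pipeline_throughput(rates_mbps: dict) -> float:
    """Sustained throughput of a pipelined stack is capped by its slowest layer."""
    return min(rates_mbps.values())


def slowest_layer(rates_mbps: dict) -> str:
    """Name of the layer that limits the whole stack."""
    return min(rates_mbps, key=rates_mbps.get)


# Illustrative per-layer rates in Mbps (assumptions, not test results).
example_rates = {
    "integrity_check": 400.0,
    "compression": 60.0,
    "encryption": 80.0,
    "dispersal": 100.0,
    "network": 1000.0,
}

limit = pipeline_throughput(example_rates)
bottleneck = slowest_layer(example_rates)
```

With these invented numbers the bottleneck is compression, not dispersal, which is the situation the goal above is aiming for.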
We’ve been running a variety of performance tests (reads, writes, different file sizes, etc.) and are consistently seeing that the dispersal algorithms are now exceeding our goal for performance in this release. Specifically, on a Xeon 5300 Quad Core running at 2.33 GHz, we are getting data through the Dispersed Storage stack (with compression and encryption turned off, so that we are mainly exercising the dispersal algorithms) at generally around 100 Mbps (roughly 12 MBps) for reads or writes.
We’ll have more test data in the future, but we’re very pleased that these results are looking so good so far. And we still expect to realize further performance gains over time with further optimization.
In our previous tests of compression and encryption performance, we saw that either of these two functions is much more CPU-intensive than dispersal. So, we are confident that our implementation of information dispersal algorithms won’t be the slow layer in the stack. We are now re-testing with compression and/or encryption turned on to confirm these prior results.
For even higher performance, you can run the Dispersed Storage stack on multiple machines or front-end the dispersal stack with other processes, like compression or de-duplication, to increase the realized data throughput. For example, if you had a process running on another server that provided 10:1 compression and you dispersed the compressed data, you could realize source data throughputs of 1+ Gbps through a single quad core box running the Dispersed Storage stack. And if you ran the compression and dispersal stack processes on multiple servers in parallel, then overall performance just scales up. I think you see where this is headed.
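The arithmetic behind that example is simple enough to write down. This is a back-of-the-envelope sketch with hypothetical function names; it assumes the compression step keeps up with the dispersal stack and that adding servers scales throughput linearly, which a real deployment would have to verify.

```python
def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def source_throughput_mbps(dispersal_mbps: float,
                           compression_ratio: float = 1.0,
                           servers: int = 1) -> float:
    """Source data rate when compressed data is dispersed (idealized model).

    Each server pushes `dispersal_mbps` of compressed data through the stack,
    and every compressed bit stands for `compression_ratio` bits of source data.
    """
    return dispersal_mbps * compression_ratio * servers


single_box = source_throughput_mbps(100, compression_ratio=10)            # 1000 Mbps
four_boxes = source_throughput_mbps(100, compression_ratio=10, servers=4)  # 4000 Mbps
```

So roughly 100 Mbps of dispersal throughput with 10:1 compression in front of it corresponds to about 1 Gbps of source data per box under these assumptions.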
The summary idea is that performance in a Dispersed Storage Network doesn’t have an inherent limitation, just like a packet-switched network doesn’t have an inherent performance limitation. We have a lot of work to do to fully realize this vision, but asking “how fast is a Dispersed Storage Network” should be like asking “how fast is packet switching.”
Chris


