Dispersed Storage Network
Jan 24, 2008
Even Faster Actual Performance
Performance benchmarks with the latest internal release (0.8.0) show that the read/write performance of Dispersed Storage went up by 2x (again!). We're now generally seeing realized throughput rates in the 20-30 MBps range (equal to 160-240 Mbps) through a single Accesser (client) on a dsNet. This level of performance is well beyond the theoretical maximum that we thought we'd get to in this initial release. (So it is a good thing that our developers are better at performance improvements than they are at predicting the theoretical maximum for performance!)
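For anyone who wants to check the unit conversion, here is a minimal sketch. The function name is just illustrative; the only fact it relies on is that a byte is 8 bits.

```python
def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert megabytes per second (MBps) to megabits per second (Mbps)."""
    return mbytes_per_sec * 8  # 1 byte = 8 bits


# The throughput range quoted above, converted to Mbps.
low = mbytes_to_mbits(20)
high = mbytes_to_mbits(30)
print(f"20-30 MBps = {low:.0f}-{high:.0f} Mbps")
```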
To put that in perspective, we ran some apples-to-apples comparisons between a dsNet and a local hard drive. The test we ran was reading and writing a 1 GB file on both a local (desktop PC) hard drive and a dsNet over a 1 Gbps connection. It turns out that the dsNet was a bit faster for the write and a bit slower for the read vs. a local hard drive. The results are in this chart:
Overall, this is a huge deal. This level of performance is about 100x where we thought we'd end up for this release. Because we are now providing hard drive level performance through regular hard drive interfaces (Block, iSCSI, etc.), we really are optimistic about the potential for Dispersed Storage.
Going forward, we are confident that we'll be able to increase the performance of dsNets even further and ultimately consistently exceed hard drive performance.
Chris
Dec 28, 2007
Information Dispersal Performance
Our focus for Dispersed Storage performance has been overall system performance – which in a Dispersed Storage Network is driven by how quickly bytes travel up and down the client stack. This stack performance is basically what determines the speed of getting data on or off the dsNet, i.e. the speed of reads and writes. The sequence of events when going down the stack (“writing” data) is:
1. Source data integrity check
2. Compression
3. Encryption
4. Dispersal
5. Slice integrity check
6. Packetizing
7. Network transmission
When “reading”, the data flow is the opposite – you just go up the stack. Also, note that compression and encryption are optional.
So, our thinking about performance revolves around the overall throughput, which means that the slowest layer has the greatest effect on throughput. Our goal for the performance of the dispersal algorithms is to make sure they are not the slow link in the performance chain required to get data through the stack.
We’ve been running a variety of performance tests – reads, writes, different file sizes, etc. – and are consistently seeing that the dispersal algorithms are now exceeding our goal for performance in this release. Specifically, on a Xeon 5300 Quad Core running at 2.33 GHz, we are getting data through the Dispersed Storage stack (with compression and encryption turned off, so that we are mainly exercising the dispersal algorithms) generally around 100 Mbps (about 12.5 MBps) for reads or writes.
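Here is a small sketch of why the slowest layer matters so much. The per-layer rates are made-up numbers for illustration only, not measurements, and the two models are simplifying assumptions: either the layers run concurrently as a pipeline, or each chunk of data passes through the layers one after another on a single thread.

```python
def pipelined_throughput(rates_mbps: dict) -> float:
    """Layers run concurrently: the slowest layer caps the whole stack."""
    return min(rates_mbps.values())


def serial_throughput(rates_mbps: dict) -> float:
    """Layers run one after another: per-bit times add up, so the overall
    rate is the reciprocal of the summed reciprocals."""
    return 1.0 / sum(1.0 / rate for rate in rates_mbps.values())


# Hypothetical per-layer rates in Mbps (illustrative only, not benchmark data).
example_rates = {"compression": 200.0, "encryption": 400.0, "dispersal": 800.0}

print(pipelined_throughput(example_rates))
print(serial_throughput(example_rates))
```

In both models the slow layer dominates the result, which is why the goal is simply to keep dispersal from being that layer.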
We’ll have more test data in the future, but we’re very pleased that these results are looking so good so far. And we still expect to realize further performance gains over time with further optimization.
In our previous tests of compression and encryption performance, we saw that either of these two functions is much more CPU-intensive than dispersal. So, we are confident that our implementation of information dispersal algorithms won’t be the slow layer in the stack. We are now re-testing with compression and/or encryption turned on to confirm these prior results.
For even higher performance, you can run the Dispersed Storage stack on multiple machines or front-end the dispersal stack with other processes, like compression or de-duplication, to increase the realized data throughput. For example, if you had a process running on another server that provided 10:1 compression and you dispersed the compressed data, you could realize source data throughputs of 1+ Gbps through a single quad core box running the Dispersed Storage stack. And if you ran the compression and dispersal stack processes on multiple servers in parallel, then overall performance just scales up. I think you see where this is headed.
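The arithmetic behind that example can be sketched as follows. The function name is hypothetical, and the model assumes ideal conditions: the compression front-end keeps up with the stack, and parallel stacks scale linearly.

```python
def source_throughput_mbps(stack_mbps: float,
                           compression_ratio: float = 1.0,
                           parallel_stacks: int = 1) -> float:
    """Source data rate when compressed data is fed through one or more stacks.

    Assumes ideal linear scaling, which a real deployment may not reach.
    """
    return stack_mbps * compression_ratio * parallel_stacks


# Roughly 100 Mbps through one stack, fed with 10:1 compressed data.
print(source_throughput_mbps(100, compression_ratio=10))
# The same setup spread over four stacks in parallel.
print(source_throughput_mbps(100, compression_ratio=10, parallel_stacks=4))
```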
The summary idea is that performance in a Dispersed Storage network doesn’t have an inherent limitation – just like a packet switched network doesn’t have an inherent performance limitation. We have a lot of work to do to fully realize this vision, but asking “how fast is a Dispersed Storage Network” should be like asking “how fast is packet switching.”
Chris



