Performance
Feb 17, 2008
Why Networks are Getting Relatively Faster
I previously wrote a blog post about how hard drive performance has been improving more slowly than Moore's Law, which means that hard drives have become relatively much slower over the past 15 years.
Another related, and potentially more significant, trend is that fiber bandwidth has been increasing faster than Moore's Law over the past several years. Although data on this trend is much harder to find, I did find this reference in the Moore's Law article on Wikipedia:
Data per optical fiber. According to Gerry/Gerald Butters,[15][16] the former head of Lucent's Optical Networking Group at Bell Labs, there is another version, called Butter's Law of Photonics,[17] a formulation which deliberately parallels Moore's law. Butter's Law [18] says that the amount of data coming out of an optical fiber is doubling every nine months. Thus, the cost of transmitting a bit over an optical network decreases by half every nine months. The availability of wavelength-division multiplexing (sometimes called "WDM") increased the capacity that could be placed on a single fiber by as much as a factor of 100. Optical networking and DWDM is rapidly bringing down the cost of networking, and further progress seems assured. As a result, the wholesale price of data traffic collapsed in the dot-com bubble
I ran into Marc Verdiell, a world-class expert in fiber data communication, and was able to ask him whether he was seeing such a dramatic shift in fiber bandwidth. He confirmed that the bandwidth potential of fiber is effectively unlimited, since there are many, many colors of light that can theoretically be transmitted down a single fiber. (Of course, the technology to make this happen is extremely complex, which will keep experts like Marc busy for many years.)
I am still digging into this, but these initial pieces of information suggest that we are in for continued, dramatic increases in network bandwidth in the coming years, at a rate faster than the doubling-every-two-years pace that Moore's Law has actually achieved.
When combined with the long-term trend of hard drive performance improving more slowly than Moore's Law, the clear implication is that high-performance, large-scale data storage systems will evolve toward designs that read from and write to large numbers of hard drives simultaneously over high-speed network connections.
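To make the gap between the two trends concrete, here is a little arithmetic. This sketch assumes the nine-month doubling period from the Butters' Law quote and takes Moore's Law as doubling every 24 months; both are trend-line figures rather than measurements, and the function name is just for illustration.

```python
def growth_factor(years: float, doubling_period_months: float) -> float:
    """How many times a quantity multiplies over `years` if it doubles
    every `doubling_period_months` months."""
    return 2 ** (years * 12 / doubling_period_months)


BUTTERS_MONTHS = 9  # data per fiber doubles every ~9 months (Butters' Law, per the quote)
MOORE_MONTHS = 24   # Moore's Law pace, assumed here as doubling every ~2 years

for years in (3, 6, 9):
    fiber = growth_factor(years, BUTTERS_MONTHS)
    moore = growth_factor(years, MOORE_MONTHS)
    print(f"{years} years: fiber x{fiber:,.0f}, Moore x{moore:,.1f}, "
          f"fiber is ahead by x{fiber / moore:,.0f}")
```

Over six years, for example, the nine-month doubling compounds to 256x while the two-year doubling gives only 8x, so even if the fiber figure is optimistic the two curves pull apart very quickly.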
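A back-of-the-envelope version of that argument: when data is striped across many drives at once, the drives' individual rates add up, but the total can never exceed what the network link carries. The function name and the per-drive and link rates below are illustrative assumptions, not measurements from our systems.

```python
def aggregate_throughput_mbps(num_drives: int, drive_mbps: float,
                              network_mbps: float) -> float:
    """Throughput when reading/writing many drives simultaneously.

    Drives work in parallel, so their rates add up, but all of the
    data still has to cross the network link, which caps the total.
    """
    return min(num_drives * drive_mbps, network_mbps)
```

With hypothetical 400 Mbps drives on a 10,000 Mbps link, `aggregate_throughput_mbps(4, 400, 10_000)` gives 1,600 Mbps (drive-bound), while 40 drives hit the 10,000 Mbps link limit. The faster networks get relative to individual drives, the more drives it pays to use in parallel.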
Chris
Jan 24, 2008
Even Faster Actual Performance
Performance benchmarks with the latest internal release (0.8.0) show that the read/write performance of Dispersed Storage has doubled (again!). We're now generally seeing realized throughput in the 20-30 MBps range (equal to 160-240 Mbps) through a single Accesser (client) on a dsNet. This level of performance is well beyond the theoretical maximum that we thought we'd reach in this initial release. (So it is a good thing that our developers are better at improving performance than they are at predicting its theoretical maximum!)
To put that in perspective, we ran some apples-to-apples comparisons between a dsNet and a local hard drive. The test consisted of reading and writing a 1 GB file on both a local (desktop PC) hard drive and a dsNet over a 1 Gbps connection. It turns out that the dsNet was a bit faster for the write and a bit slower for the read than the local hard drive. The results are in this chart:
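For anyone who wants to run a similar read/write comparison on their own drives, here is a minimal sketch of that kind of timing test. It is not the harness we used; the function names are hypothetical, and it simply writes a file of a given size, forces it to the device, reads it back, and reports MBps for each direction.

```python
import os
import tempfile
import time
from typing import Optional, Tuple

CHUNK = 1024 * 1024  # transfer data in 1 MiB chunks


def time_file_io(path: str, size_bytes: int) -> Tuple[float, float]:
    """Write `size_bytes` to `path`, read it back, and return (write_MBps, read_MBps)."""
    block = os.urandom(CHUNK)  # random bytes, so transparent compression can't skew results

    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # count the time to reach the device, not just the OS cache
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass
    read_s = time.perf_counter() - start

    megabytes = size_bytes / 1e6
    # Guard against a zero timer delta on very small test files.
    return megabytes / max(write_s, 1e-9), megabytes / max(read_s, 1e-9)


def benchmark_temp_file(size_bytes: int, directory: Optional[str] = None) -> Tuple[float, float]:
    """Run time_file_io on a throwaway file created inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        return time_file_io(os.path.join(d, "bench.bin"), size_bytes)
```

A 1 GB run against a given volume would look like `benchmark_temp_file(10**9, "/mnt/dsnet")`, where the mount point is a placeholder. One caveat: a read issued right after a write is often served from the operating system's page cache, so a fair read number needs the cache dropped first or a file larger than RAM.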
Overall, this is a huge deal. This level of performance is about 100x what we expected to reach in this release. Because we are now providing hard drive level performance through regular hard drive interfaces (Block, iSCSI, etc.), we are really optimistic about the potential for Dispersed Storage.
Going forward, we are confident that we'll be able to increase the performance of dsNets even further and ultimately exceed hard drive performance consistently.
Chris
Dec 28, 2007
Information Dispersal Performance
Our focus for Dispersed Storage performance has been overall system performance, which in a Dispersed Storage Network is driven by how quickly bytes travel up and down the client stack. This stack performance is basically what determines the speed of getting data on or off the dsNet, i.e. the speed of reads and writes. When going down the stack (“writing” data), the sequence of events is:
1. Source data integrity check
2. Compression
3. Encryption
4. Dispersal
5. Slice integrity check
6. Packetizing
7. Network transmission
When “reading”, the data flows in the opposite direction: you simply go up the stack. Also, note that compression and encryption are optional.
So our thinking about performance revolves around overall throughput, which means that the slowest layer has the greatest effect on it. Our goal for the dispersal algorithms is to make sure they are not the slow link in the chain of layers that data must pass through.
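To make the layering concrete, here is a toy version of the write and read paths in that order. It is a sketch under several loudly stated assumptions: the function names and packet header are hypothetical; "dispersal" is a simple stand-in (k data slices plus one XOR parity slice, so any one lost slice can be rebuilt), not the information dispersal algorithm a dsNet actually uses; encryption is left as an optional pluggable callable because the Python standard library has no cipher; and step 7 just returns the packets instead of sending them.

```python
import hashlib
import struct
import zlib
from functools import reduce
from typing import Callable, List, Optional, Tuple

HEADER = struct.Struct("!HHI")  # slice index, data-slice count k, CRC32 of the slice


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> List[bytes]:
    """Stand-in dispersal: k equal data slices plus one XOR parity slice.
    Any single lost slice can be rebuilt; a real IDA tolerates more losses."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    return slices + [reduce(xor_bytes, slices)]


def write_path(source: bytes, k: int = 4, compress: bool = True,
               encrypt: Optional[Callable[[bytes], bytes]] = None) -> Tuple[str, int, List[bytes]]:
    digest = hashlib.sha256(source).hexdigest()            # 1. source data integrity check
    data = zlib.compress(source) if compress else source   # 2. compression (optional)
    if encrypt is not None:                                # 3. encryption (optional hook)
        data = encrypt(data)
    slices = disperse(data, k)                             # 4. dispersal
    packets = [HEADER.pack(i, k, zlib.crc32(s)) + s        # 5. slice integrity check
               for i, s in enumerate(slices)]              # 6. packetizing
    return digest, len(data), packets                      # 7. packets would go to the network


def read_path(digest: str, length: int, packets: List[bytes], compress: bool = True,
              decrypt: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    if not packets:
        raise ValueError("no slices received")
    slices = {}
    k = 0
    for p in packets:                                      # going up the stack
        index, k, crc = HEADER.unpack(p[:HEADER.size])
        body = p[HEADER.size:]
        if zlib.crc32(body) == crc:                        # discard corrupted slices
            slices[index] = body
    missing = [i for i in range(k) if i not in slices]
    if len(missing) > 1 or (missing and k not in slices):
        raise ValueError("too many slices lost to rebuild the data")
    if missing:                                            # XOR of all survivors = lost slice
        slices[missing[0]] = reduce(xor_bytes, slices.values())
    data = b"".join(slices[i] for i in range(k))[:length]  # strip dispersal padding
    if decrypt is not None:
        data = decrypt(data)
    source = zlib.decompress(data) if compress else data
    if hashlib.sha256(source).hexdigest() != digest:       # verify the source integrity check
        raise ValueError("source integrity check failed")
    return source
```

Here the digest and the pre-dispersal length are returned as metadata that would be stored alongside the slices. Dropping or corrupting any single packet still lets `read_path` rebuild the original bytes, which is the property dispersal is there to provide.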
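One way to see why the slowest layer matters most is to compute end-to-end throughput from per-layer rates. This sketch covers two simple cases: layers running concurrently as a pipeline, where the stack moves data only as fast as its slowest layer, and layers running back to back on one core, where per-byte times add up and the slowest layer dominates the sum. The function names and the example rates in the usage note are hypothetical.

```python
from typing import Dict


def pipelined_mbps(layer_mbps: Dict[str, float]) -> float:
    """Layers run concurrently (each on its own core or machine):
    the stack moves data only as fast as its slowest layer."""
    return min(layer_mbps.values())


def serial_mbps(layer_mbps: Dict[str, float]) -> float:
    """Layers run one after another on the same core: the time per megabit
    is the sum of each layer's time, so the slowest layer dominates the sum."""
    return 1 / sum(1 / rate for rate in layer_mbps.values())


def bottleneck(layer_mbps: Dict[str, float]) -> str:
    """Name of the slowest layer."""
    return min(layer_mbps, key=layer_mbps.get)
```

With made-up rates of `{"integrity": 2000.0, "dispersal": 100.0, "packetizing": 1000.0}` (Mbps), the pipelined stack tops out at 100 Mbps and the serial one at roughly 87 Mbps; either way, speeding up anything other than the 100 Mbps layer barely moves the total.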
We’ve been running a variety of performance tests (reads, writes, different file sizes, etc.) and are consistently seeing that the dispersal algorithms now exceed our performance goal for this release. Specifically, on a quad-core Xeon 5300 running at 2.33 GHz, we are getting data through the Dispersed Storage stack (with compression and encryption turned off, so that we are mainly exercising the dispersal algorithms) at generally around 100 Mbps (roughly 12 MBps) for reads or writes.
We’ll have more test data in the future, but we’re very pleased with these results so far, and we still expect to realize further performance gains as we continue to optimize.
In our previous tests of compression and encryption performance, we saw that each of these two functions is much more CPU-intensive than dispersal. So we are confident that our implementation of information dispersal algorithms won’t be the slow layer in the stack. We are now re-testing with compression and/or encryption turned on to confirm these prior results.
For even higher performance, you can run the Dispersed Storage stack on multiple machines, or front-end the dispersal stack with other processes, like compression or de-duplication, to increase the realized data throughput. For example, if you had a process running on another server that provided 10:1 compression and you dispersed the compressed data, you could realize source data throughputs of 1+ Gbps through a single quad-core box running the Dispersed Storage stack. And if you ran the compression and dispersal processes on multiple servers in parallel, overall performance just scales up. I think you see where this is headed.
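The arithmetic behind that example can be written down directly. This is a sketch, not a sizing tool: it uses the roughly 100 Mbps dispersal rate and the 10:1 compression ratio from above, while the compressor's own speed and the assumption that independent server pairs scale linearly (no shared network limit) are hypothetical.

```python
def source_throughput_mbps(dispersal_mbps: float, compression_ratio: float,
                           compressor_mbps: float, servers: int = 1) -> float:
    """Source-data rate when a separate compression stage feeds the dispersal stack.

    The dispersal box only sees compressed bytes, so it can accept
    `compression_ratio` times more source data, unless the compressor
    itself becomes the bottleneck. Independent server pairs are assumed
    to add up linearly.
    """
    per_pair = min(dispersal_mbps * compression_ratio, compressor_mbps)
    return per_pair * servers
```

With a compressor assumed fast enough (say 2,000 Mbps), `source_throughput_mbps(100, 10, 2000)` gives 1,000 Mbps, the "1+ Gbps" case, and four such pairs give 4,000 Mbps. If the compressor only managed 500 Mbps, it would become the slow layer instead.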
In short, performance in a Dispersed Storage Network doesn’t have an inherent limitation, just as a packet-switched network doesn’t have an inherent performance limitation. We have a lot of work to do to fully realize this vision, but asking “how fast is a Dispersed Storage Network?” should be like asking “how fast is packet switching?”
Chris



