Scalability
Jun 03, 2008
How “Wide” will dsNets get?
As Dispersed Storage Networks in operation grow larger, what is the optimal maximum number of Slices for each data element stored on a dsNet?
The “Width” of a “Vault” on a dsNet determines the number of dispersed Slices that are physically stored for each data element in that vault. For example, block interfaces typically use 4K blocks, so the dsNet block interface stores and retrieves 4K blocks. (The dsNet iSCSI target uses this block interface.)
When storing a 4K block, the dsNet creates a number of slices equal to the width set for the associated vault. Each of these slices is then typically stored on a different Slicestor. Most of the initial dsNets used 8 Slicestors and set the width of each vault to 8. Although the information dispersal algorithms we use can mathematically support any width (and our implementation can be configured with widths up to 256), the initial commercial release supported only a width of 8, because that was the width for which we had completed comprehensive testing. We’re now beginning comprehensive testing of 16-wide dsNets and then expect to test wider widths: 32, 64, and perhaps 128 or higher.
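To make "width" concrete, here is a minimal sketch of threshold dispersal. It assumes a Reed-Solomon-style scheme over the prime field GF(257), which is one textbook form of information dispersal and not necessarily the algorithm the dsNet uses: every `threshold` bytes of the block become the coefficients of a polynomial, that polynomial is evaluated at `width` distinct points (one point per slice), and any `threshold` slices are enough to solve for the original bytes. The function names and the 10-of-16 parameters are hypothetical.

```python
import os

P = 257  # prime modulus: every byte value 0..255 is a field element


def disperse(block: bytes, width: int, threshold: int) -> list[list[int]]:
    """Split `block` into `width` slices; any `threshold` of them rebuild it.

    Slice symbols are ints in 0..256, so they need slightly more than a byte.
    """
    if not 0 < threshold <= width < P:
        raise ValueError("need 0 < threshold <= width <= 256")
    # Zero-pad so the block divides evenly into groups of `threshold` bytes.
    padded = block + bytes(-len(block) % threshold)
    slices: list[list[int]] = [[] for _ in range(width)]
    for g in range(0, len(padded), threshold):
        coeffs = padded[g:g + threshold]  # polynomial coefficients, constant term first
        for i in range(width):
            x = i + 1  # slice i stores the polynomial's value at x = i + 1
            acc = 0
            for c in reversed(coeffs):  # Horner's rule, highest degree first
                acc = (acc * x + c) % P
            slices[i].append(acc)
    return slices


def _solve_mod_p(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * c = rhs over GF(P) with Gauss-Jordan elimination."""
    n = len(rhs)
    a = [row[:] + [rhs[r]] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        a[col] = [v * inv % P for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % P for vr, vc in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]


def recover(available: dict[int, list[int]], threshold: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any `threshold` surviving slices."""
    if len(available) < threshold:
        raise ValueError("not enough slices to reconstruct the block")
    chosen = sorted(available)[:threshold]
    # Vandermonde matrix for the chosen points; distinct x values make it invertible.
    vander = [[pow(i + 1, j, P) for j in range(threshold)] for i in chosen]
    out = bytearray()
    for g in range(len(available[chosen[0]])):
        out.extend(_solve_mod_p(vander, [available[i][g] for i in chosen]))
    return bytes(out[:length])


if __name__ == "__main__":
    block = os.urandom(4096)                          # one 4K block
    slices = disperse(block, width=16, threshold=10)  # hypothetical 10-of-16 vault
    lost = {0, 3, 5, 8, 12, 15}                       # any six slices may go missing
    survivors = {i: s for i, s in enumerate(slices) if i not in lost}
    assert recover(survivors, threshold=10, length=len(block)) == block
    print(f"{len(slices)} slices of {len(slices[0])} symbols each")
```

In this sketch each slice holds ceil(block size / threshold) symbols, so a 10-of-16 layout stores about 1.6 times the original data, and raising the width adds slices (and tolerance for lost slices) without changing the size of each one. Practical erasure codes commonly work in GF(2^8) so that symbols stay byte-sized; the prime field here only keeps the arithmetic short.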
In smaller dsNets, it is common for the width of the vault(s) to equal the number of Slicestors, but as dsNets grow, the number of Slicestors will exceed the width of any vault on the dsNet.
Many dsNets will store and distribute petabytes of data or more using thousands of Slicestors or more, and will often use different widths (and threshold settings) for the various vaults they store. We expect that the widths of those vaults will range from 8 to 64, or perhaps 128, but will be significantly less than the total number of Slicestors on the dsNet.
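Once the pool of Slicestors is larger than a vault's width, each block's slices have to be assigned to some subset of the pool. The post does not say how the dsNet makes that choice. One well-known technique that fits is rendezvous (highest-random-weight) hashing, sketched below under that assumption; the function name, block IDs, and Slicestor names such as `ss-0042` are hypothetical.

```python
import hashlib


def choose_slicestors(block_id: str, slicestors: list[str], width: int) -> list[str]:
    """Pick `width` distinct Slicestors for one block via rendezvous (HRW) hashing.

    Slice i of the block goes to the i-th entry of the returned list.
    """
    if width > len(slicestors):
        raise ValueError("width cannot exceed the number of Slicestors")

    def score(node: str) -> int:
        # Pseudo-random but deterministic weight for this (block, Slicestor) pair.
        digest = hashlib.sha256(f"{block_id}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every client computes the same ranking, so this sketch needs no placement table.
    return sorted(slicestors, key=score, reverse=True)[:width]


if __name__ == "__main__":
    pool = [f"ss-{n:04d}" for n in range(2000)]  # hypothetical 2,000-Slicestor dsNet
    placement = choose_slicestors("vault-a/block-42", pool, width=16)
    print(placement)
```

A useful property of this approach: removing a Slicestor that holds none of a block's slices leaves that block's placement unchanged, so only blocks that actually used the removed node need to move.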
Chris
Feb 17, 2008
Why Networks are Getting Relatively Faster
I previously wrote a blog post about how hard drive performance has been increasing more slowly than the pace of Moore's Law, which means that hard drives have become relatively much slower over the past 15 years.
Another related and potentially more significant trend is that fiber bandwidth has been increasing faster than Moore's Law over the past several years. Although data on this trend is much harder to find, I did find this reference in the Moore's Law article on Wikipedia:
Data per optical fiber. According to Gerry/Gerald Butters,[15][16] the former head of Lucent's Optical Networking Group at Bell Labs, there is another version, called Butter's Law of Photonics,[17] a formulation which deliberately parallels Moore's law. Butter's Law [18] says that the amount of data coming out of an optical fiber is doubling every nine months. Thus, the cost of transmitting a bit over an optical network decreases by half every nine months. The availability of wavelength-division multiplexing (sometimes called "WDM") increased the capacity that could be placed on a single fiber by as much as a factor of 100. Optical networking and DWDM is rapidly bringing down the cost of networking, and further progress seems assured. As a result, the wholesale price of data traffic collapsed in the dot-com bubble
I ran into Marc Verdiell, a world-class expert in fiber data communication, and asked him whether he was seeing such a dramatic shift in fiber bandwidth. He confirmed that the bandwidth potential of fiber is effectively unlimited, since there are many, many colors of light that can theoretically be transmitted down a single fiber. (Of course, the technology to make this happen gets extremely complex, which will keep experts like Marc busy for many years.)
I am seeking to understand this further, but these initial pieces of information suggest that we are in for continued, dramatic increases in network bandwidth in the coming years, at a rate faster than Moore's Law's realized pace of doubling every two years.
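The gap between the two rates compounds quickly. The sketch below takes the two figures cited above at face value (doubling every nine months for fiber, every two years for Moore's Law); they are the post's quoted numbers, not new measurements, and the helper names are hypothetical. If one quantity doubles every a months and another every b months, their ratio is 2^(t/a - t/b), so the ratio itself doubles every 1 / (1/a - 1/b) months.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplier after `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def relative_doubling_period(fast_months: float, slow_months: float) -> float:
    """Months for the fast/slow ratio to double when both grow exponentially."""
    if fast_months >= slow_months:
        raise ValueError("fast_months must be shorter than slow_months")
    return 1 / (1 / fast_months - 1 / slow_months)


if __name__ == "__main__":
    FIBER_DOUBLING = 9   # Butters' Law figure quoted above
    MOORE_DOUBLING = 24  # "doubling every two years"
    for years in (2, 5, 10):
        months = 12 * years
        fiber = growth_factor(months, FIBER_DOUBLING)
        moore = growth_factor(months, MOORE_DOUBLING)
        print(f"{years:>2} years: fiber x{fiber:,.0f}  Moore x{moore:,.1f}  gap x{fiber / moore:,.1f}")
    gap = relative_doubling_period(FIBER_DOUBLING, MOORE_DOUBLING)
    print(f"gap doubles every {gap:.1f} months")
```

Over ten years, the nine-month figure compounds to roughly 10,000x while the two-year figure gives 32x, and under these assumptions the ratio between them doubles about every 14.4 months.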
When this is combined with the long-term trend of hard drive performance increasing more slowly than Moore's Law, the clear result is that high-performance, large-scale data storage systems will evolve toward designs based on reading from and writing to large numbers of hard drives simultaneously over high-speed network connections.
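The reasoning can be put into a back-of-the-envelope model: if data is spread evenly across many drives that stream in parallel, the read rate is the sum of the drive rates, capped by the network link. All of the numbers below (50 MB/s per drive, a 1,250 MB/s link, roughly 10 Gb/s, and a 1,000 MB object) are hypothetical illustration values, and the model deliberately ignores seek time, protocol overhead, and decoding cost.

```python
import math


def read_seconds(object_mb: float, drive_mb_s: float, drives: int, network_mb_s: float) -> float:
    """Seconds to read an object spread evenly over `drives` disks streaming in
    parallel, when the client's network link caps the combined rate."""
    if drives < 1:
        raise ValueError("need at least one drive")
    aggregate_mb_s = min(drive_mb_s * drives, network_mb_s)
    return object_mb / aggregate_mb_s


def drives_to_fill_link(drive_mb_s: float, network_mb_s: float) -> int:
    """Smallest number of drives whose combined throughput saturates the link."""
    return math.ceil(network_mb_s / drive_mb_s)


if __name__ == "__main__":
    DRIVE, LINK, OBJECT = 50.0, 1250.0, 1000.0  # hypothetical MB/s, MB/s, MB
    for n in (1, 10, 25, 100):
        print(f"{n:>3} drives: {read_seconds(OBJECT, DRIVE, n, LINK):.2f} s")
    print("drives needed to fill the link:", drives_to_fill_link(DRIVE, LINK))
```

The second function captures the trend argument: if link speed grows tenfold while per-drive throughput merely doubles, the number of drives needed to keep the link busy grows fivefold.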
Chris
Nov 01, 2007
Out of the Box
Don’t get me wrong, getting old is not fun, but it does afford a certain perspective that only comes with being around for a while. That’s sort of a good thing, I guess. When you have been in one industry for a while, you definitely see innovation and progress; however, there are relatively few times when you get to witness revolutionary change. Most of the time, things evolve: evolutionary change.
It would be hard to argue that Information Technology and Data Processing have not experienced significant change in the last 30 years. There are countless examples of things we take for granted today that weren’t even conceived of a generation ago.
The face of the IT industry has changed again and again in that time frame. When I entered the work force, all computing and data processing was performed by expansive systems that took up rooms of floor space, had to be water-cooled, and had about the same processing capacity and memory as the laptop I’m using today. Large mainframe boxes performed batch-oriented jobs, and very few people interacted with computers at all.
With the invention of the personal computer and client-server computing, IT and data processing broke out of the “glass house” and came within the purview of everyone. Networking advances connected all this computing power together and placed information, entertainment, and just about everything in our lives at our fingertips. That’s revolutionary; it changed the world.
In storage there have been significant changes as well, but one could argue, again, that most of them have been evolutionary, not revolutionary. Driven by gains in areal density, in the speed and recording capabilities of magnetic media, and in manufacturing efficiency, the storage industry has seen its share of improvements, but not at the level of other core elements of IT.
Look at today’s choices in the storage system market! Probably the most significant invention in storage was RAID technology, near the end of the last century. Since then, most of the invention you’ve seen from storage manufacturers has been in the areas of connectivity, capacity, and performance. Every storage vendor solves the problem in basically the same way. They build a box (an array) made up of a collection of heads/controllers with interfaces (where most of the intelligence lies), a set of drives that operate collectively to provide capacity, performance, and recovery from failure, and some form of management software that is essential to operate and manage it all. It’s not very exciting, and there isn’t much invention beyond some of the management techniques.
What we need is some revolutionary, “outside the box” thinking, similar to the days when data processing escaped monolithic compute platforms and became accessible to and usable by the majority. Why should all critical information assets be stored on one box in one location, even if they’re copied to another box or tape cartridge for safekeeping? Why shouldn’t information be stored so that it is not only private and protected without replication, but also easily accessible and distributable to the people who want to use it?


