Information Dispersal Algorithm
Jun 03, 2008
How “Wide” will dsNets get?
As Dispersed Storage Networks in operation grow larger, what is the optimal maximum number of Slices for each data element stored on the dsNet?
The “Width” of a “Vault” on a dsNet determines the number of dispersed Slices that are physically stored for each data element in that vault. For example, block interfaces typically use 4K blocks, so the dsNet block interface stores and retrieves 4K blocks. (The dsNet iSCSI target uses this block interface.)
When storing a 4K block, the dsNet creates a number of slices equal to the width set for the associated vault on the dsNet. Each of these slices is then typically stored on a different Slicestor. Most of the initial dsNets used 8 Slicestors and set the width of each vault to 8. Although the information dispersal algorithms we use can mathematically support any width (and our implementation can be configured with widths up to 256), the initial commercial release supported only a width of 8, since that was the width for which we had completed comprehensive testing. We’re now beginning comprehensive testing with 16-wide dsNets and then expect to test larger widths: 32, 64, maybe 128 or higher.
In smaller dsNets, it is common for the width of the vault(s) on the dsNet to equal the number of Slicestors, but as dsNets grow the number of Slicestors will exceed the width of any vault on that dsNet.
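To make the difference between vault width and Slicestor count concrete, here is a minimal Python sketch of one way to pick a distinct set of Slicestors for each block when the pool is larger than the width. The placement rule (ranking Slicestors by a hash of the block ID and the Slicestor name), the function name `choose_slicestors`, and the Slicestor and block names are all assumptions made for illustration; this is not a description of how the dsNet actually places slices.

```python
import hashlib


def choose_slicestors(block_id: str, slicestors: list[str], width: int) -> list[str]:
    """Pick `width` distinct Slicestors for one block (hypothetical placement rule)."""
    if width > len(slicestors):
        raise ValueError("width cannot exceed the number of Slicestors")
    # Rank every Slicestor by a hash of (block_id, name) and keep the first `width`.
    # The same block always maps to the same set; different blocks spread out.
    ranked = sorted(
        slicestors,
        key=lambda name: hashlib.sha256(f"{block_id}:{name}".encode()).hexdigest(),
    )
    return ranked[:width]


# Hypothetical dsNet with 24 Slicestors and a vault of width 8.
pool = [f"slicestor-{i:02d}" for i in range(24)]
placement = choose_slicestors("vault-a/block-000123", pool, width=8)
print(placement)
```

With 8 Slicestors and a width of 8, every block lands on every Slicestor; with 24 Slicestors, each block touches only 8 of them, so the vault's width stays fixed while the dsNet grows.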
Many dsNets will store and distribute at least petabytes of data using thousands or more Slicestors, and will often use different widths (and threshold settings) for the various vaults stored on them. We expect that the widths for those vaults will vary from 8 to 64 or perhaps 128, but will be significantly less than the total number of Slicestors on the dsNet.
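As a rough illustration of how width and threshold interact, the sketch below computes per-block figures for a vault. It assumes that the threshold is the minimum number of slices needed to rebuild a block and that each slice is about 1/threshold of the block in size, as in a generic threshold-of-width dispersal scheme. The function name `vault_profile` and the example settings (5 of 8, 10 of 16) are hypothetical and are not taken from any dsNet configuration.

```python
def vault_profile(width: int, threshold: int, block_bytes: int = 4096) -> dict:
    """Per-block storage figures for a vault (assumes slice size = block / threshold)."""
    if not 1 <= threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    slice_bytes = -(-block_bytes // threshold)  # ceiling division
    return {
        "slice_bytes": slice_bytes,              # size of each stored slice
        "stored_bytes": slice_bytes * width,     # total bytes stored per block
        "expansion": width / threshold,          # raw storage per byte of data
        "tolerated_losses": width - threshold,   # slices that may be unavailable
    }


# Hypothetical settings: same expansion factor, different widths.
for width, threshold in [(8, 5), (16, 10)]:
    print(width, threshold, vault_profile(width, threshold))
```

Under these assumptions, a wider vault with the same width-to-threshold ratio uses the same raw storage but tolerates more unavailable slices, which is one reason wider vaults become attractive as dsNets grow.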
Chris
Oct 15, 2007
The Cleversafe Idea
“How did you get the initial idea for Cleversafe?” is a question I am asked fairly often, possibly because I don’t have a long career background in data storage systems or in coding algorithms. What led me to the idea of building geographically distributed Dispersed Storage grids was that I was looking for a way to store my personal data and had been reading a lot about the history of cryptography.
Prior to Cleversafe, I started a company called MusicNow, which was a leading business-to-business provider of (legal) digital music services. We built and operated download stores, music subscription services and Internet radio services, which were sold by companies including Best Buy, Microsoft and Earthlink. In April 2004 we sold MusicNow to Circuit City (which later sold it to AOL, which then sold most of it to Napster). After the Circuit City acquisition, I took the summer and fall of 2004 off, which was the first time in my adult life that I hadn’t worked. One of my projects was to organize all my stuff, so I digitized and organized all my financial records, pictures, correspondence, etc., which took several weeks.
I ended up with 30 GB of data which I needed to store for the rest of my life since I knew that I would never again want to spend so much time going through that organization process. At MusicNow, we had built a system to store all the music in the world, so I was quite familiar with the state of the art in digital storage. I needed a cost-effective system that could store my data for the next 50 years and knew that existing storage methods could not meet those requirements.
In 2004, I was also reading a lot about the history of cryptography. In particular, I was reading a lot about Operation Fortitude: how the Allies in WWII were able to make the Nazis initially believe that the landings at Normandy were just a diversion. It was the most ingenious setup I’ve ever heard of, and it worked! (For those interested in a fascinating and very detailed read in this area, I highly recommend Fortitude: The D-Day Deception Campaign by Roger Hesketh.)
Inspired by the historical richness of cryptography, I moved on to directly explore cryptographic techniques and code breaking methods, which led me to read Code Breaking: A History and Exploration by Rudolf Kippenhahn. As I was reading Code Breaking, I took the time to “do the homework exercises,” which meant that when the book covered how to break a monoalphabetic substitution cipher, I took the time to break a couple of monoalphabetic substitutions as well as many other forms of ciphers. In doing these exercises, I learned a lot about how to use pieces of coded information to derive original data.
Reading about counter-intelligence and code breaking was not a part of a master plan to start a new business; I was just following my personal interests. But when I later started to think about how to build a system to store my personal data for 50 years, I had a good foundation in the mathematical techniques for coding and decoding information. With that foundation, I immediately had the strong intuition that you could create a system with the characteristics now found in Dispersed Storage. And so following that intuition, I wrote a very early prototype which was the first step in creating Cleversafe.
Chris


