Information Dispersal Algorithms

Information Dispersal Algorithms (IDAs) separate data into unrecognizable Slices, which are distributed via secure Internet connections to storage locations at home or throughout the world. This collection of dispersed Slices forms a Dispersed Storage Network (dsNet). With dispersed storage, the transmission and storage of data are inherently private and secure: no complete copy of the data resides in any one location, and only a subset of the nodes needs to be available to retrieve all of the data on the dsNet.

Data on the dsNet remains private and secure in the face of natural catastrophes and of hardware, connection, facility, or IT-management failures. Moreover, an individual Slice does not contain enough information to allow an unauthorized viewer to determine the original content.

Comparing IDAs with Copy-Based Systems

An IDA is a function that provides forward error correction and recovery. By coding and dispersing information, IDAs can greatly improve the reliability, security, and efficiency of data storage over traditional copy-based and parity-based systems. Copy-based systems offer very high reliability: when data is mirrored on n storage devices, up to n-1 of the devices can fail without data loss. However, such a system is extremely expensive, costing n times as much as storing a single copy. Parity-based systems, commonly used in RAID, typically allow at most two storage devices to fail without data loss. So, while parity-based systems are less wasteful than copying data, they are also less reliable.

RAID systems can be configured to tolerate the simultaneous failure of two drives and, when combined with copying and replication, can tolerate significantly more drive failures before data is lost. The downside of such systems is their high cost, since they rely on both copy-based and parity-based protection.

Another benefit over copy-based systems is the inherent security of dispersing information. IDAs never store all of the data at any single location. Instead, they store Slices, each of which is unique and holds only a small fraction of the data.

A backup or archive system based on IDAs can be configured to disperse data to any number p of devices so that it can sustain up to m simultaneous failures without data loss. Even if an attacker gained access to multiple devices, the data could not be accessed until at least p-m of the devices were compromised. Compare this to a copy-based system, where compromising any single device yields the entire data. Maintaining multiple copies also creates multiple vectors of attack, further reducing the chances that the data stays secure.
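The p/m threshold idea can be sketched in Python. This is a minimal illustration modeled on Rabin's IDA, not the dsNet's actual implementation. It assumes arithmetic modulo the prime 257 (so every byte value is a field element), uses each group of k = p - m bytes as the coefficients of a polynomial, and gives slice j that polynomial evaluated at x = j + 1. Any k distinct slices determine the polynomial, so any p - m devices suffice to rebuild the data. The names `disperse` and `recover` are hypothetical. Note that this plain scheme demonstrates only the reliability property; on its own it does not guarantee that fewer than p - m slices reveal nothing about the data.

```python
# Minimal Rabin-style IDA sketch. Assumptions (not from the original text):
# arithmetic modulo the prime 257, each k-byte group is a polynomial's
# coefficients, and slice j stores that polynomial evaluated at x = j + 1.
PRIME = 257


def disperse(data: bytes, p: int, m: int) -> list[list[int]]:
    """Split data into p slices; any p - m of them suffice to rebuild it."""
    if not 0 <= m < p <= PRIME - 1:
        raise ValueError("need 0 <= m < p <= 256")
    k = p - m
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: list[list[int]] = [[] for _ in range(p)]
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]
        for j in range(p):
            y = 0
            for c in reversed(coeffs):  # Horner's rule at x = j + 1
                y = (y * (j + 1) + c) % PRIME
            slices[j].append(y)
    return slices


def _solve(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * v = rhs modulo PRIME by Gauss-Jordan elimination."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], PRIME - 2, PRIME)  # inverse via Fermat
        a[col] = [v * inv % PRIME for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % PRIME for vr, vc in zip(a[r], a[col])]
    return [row[n] for row in a]


def recover(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any k slices keyed by index."""
    if len(available) < k:
        raise ValueError(f"need at least {k} slices, got {len(available)}")
    indices = sorted(available)[:k]
    # Vandermonde rows [1, x, x^2, ...] for the chosen slices' x-values.
    vander = [[pow(j + 1, e, PRIME) for e in range(k)] for j in indices]
    out = bytearray()
    for g in range(len(available[indices[0]])):
        out.extend(_solve(vander, [available[j][g] for j in indices]))
    return bytes(out[:length])


# Usage with 16 devices of which any 4 may fail (p = 16, m = 4).
data = b"dispersed storage network"
slices = disperse(data, p=16, m=4)
survivors = {j: s for j, s in enumerate(slices) if j not in (0, 5, 9, 15)}
assert recover(survivors, k=12, length=len(data)) == data
```

Re-solving the system for every group keeps the sketch short; a practical version would invert the k-by-k matrix once per set of surviving slices and reuse it.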

One of the biggest benefits of IDAs is their efficiency: the storage overhead factor is p/(p-m). (The block file device adds a further storage overhead of about 12%.)

For example, with data backed up to 16 devices of which any 4 can fail (p = 16, m = 4), backing up 1 GB of data requires just 1.45 GB of total storage across all 16 devices: 16/12 ≈ 1.33 GB from dispersal plus 0.12 GB of block file device overhead. To support the same 4 simultaneous failures, a copy-based system must keep five copies, using 5 GB of storage.
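The arithmetic of this comparison can be reproduced with two small functions. The function names are hypothetical, and the 0.12 default simply mirrors the text's worked example, which adds the ~12% block file device overhead per unit of original data (1.33 + 0.12); treat that accounting as an assumption taken from the example.

```python
def ida_storage(data_gb: float, p: int, m: int, block_overhead: float = 0.12) -> float:
    """Storage used by an IDA over p devices that tolerates m failures.

    Dispersal multiplies storage by p / (p - m); block_overhead is added per
    unit of original data, following the text's 1.33 + 0.12 example.
    """
    if not 0 <= m < p:
        raise ValueError("m must be between 0 and p - 1")
    return data_gb * (p / (p - m) + block_overhead)


def copy_storage(data_gb: float, m: int) -> float:
    """Mirroring that survives m failures must keep m + 1 full copies."""
    return data_gb * (m + 1)


print(f"IDA:  {ida_storage(1, p=16, m=4):.2f} GB")  # IDA:  1.45 GB
print(f"Copy: {copy_storage(1, m=4):.2f} GB")       # Copy: 5.00 GB
```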

For a parity-based system to approach the efficiency of the 16-device IDA example, it would need four storage devices, one of them used exclusively for parity data, giving the same 4/3 ≈ 1.33 overhead factor. While such a parity system has the same storage requirements, it can survive only one failure without data loss, compared with the four failures supported by the IDA-based system.
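The four-device parity arrangement can be illustrated with byte-wise XOR, the kind of parity used by RAID levels 4 and 5. This is a simplified sketch assuming equal-length blocks and a lost device that is already known (marked `None`); `xor_blocks` and `rebuild` are hypothetical names. Because the parity block is the XOR of all data blocks, any single missing block equals the XOR of the remaining ones. With two devices lost there is only one equation for two unknowns, so the sketch refuses to continue.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(devices: list[Optional[bytes]]) -> list[Optional[bytes]]:
    """Restore at most one lost device (marked None) in a single-parity array.

    `devices` holds the data blocks followed by the parity block; since
    parity = d0 ^ d1 ^ d2, any one block equals the XOR of all the others.
    """
    missing = [i for i, d in enumerate(devices) if d is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot rebuild more than one lost device")
    restored = list(devices)
    if missing:
        restored[missing[0]] = xor_blocks([d for d in devices if d is not None])
    return restored


# Three data devices plus one device used exclusively for parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
array = data + [xor_blocks(data)]
assert rebuild([array[0], None, array[2], array[3]]) == array
```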

IDAs Offer Reliability and Efficiency

The IDA can be configured with m taking any value from 0 to p-1. For example, data might be backed up to 16 separate devices of which any 4 can fail; in this case p = 16 and m = 4.

IDAs offer the best of both worlds: the high reliability of copy-based storage and the high efficiency of parity-based systems.

Dispersed storage systems also allow for “self-healing”. If data on a device becomes corrupted or destroyed, an automated process can detect this and use the IDA to recalculate all of the data in the missing Slice from the available Slices. Through self-healing, the mean time before failure of a dsNet can easily reach many millions of years.
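Self-healing can be sketched under the assumption of a polynomial-based dispersal: arithmetic modulo the prime 257, each group of k = p - m bytes used as polynomial coefficients, and Slice j storing the polynomial's value at x = j + 1. These choices are illustrative assumptions, not the dsNet's documented method. Under them, a lost Slice can be recomputed directly by Lagrange interpolation from any k surviving Slices, without first reassembling the original data. Detection (for example, with checksums) is assumed to have already happened, and `encode` and `rebuild_slice` are hypothetical names.

```python
# Minimal self-healing sketch. Assumptions (not from the original text):
# arithmetic modulo the prime 257, each k-byte group is a polynomial's
# coefficients, and slice j stores that polynomial evaluated at x = j + 1.
PRIME = 257


def encode(data: bytes, p: int, k: int) -> list[list[int]]:
    """Hypothetical encoder: produce p slices, any k of which suffice."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: list[list[int]] = [[] for _ in range(p)]
    for start in range(0, len(padded), k):
        group = padded[start:start + k]
        for j in range(p):
            y = 0
            for c in reversed(group):  # Horner's rule at x = j + 1
                y = (y * (j + 1) + c) % PRIME
            slices[j].append(y)
    return slices


def rebuild_slice(available: dict[int, list[int]], missing: int, k: int) -> list[int]:
    """Recompute slice `missing` from any k surviving slices."""
    if len(available) < k:
        raise ValueError(f"need at least {k} slices to heal, got {len(available)}")
    indices = sorted(available)[:k]
    xs = [j + 1 for j in indices]
    target = missing + 1
    # Lagrange basis values L_i(target) depend only on the x-coordinates,
    # so they are computed once and reused for every symbol in the slice.
    weights = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for t, xt in enumerate(xs):
            if t != i:
                num = num * (target - xt) % PRIME
                den = den * (xi - xt) % PRIME
        weights.append(num * pow(den, PRIME - 2, PRIME) % PRIME)
    symbols = len(available[indices[0]])
    return [
        sum(w * available[j][g] for w, j in zip(weights, indices)) % PRIME
        for g in range(symbols)
    ]


# Slice 5 of a 16-device, 4-failure configuration is destroyed and healed.
slices = encode(b"self-healing dsNet", p=16, k=12)
survivors = {j: s for j, s in enumerate(slices) if j != 5}
assert rebuild_slice(survivors, missing=5, k=12) == slices[5]
```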

Creating Systems Using IDAs

IDAs can be used to create highly reliable, very secure, and much less expensive backup and archive systems. Beyond their reliability and efficiency benefits, IDAs are a natural fit for geographically distributed storage. If each of the p devices is kept at a different location around the world, the result is a “disaster-proof” backup system. Several of these locations could suffer power failures, floods, earthquakes, fires, or other catastrophes simultaneously, but as long as p-m locations remain, all of the data would be safe and accessible.

With replication, sending data to 40 locations around the world would require a full copy at each location, and transferring that much information would be impractical. With IDAs, however, a dsNet of 40 nodes could be configured so that only 30 nodes are needed to recover one’s data. In such a configuration p = 40 and m = 10, so storage and bandwidth requirements are multiplied by 40/(40-10) ≈ 1.33. This additional 33% overhead could easily be accommodated.
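Writing D for the size of the original data (a symbol introduced here only for illustration) and leaving out the block file device overhead, the 40-location comparison works out as follows; the last ratio shows that full replication would store and transfer 30 times as much as the dispersed configuration.

```latex
\text{replication: } 40\,D
\qquad
\text{IDA: } \frac{p}{p-m}\,D = \frac{40}{40-10}\,D = \frac{4}{3}\,D \approx 1.33\,D
\qquad
\frac{40\,D}{\tfrac{4}{3}\,D} = 30
```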
