Data Storage
Apr 22, 2008
Solid State Drives and Dispersed Storage
In a comment on my post last month about hard drive speeds, Waldo raised the potential of Solid State Drives (SSDs) to change the mainstream data storage paradigm. I agree that SSDs have significant potential and will become increasingly important. As computers become more ubiquitous (more transportable, more hold-able, more wearable, etc.), the benefits of solid state storage – lighter, smaller, more durable – will become more relevant. In addition, SSDs are much faster than hard drives.
However, at the same time people are using more and more data, which will further expose the weakness of SSDs: their cost per unit of storage is significantly higher than that of hard drives.
So how can you have the best of both worlds? Imagine a mobile device, like a mobile phone or a laptop, with an SSD (or just a lot of solid state storage). Now imagine that mobile device also has a high speed wireless data connection to a Dispersed Storage network (dsNet). In this approach, the SSD is a fast, huge storage cache while the dsNet provides massive, reliable secondary storage. You get fast access in a small, light and durable device while having access to unlimited storage on a dsNet. And if you lost your mobile device, you could connect a new mobile device (or two or five) to your data on the dsNet and never lose a bit.
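To make the cache idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `CachedStore` class, its method names and the dictionary standing in for the dsNet are all hypothetical and are not part of any dsNet client. The local cache uses write-through with least-recently-used eviction, which is one reasonable policy among several.

```python
from collections import OrderedDict


class CachedStore:
    """Small fast cache (standing in for the SSD) in front of a large remote store."""

    def __init__(self, remote, capacity):
        self.remote = remote          # dict standing in for the dsNet (hypothetical)
        self.capacity = capacity      # max number of objects kept on the device
        self.local = OrderedDict()    # stands in for the SSD, kept in LRU order

    def _remember(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)   # evict the least recently used object

    def write(self, key, value):
        self.remote[key] = value      # write-through: the remote copy is authoritative
        self._remember(key, value)

    def read(self, key):
        if key in self.local:         # cache hit: served from the device
            self.local.move_to_end(key)
            return self.local[key]
        value = self.remote[key]      # cache miss: fetched from the remote store
        self._remember(key, value)
        return value


remote = {}
phone = CachedStore(remote, capacity=2)
for name, payload in [("a", b"one"), ("b", b"two"), ("c", b"three")]:
    phone.write(name, payload)

# A replacement device starts with an empty cache but sees all the data.
new_phone = CachedStore(remote, capacity=2)
recovered = new_phone.read("a")
```

Because every write goes to the remote store first, losing the device loses only the cache, which is the property the scenario above relies on.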
The pieces to put this together will all be available this year. Anyone want to port the dsNet client to their phone?
Chris
Mar 03, 2008
Cleversafe Open Source vs. Commercial
Now that Cleversafe has announced its upcoming commercial products, we can talk more clearly about the relationship between the Dispersed Storage open source project (here at www.cleversafe.org) and Cleversafe’s commercial products.
As we are doing with our technical approach, we are utilizing the outstanding and proven approach taken by the Internet for our business ecosystem model.
The key protocols that power the Internet (TCP, IP, UDP, LDAP, DNS, etc.) are genuinely open protocols in that they are in the public domain and available as open source. As a result, the Internet really is an interoperable internetwork. Way back in the 80’s, I was in the IT department of a large aerospace company responsible for setting networking standards, and I remember when other networking protocols – such as IBM’s SNA, DECNET and Novell’s SPX/IPX – were more popular than TCP/IP. Each of these proprietary protocols was driven by a well-funded and capable R&D organization, whereas the R&D behind TCP/IP was not fueled by a large, well-funded technology company. However, 20 years later, TCP/IP has pretty much completely taken over as the networking and internetworking protocol of choice, and the Internet absolutely dominates any networks still using these previously popular proprietary protocols (P4’s).
Yes, TCP/IP is a great protocol, but the P4’s protocols were pretty great, too. I believe part of the reason that TCP/IP and the Internet emerged as the world’s network is based on human nature. Simply put, most people didn’t want the world’s network to be owned or controlled by one company, so as the world built its internetwork, a genuinely open protocol had an inherent advantage and was ultimately propelled to become the protocol behind the Internet, the world’s network.
When we began to develop Dispersed Storage, we realized the technology had significant potential, namely to store the world’s data. We also realized, based on the lesson of the Internet, that the protocols of a Storage Internet ultimately must also be open source, so we created the Dispersed Storage project in order to develop and publish the Dispersed Storage protocol as open source. In order to do this, we also created a lot of new open source code and incorporated a lot of open source code from other projects.
Even though the Dispersed Storage protocol stack and core features are available as open source, plenty of great opportunities exist for creating commercial products and services. To use the Internet analogy again, network equipment companies like Cisco and Juniper create commercial products (switches, routers, gateways, etc.) by taking the open TCP/IP protocol, adding proprietary features like management systems, integrating onto optimized hardware with an optimized OS and then selling those products through a trained channel that provides services like support and installation. Cleversafe is using that same approach for its commercial products, so once again we are standing on the shoulders of the giants who built the technology and the business models for the Internet.
Chris
Dec 14, 2007
Are Hard Drives Getting Slower?
In preparing a presentation recently on long term trends in data storage, I was talking with Russ Kennedy and he mentioned that it was a known fact within the data storage industry that hard drive performance has been lagging. So I mentioned this to Dennis Roberson and he connected me with this article at Tom’s Hardware, which I found very enlightening.
It turns out that increases in hard drive capacities have been keeping pace with Moore’s law (as observed) by doubling every 24 months. But hard drive performance (reads and writes) hasn’t been keeping pace. Even though hard drive performance has been increasing, it hasn’t been increasing as fast as hard drive capacities have been increasing.
So if the amount of data that people want to read from and write to hard drives has been increasing with the rate of increase of CPU performance and hard drive capacities, then the realized performance of hard drives (on this “technology adjusted” basis) has been getting SLOWER! This is a big part of why Windows never seems to load any faster.
The article at Tom’s Hardware really brought these diverging trends together in a compelling way by plotting the speed to read a single platter against the year when the hard drive was manufactured. This chart shows that reading a single platter of data has become almost 10X slower over the past 15 years. Wow.
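A rough calculation shows how these two trends can coexist. The sketch below rests on assumptions that are not from the article: it reads "almost 10X" as a 10X longer full-platter read time, and it assumes platter capacity followed the same 24-month doubling as drive capacity. Under those assumptions, throughput still improved substantially over 15 years, just far less than capacity did.

```python
YEARS = 15
CAPACITY_DOUBLING_YEARS = 2      # capacity doubling period quoted above
READ_TIME_GROWTH = 10            # assumed: full-platter read takes ~10X longer

# Capacity growth over the period: 2 ** 7.5, roughly 181X.
capacity_growth = 2 ** (YEARS / CAPACITY_DOUBLING_YEARS)

# read_time = capacity / throughput, so throughput growth is the ratio of the two.
implied_throughput_growth = capacity_growth / READ_TIME_GROWTH

print(f"capacity grew about {capacity_growth:.0f}X")
print(f"implied throughput growth about {implied_throughput_growth:.0f}X")
```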
If these trends continue, hard drive speeds will become an increasingly limiting factor, which will further undermine the approach of using a local hard drive as the primary data storage system. Especially for high performance environments, the solution to lagging drive performance will be architectures like Dispersed Storage that write to or read from multiple drives in parallel.
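As a toy illustration of reading from multiple drives in parallel, the Python sketch below splits an object into slices, places one slice on each of several in-memory "drives", and reads them back concurrently. This is plain striping chosen for simplicity, not the Dispersed Storage protocol itself; the function names and the dictionaries standing in for drives are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(data, n):
    """Split data into n contiguous slices (the last ones may be shorter)."""
    size = -(-len(data) // n)    # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def parallel_read(drives, key):
    """Fetch one slice from every drive concurrently and reassemble in order."""
    with ThreadPoolExecutor(max_workers=len(drives)) as pool:
        slices = list(pool.map(lambda drive: drive[key], drives))
    return b"".join(slices)


data = bytes(range(256)) * 10
# Each dict stands in for one drive holding one slice of the object.
drives = [{"obj": piece} for piece in split_into_slices(data, 4)]
restored = parallel_read(drives, "obj")
```

With real drives, each slice read would proceed at a single drive's speed, so the whole object could be read in roughly the time it takes to read one slice.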
Chris
Oct 19, 2007
What would you do with a 500,000,000,000 Gigabyte hard drive?
Hitachi recently announced that they will be shipping a 4 terabyte hard drive in 2011. One terabyte hard drives are now available for $320, so you can expect 4 terabyte drives to be in this same price range within a few years. (4 terabytes = 4,000 gigabytes = 4,000,000 megabytes.)
Recently one of my friends sent me this link that tracks hard drive pricing over time. In 1956, IBM was selling 5 megabytes for $50,000, which equates to $10,000 per megabyte. So in 1956, four terabytes of hard drive storage would have cost $40 billion. Yes, $40 billion for the storage that will be on a single hard drive in 2011. ($10,000 x 1,000 x 1,000 x 4)
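The arithmetic can be checked in a few lines of Python. The variable names are just for illustration, and the sketch uses decimal units (1 terabyte = 1,000,000 megabytes), as the post does.

```python
PRICE_1956_USD = 50_000          # IBM's 1956 price quoted above
CAPACITY_1956_MB = 5
MB_PER_TB = 1_000 * 1_000        # decimal units, as used in this post

cost_per_mb_1956 = PRICE_1956_USD // CAPACITY_1956_MB     # dollars per megabyte
cost_4tb_1956 = cost_per_mb_1956 * MB_PER_TB * 4          # dollars for 4 terabytes

print(f"${cost_per_mb_1956:,} per megabyte")
print(f"${cost_4tb_1956:,} for 4 terabytes")
```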
So if the cost per unit of hard drive storage continues to decrease at the same rate through 2066, you would be able to buy a 500,000,000 terabyte (500 exabyte) hard drive for around $320 in fifty years or so. Five hundred million terabytes is a huge number when you think about it in 2007 – just like 4,000,000 megabytes (4 terabytes) would have seemed like a huge amount of storage in 1956.
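One way to arrive at the 500,000,000 terabyte figure is to apply the 1956-to-2011 improvement a second time over the following 55 years. The short Python sketch below spells that out; it assumes a 4 terabyte drive costs about $320 in 2011, as suggested earlier, and the variable names are illustrative.

```python
COST_4TB_1956_USD = 40_000_000_000   # the $40 billion figure from 1956 pricing
COST_4TB_2011_USD = 320              # assumed price of a 4 TB drive in 2011
TB_PER_EB = 1_000_000

# How many times cheaper the same storage became from 1956 to 2011 (55 years).
improvement = COST_4TB_1956_USD // COST_4TB_2011_USD

# Apply the same improvement over the next 55 years (2011 to 2066), at a fixed $320.
capacity_2066_tb = 4 * improvement
capacity_2066_eb = capacity_2066_tb // TB_PER_EB

print(f"{improvement:,}X cheaper per unit of storage")
print(f"{capacity_2066_tb:,} TB = {capacity_2066_eb} EB")
```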
To put a five hundred million terabyte hard drive in perspective, consider how much storage is available on all hard drives currently in use today. According to Disk/Trend, hard drive factories produced between 450 million and 460 million hard drives in 2006. If you assume that the average size of a hard drive manufactured in 2006 was 100 GB, that hard drive factories will produce about 40% more capacity in 2007 and produced about 40% less capacity in 2005 (following the rate of change of Moore’s Law), and that hard drives remain in use for 3 years, then the total capacity of all hard drives in use on the planet at the end of 2007 will be about 140,000,000 terabytes (140 exabytes).
So this means that if prior trends continue, a typical hard drive in 2066 would have a capacity equal to about 3 times the storage capacity of all the hard drives in the world today. I may be off by an order of magnitude or more and/or a decade or more, but the point is that previous trends suggest that your grandchildren’s hard drive will be as big as all the hard drives currently on the planet. What would you do with all this data storage? Or maybe a better question is what will our grandchildren do with all this storage?
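The estimate can be reproduced with a short Python calculation. It takes the midpoint of the Disk/Trend range and treats the three years of drives still in use as 2005, 2006 and 2007, with 2005 shipping 40% less capacity than 2006. That reading of the assumptions is mine, and the result is only a back-of-envelope figure.

```python
DRIVES_2006 = 455e6        # midpoint of the 450-460 million range
AVG_DRIVE_GB = 100         # assumed average drive size in 2006
YEARLY_CHANGE = 0.40       # assumed year-over-year change in capacity shipped
GB_PER_EB = 1e9            # decimal units

shipped_2006_eb = DRIVES_2006 * AVG_DRIVE_GB / GB_PER_EB
shipped_2007_eb = shipped_2006_eb * (1 + YEARLY_CHANGE)
shipped_2005_eb = shipped_2006_eb * (1 - YEARLY_CHANGE)

# Drives stay in use for 3 years, so sum the three most recent years.
in_use_eb = shipped_2005_eb + shipped_2006_eb + shipped_2007_eb

print(f"about {in_use_eb:.0f} exabytes in use at the end of 2007")
print(f"a 500 EB drive would hold about {500 / in_use_eb:.1f}X that")
```

The total comes out a little under 140 exabytes, so a 500 exabyte drive would hold somewhat more than 3 times the world's 2007 hard drive capacity.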
Chris

