Internet concepts build new ways of storage

Analysis: As storage requirements increase and get more complex, there's still much discussion about the right way to implement and manage large, heterogeneous and physically distributed data stores.

Researchers at the University of Tennessee and the University of California, Santa Barbara are taking their cue from another complex, heterogeneous distributed system, the Internet, to create a conceptually simple but very capable way of hooking users and storage together.

The Internet Backplane Protocol (IBP) they're developing contrasts with the currently most popular storage management idea, the computer centre model. This assumes storage is expensive and scarce, and in need of strict management, authentication and access controls. The same assumptions inform the trend towards information grids, where different bits of storage coalesce into unified platforms that 'virtual organisations' can use.

The Internet way of doing things, on the other hand, makes everything on the network available to everyone, with the links being as lightweight and simple to implement as possible, consistent with scalability and good-enough reliability. It also makes it much easier to add private resources to the general good. IBP is an attempt to marry these ideas to storage: while it remains experimental, it examines many of the problems of modern storage systems and offers some interesting new views on how to solve them.

According to the researchers, IBP is "a mechanism developed for the purpose of sharing storage resources across networks ranging from rack-mounted clusters in a single machine room to global networks." To that end, much of its design mirrors that of IP, in particular IP datagrams. At one level, IP datagrams just deliver packets of data from one place to another, but the same mechanism scales to networks of any size and complexity. In particular, datagrams don't care about the details of any particular connection.
Packets can be linked together, hiding the size limits of individual packets; there's only one fault mode, a dropped packet, regardless of what caused the problem; and a global addressing scheme means nobody has to care about local network configurations, or indeed whether the local network is reconfigured at any time. And because anyone can send a packet anywhere on the network without worrying about who owns the intermediate stages, the whole thing scales to a truly global system.

IBP takes these ideas and applies them to storage. Instead of datagrams, its basic unit is a block of data (on disk, tape or other media), managed as an array of bytes. It can aggregate these into larger units, thus hiding whatever size limits there are on any particular storage device; it copes with any error by simply discarding the faulty byte array; and it has a global address structure, analogous to IP addresses, that is translated into device-specific addressing schemes only locally.

These ideas let a uniform IBP model apply to storage resources across a global network, and thus allow what the researchers say is the most important difference between IBP and other schemes: any IBP network member can access any storage resource in the network, no matter who owns it. This creates a global storage service.

As with IP networks, there are downsides. First, denial of service (DoS) attacks, both deliberate and accidental, are bad with IP services but worse with IBP. An IP DoS goes away once the attack stops; with IBP, an aggressive over-allocation of storage blocks leaves those blocks allocated after the attack is over. Moreover, an attacker gains a real advantage from taking over huge amounts of storage, whereas an IP DoS just causes confusion and misery.

The other big downside of IBP is that, as with IP, reliability isn't high on the list of attributes. Yet with storage systems, reliability has always been of the utmost importance. IBP addresses both problems.
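
As a rough illustration of the storage model described above (byte arrays that aggregate into larger units, a single failure mode and global addresses), here is a minimal Python sketch. The names Depot, store_aggregate, load_aggregate and ByteArrayLost are invented for this example and are not taken from the researchers' software; the (depot name, key) pair merely stands in for whatever global address IBP actually uses.

```python
from dataclasses import dataclass, field


class ByteArrayLost(Exception):
    """The only failure mode in this sketch: the byte array is gone."""


@dataclass
class Depot:
    """Hypothetical storage node holding byte arrays up to a local size limit."""
    name: str
    max_array_size: int
    arrays: dict = field(default_factory=dict)
    next_key: int = 0

    def store(self, data: bytes) -> tuple:
        if len(data) > self.max_array_size:
            raise ValueError("byte array exceeds this depot's size limit")
        key = self.next_key
        self.next_key += 1
        self.arrays[key] = data
        # Assumed "global address": meaningful anywhere, resolved only here.
        return (self.name, key)

    def load(self, key: int) -> bytes:
        if key not in self.arrays:
            # Whatever went wrong, the caller sees one kind of failure.
            raise ByteArrayLost(f"{self.name}/{key}")
        return self.arrays[key]


def store_aggregate(data: bytes, depots: list) -> list:
    """Split data across depots round-robin, hiding each depot's size limit."""
    if not depots or any(d.max_array_size <= 0 for d in depots):
        raise ValueError("need at least one depot with a positive size limit")
    addresses, offset, turn = [], 0, 0
    while offset < len(data):
        depot = depots[turn % len(depots)]
        chunk = data[offset:offset + depot.max_array_size]
        addresses.append(depot.store(chunk))
        offset += len(chunk)
        turn += 1
    return addresses


def load_aggregate(addresses: list, depots_by_name: dict) -> bytes:
    """Reassemble the larger unit from its globally addressed pieces."""
    return b"".join(depots_by_name[name].load(key) for name, key in addresses)
```

A caller holds only the list of addresses: it neither knows nor cares how large each depot's arrays may be, and if any piece has been discarded the whole read fails in one uniform way.
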
The DoS problem is approached by limiting the amount of time storage belongs to a network member before it is released for general use again. Also, a request for storage can be refused in cases of over-allocation, just as routers drop packets when they run out of resources: deciding when, where and how to do this gives a degree of tuning for the system, again much as setting limits on router behaviour fine-tunes an IP network.

The reliability issue is handled by higher-level protocols built on top of IBP, much as TCP builds on IP to make a reliable link out of a basically unreliable network. IBP itself is not reliable, but this does not limit the reliability of the overall storage network.

Next week, we'll look at the details of how IBP copes with security, allocation policies, addressing and other practical aspects of a global data storage network.
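
The two defences against over-allocation described above, time-limited ownership and refusal of requests, can be sketched in a few lines of Python. This is an illustration under stated assumptions only: LeasedDepot, its methods and the idea of capping the lease length are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class LeasedDepot:
    """Hypothetical storage node that grants time-limited allocations."""

    def __init__(self, capacity: int, max_lease_seconds: float):
        self.capacity = capacity
        self.max_lease_seconds = max_lease_seconds
        self._leases = {}  # lease id -> (size in bytes, expiry time)
        self._next_id = 0

    def _reclaim(self, now: float) -> None:
        # Expired allocations return to the common pool automatically.
        expired = [i for i, (_, expires) in self._leases.items() if expires <= now]
        for i in expired:
            del self._leases[i]

    def free_bytes(self, now: float) -> int:
        self._reclaim(now)
        return self.capacity - sum(size for size, _ in self._leases.values())

    def allocate(self, size: int, seconds: float, now: float):
        """Return a lease id, or None if the request is refused."""
        if size <= 0 or seconds <= 0:
            raise ValueError("size and seconds must be positive")
        if size > self.free_bytes(now):
            # Refusal under pressure, like a router dropping a packet.
            return None
        lease_id = self._next_id
        self._next_id += 1
        # Assumed policy: no lease may outlive the depot's own maximum.
        expires = now + min(seconds, self.max_lease_seconds)
        self._leases[lease_id] = (size, expires)
        return lease_id
```

With a 100-byte depot and a 60-second cap, a client that grabs 80 bytes "for 1,000 seconds" blocks a 50-byte request only until second 60, after which the space is free again. The capacity and the cap are the tuning knobs in this sketch.
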
