What can you do with 400TB of mail?

The issue of how best to handle large email inboxes is a perennial topic here at Snorage, and it doesn't only affect enterprise customers.

While users of Web-based email services such as Gmail often like to boast about the massive amounts of data they're able to store, that doesn't eliminate the problem of managing the storage systems behind it; it simply moves that problem into the "someone else's problem" category.

At Google's recent developer day in Sydney, Daniel Reyes, the head of engineering at MySpace Australia, outlined how MySpace was dealing with its own message storage problem.

While you probably know MySpace primarily for its endless store of "friends" for teenage relatives and/or as a neat means of listening to music online, it also supports what turns out to be a pretty considerable messaging infrastructure.

According to Reyes, MySpace users send 160 million messages and receive 300 million every day. At peak periods, traffic reaches 20,000 messages a second. All of that mail is stored in 400 mail databases, each around a terabyte in size.
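To put those figures in perspective, here's a quick back-of-the-envelope calculation using only the numbers Reyes quoted. One assumption to flag: the article doesn't say whether the 20,000-a-second peak counts sent or received messages, so the sketch compares it against the average receive rate.

```python
# Back-of-the-envelope figures derived from the numbers Reyes quoted.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

sent_per_day = 160_000_000
received_per_day = 300_000_000
peak_per_second = 20_000
databases = 400
tb_per_database = 1  # "around a terabyte" each

avg_sent_per_second = sent_per_day / SECONDS_PER_DAY
avg_received_per_second = received_per_day / SECONDS_PER_DAY

# Assumption: the peak figure is compared with the received rate,
# since the article doesn't say which direction it counts.
peak_to_average = peak_per_second / avg_received_per_second

total_tb = databases * tb_per_database

print(f"average sent:     {avg_sent_per_second:,.0f} msg/s")
print(f"average received: {avg_received_per_second:,.0f} msg/s")
print(f"peak vs average received: {peak_to_average:.2f}x")
print(f"total store: ~{total_tb} TB")
```

That works out to roughly 1,850 messages sent and 3,470 received per second on average, so the peak is nearly six times the average receive rate, and the 400 databases add up to the 400TB in the headline.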

Naturally, searching and accessing that volume of data can be a challenge. One of the solutions MySpace has adopted is Google's Gears technology, which allows information accessed through a website to be stored locally. In practice, that means getting MySpace users to store some messages on their own hard drives rather than in the cloud, with Gears blending the local and online stores together when they visit the site.

This isn't a one-size-fits-all approach. While MySpace has around 380 million users, only 110 million are considered active, and of those, only a small percentage (those with a high level of messaging activity) have been prompted to install Gears and localise their mail store. MySpace originally targeted users with more than 5,000 stored messages, and more recently expanded its approach to cover anyone with more than 2,000.
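The selection rule described above is simple enough to sketch. This is a hypothetical illustration only: the `User` record and the `should_prompt_for_local_store` function are invented for the example, and the article says nothing about how MySpace actually implements the check. It assumes only active users are considered and that "more than" means a strict cut-off.

```python
from dataclasses import dataclass


# Hypothetical user record; MySpace's real data model isn't described.
@dataclass
class User:
    name: str
    is_active: bool
    stored_messages: int


# Thresholds as reported: originally >5,000 stored messages, later >2,000.
ORIGINAL_THRESHOLD = 5_000
EXPANDED_THRESHOLD = 2_000


def should_prompt_for_local_store(user: User, threshold: int) -> bool:
    """Return True if this (hypothetical) user should be offered a local mail store.

    Assumptions: only active users qualify, and the cut-off is strictly
    "more than" the threshold, matching the article's wording.
    """
    return user.is_active and user.stored_messages > threshold


users = [
    User("light", True, 150),
    User("moderate", True, 3_000),
    User("heavy", True, 12_000),
    User("dormant", False, 9_000),
]

# Show who gets prompted under the original and the expanded threshold.
for threshold in (ORIGINAL_THRESHOLD, EXPANDED_THRESHOLD):
    prompted = [u.name for u in users if should_prompt_for_local_store(u, threshold)]
    print(threshold, prompted)
```

Lowering the threshold from 5,000 to 2,000 pulls the "moderate" user into the prompted group, while inactive accounts stay out regardless of how much mail they hold.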

Reyes said that the company is also considering localising the search system for friends on the site, which could prove a more challenging project given the MySpace culture of "more friends = better".

Getting users to take a hand in the management of data is often a useful step, although, as I've noted in other contexts, there's frequently resistance to this kind of change.

And sometimes there are people who are almost impossible to satisfy. "We had one user with over 100,000 messages," Reyes said. "One of the issues he had was that it was taking too long to replicate." Clearly the time taken to accumulate that amount of social detritus wasn't an issue.