Attack of the BLOBs: Data behaving badly

In the sixth part of Robert Schifreen's SharePoint 2010 adventure, he encounters the product's unique way with databases and how promise and reality differ
Written by Robert Schifreen, Contributor on

In the sixth part of Robert Schifreen's SharePoint 2010 adventure, he encounters the product's unique way with databases and how promise and reality differ.

The most important part of any SharePoint farm is the collection of content databases. SharePoint stores all user data (for example, sites, documents, libraries, lists, calendars, blogs, wikis etc) in SQL databases.

The only database engine that's supported is Microsoft SQL Server. Like any database product, MS SQL Server is great for storing lots of small pieces of data. Larger data items such as documents, spreadsheets and pictures are best stored in flat files, as any DBA will tell you. But SharePoint was really designed to put all content in databases, and planning for these databases can be tricky.

You can have as many SQL databases as you like, but Microsoft recommends that you don't allow any one database to grow larger than 200GB, after which performance apparently starts to degrade. It's also best not to have individual drive volumes that run into tens of terabytes, because they're a pain to back up.

Microsoft has a solution to the problem of storing large data objects (BLOBs) in SQL databases, which is something called RBS or Remote Blob Storage. Each blob gets stored as a standard file on the file system and its entry in the database is replaced with a pointer to the external file.

From the start, we'd always assumed that using RBS was a no-brainer. It made absolute sense. It could, we were told, reduce the size of content databases by up to 95 percent.

But the more we found out about RBS, the more it fell out of favour. For reasons that make no sense at all, Microsoft explicitly states that the database storage requirements for a SharePoint farm are the same regardless of whether or not you use RBS. So the 95-percent saving, or actually any saving at all, can't be guaranteed. Your databases might still grow to hundreds of gigabytes, even without the BLOB data in there. Or at least, Microsoft can't guarantee that they won't.

The de facto standard RBS middleware for SharePoint is something called FileStream, which ships as a free add-in for MS SQL Server. It's a fiddly job to install and configure, and requires the typing of multi-line impenetrable things at the command line. Rumour has it that FileStream was only written as a proof of concept by Microsoft and isn't up to production use. Apparently, RBS is really only useful if your SAN hardware vendor provides a native RBS driver for its kit. Which ours does not.

Perhaps the biggest drawback of RBS is that you have to be 100 percent sure that there's no way the collection of external files can ever become out of sync with the pointers in the databases. If that occurs, dire things happen to your entire SharePoint farm. So backups have to be done when the entire farm is down in order that neither the pointers nor the files can change.

My philosophy throughout the project has always been to keep things as simple, efficient and manageable as possible. We therefore decided that we would not deploy RBS for the initial rollout, but would revisit it at a later date. A subsequent chat with some consultants confirmed that we had chosen a sensible path. Until we have a SAN that supports RBS natively in its firmware, it's unlikely that we'll go down that route.

That's not flexible, that's complicated

With the RBS decision taken, we moved on to architecting the SharePoint farms. There would be three: a production farm, an identically laid out farm for testing and development, and a smaller farm outside of our main Windows domain for testing potentially disruptive domain-level things such as updating user account information.

I was to build and look after the first two farms. A colleague in the IT infrastructure department who maintains our Active Directory would build and maintain the third one, as it would mostly be used by him rather than me.

SharePoint farm architectures are difficult. Microsoft probably calls it 'flexible', but 'complex' sums it up better.

SharePoint farm architectures are difficult. Microsoft probably calls it 'flexible', but 'complex' sums it up better. At the simplest level you can have a single machine running SharePoint and SQL Server. This is fine for development, demos and tests, but isn't really sensible for serious use. There's no resilience, and the machine will rapidly become overloaded.

It doesn't help that the SharePoint installation wizard does a very poor job of helping you decide between a one-server farm (ie, SharePoint and SQL Server on a single box) and a development server (which uses SQL Server Express and can't be expanded beyond one box). If you end up ticking the wrong box, all you can do is reformat and try again.

The next step up from a single machine is to have two tiers. One or more SharePoint servers, and one or more SQL servers. Based on conversations with consultants and some knowledgeable conference speakers, we decided relatively quickly that a single database server would do.

Managing multiple SQL servers would be problematic, and throwing a reasonable amount of resources at a single server (32GB of RAM and eight processors) would, I was told, suffice. And because all of our servers were to be virtual, upping the spec would be trivial.

Not, of course, that everything is as easy with a virtualised environment as the purveyors of virtualised environments would have you believe...

Next: To virtualise or not to virtualise?

Robert Schifreen has reported on and implemented online technology since the early 1980s. His latest project has been a large SharePoint 2010 installation in tertiary education. We will be serialising his experiences, positive and negative, in getting it to the stage where it's ready for action; the entire series will also be available as a downloadable white paper.

Get the latest technology news and analysis, blogs and reviews delivered directly to your inbox with ZDNet UK's newsletters.
Editorial standards