Book review: Mining the Social Web

What can someone find out about your tweets, and the people who retweet and respond to them? Rather a lot: despite being limited to 140 characters, tweets include metadata such as what kind of character encoding they use, and a great deal more.

You can see how friendly people are on Twitter and how they're interconnected. You can see how influential someone is and who they influence, who influences them, who they think is talking about something interesting enough to repeat (without that being distorted by how often that interesting person tweets), whether they only follow and reply to people who are similar to them, whether they're in a large or small clique of connected users, whether the people they talk to the most are their closest friends — and whether any of that is reciprocated. And you can see it for far more people than you could analyse just by scrolling through their Twitter feed.
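To give a flavour of the reciprocity question, here is a minimal sketch in Python (the language the book uses), assuming you have already collected two lists of account names: who a user follows and who follows them. The function name and the sample names are made up for illustration and are not taken from the book or from any Twitter API.

```python
def relationship_sets(following, followers):
    """Split a user's connections into mutual and one-way relationships."""
    following, followers = set(following), set(followers)
    return {
        # People the user follows who also follow them back
        "mutual": following & followers,
        # People the user follows who do not follow back
        "not_following_back": following - followers,
        # People who follow the user without being followed in return
        "fans": followers - following,
    }


# Hypothetical sample data, purely for illustration
result = relationship_sets(
    following=["alice", "bob", "carol"],
    followers=["bob", "carol", "dave"],
)
print(sorted(result["mutual"]))
```

Plain set intersection and difference are enough to answer "is any of that reciprocated?" once the two lists are in hand; collecting them is the part that needs an API.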

This is the kind of information that tools like Klout use to rank and score Twitter users, but the raw data is accessible: you can mine up to the last 3,000 tweets from any public user and ask your own questions. You could define a profile and look for Twitter users who match it, or find out what time of day someone is most likely to tweet or retweet. Matthew A. Russell's Mining the Social Web gives step-by-step instructions for doing nearly all of that, along with visualisations of the connections and relationships.
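The time-of-day question, for instance, comes down to counting. The sketch below assumes the tweet timestamps have already been fetched and normalised to strings like "2011-03-01T09:15:00"; that format, the function name and the sample data are assumptions for illustration, not what Twitter returns or what the book prescribes.

```python
from collections import Counter
from datetime import datetime


def busiest_hour(timestamps):
    """Return (hour, count) for the hour of day with the most tweets."""
    hours = Counter(
        # Assumed timestamp format; real API output would need converting first
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").hour
        for ts in timestamps
    )
    return hours.most_common(1)[0]


# Hypothetical timestamps, purely for illustration
sample = [
    "2011-03-01T09:15:00",
    "2011-03-01T09:47:10",
    "2011-03-02T21:05:00",
    "2011-03-03T09:02:30",
]
hour, count = busiest_hour(sample)
print(hour, count)
```

With this sample, three of the four tweets fall in the 9 o'clock hour, so that is the hour reported.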

Russell also covers what you can deduce from LinkedIn profiles. This is more limited in one way: although you can get a lot of information about people in your LinkedIn network, you can't even find out whether two arbitrary people have any connection (a deliberate decision by LinkedIn to protect users' privacy). It is also much richer, because of the amount of detail users put on their profiles and how structured that information is. As well as seeing which of your contacts has ever had a job at a company you want to apply to (the kind of thing LinkedIn is already useful for), you could look at where people with specific job titles are based, to see if it's worth building a business or running a workshop in a particular area.

The chapter on Facebook points out that you can get a developer token in ten minutes, and suggests interesting mining ideas like analysing how many people have moved away from where they grew up or were educated. Nor does Russell neglect the social network that has the most interesting information: your inbox.

The book starts with a quote from Tim Berners-Lee about the original social nature of the web: "We clump into families, associations and companies. We develop trust across the miles and distrust around the corner". The tools and techniques here let you see just how true that is. None of them would help you pry into information that wasn't public in the first place, but together they are a great demonstration of quite how much of the information we put into online services is public and available for mining in these ways.

If you want to use the programming examples in this dense but pacy guide, you'll need to be comfortable with Python and with diving into code. But what you get are usable tools rather than toy examples. There are some interesting statistics (only half of tweets contain either a hashtag or another twitterer's name), a nice overview of set theory and fascinating diversions on what you can and can't do with natural language processing (if you want to mine blogs as well) and the ideas of the semantic web. The only problem is that the book tackles such a wide range of services and information sources that it's only a taster for what you could do by mining each of them — especially Facebook.

If you just want a guide to how much information someone with moderate programming skill can extract from social networks, you can skip through the coding details and concentrate on the explanations and suggestions for further analysis and visualisation. You'll emerge with a realistic view of what the social web can actually tell us about connections and interactions, and the deductions you can make from them.

Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
Matthew A. Russell
O'Reilly Media
360 pages
ISBN: 978-1-4493-8834-8
£30.99

Mary Branscombe
