Analyzing blog data

I'm at the 15th Annual Conference on the World Wide Web, known as WWW2006, this week. Today I popped into the associated workshop on weblog ecosystems.

The workshop covers recent research on blogs. If you're wondering what there is to research, the topics include the social aspects of blogging, inferring communities, recognizing spam blogs, and so on. All of this falls under the heading of "blog data mining and analysis."

I enjoyed Belle Tseng's presentation of a paper on communities in blogs. Blogger actions such as posting and commenting create a community of bloggers, and mutual actions (bloggers acting on each other's blogs) are especially important. The technique allows communities to be inferred even when the bloggers share no common topic.
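
To make the mutual-action idea concrete, here is a minimal Python sketch. It is not the algorithm from the paper; it simply assumes blogger actions arrive as hypothetical (actor, target) pairs, keeps only reciprocated pairs, and treats each connected group of bloggers as a community. Note that post topics are never consulted.

```python
from collections import defaultdict


def mutual_edges(actions):
    """Keep only pairs of bloggers who have acted on each other's blogs."""
    directed = {(src, dst) for src, dst in actions if src != dst}
    # A pair counts only if the action goes both ways.
    return {tuple(sorted((a, b))) for a, b in directed if (b, a) in directed}


def communities(actions):
    """Group bloggers into connected components of the mutual-action graph."""
    graph = defaultdict(set)
    for a, b in mutual_edges(actions):
        graph[a].add(b)
        graph[b].add(a)
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(component)
    return result


# Hypothetical (actor, target) actions, e.g. "actor commented on target's blog".
actions = [
    ("alice", "bob"), ("bob", "alice"),   # mutual
    ("bob", "carol"), ("carol", "bob"),   # mutual
    ("dave", "erin"), ("erin", "dave"),   # mutual
    ("frank", "alice"),                   # one-way, so ignored
]

print(communities(actions))  # {alice, bob, carol} and {dave, erin}
```

Requiring reciprocity is what keeps one-way links, such as a stranger linking to a popular blog, from pulling unrelated bloggers into a community.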

Tim Finin presented research on splogs (spam blogs). Some of his data shows that measurable features, such as the number of incoming links, can be used to tell blogs from splogs: the distribution of incoming links to real blogs follows a power law, but the distribution for splogs doesn't.
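
One rough way to check whether in-link counts look power-law-like is to fit a straight line to the in-degree distribution on a log-log scale. The sketch below is an illustrative assumption, not Finin's method, and both data sets are made-up toy numbers. Log-log least squares is a crude test; maximum-likelihood estimators are generally preferred for serious work.

```python
import math
from collections import Counter


def loglog_fit(in_degrees):
    """Fit log(frequency) = a + slope * log(in-degree) by least squares.

    Returns (slope, r_squared). Zero in-degrees are skipped because
    log(0) is undefined.
    """
    freq = Counter(d for d in in_degrees if d > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct positive in-degrees")
    pairs = sorted(freq.items())
    xs = [math.log(k) for k, _ in pairs]
    ys = [math.log(c) for _, c in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # If every frequency is equal, a flat line fits exactly.
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared


# Toy "blog-like" data: in-degree k occurs 3600 / k**2 times (a power law).
blog_like = [k for k in range(1, 7) for _ in range(3600 // k**2)]

# Toy "splog-like" data: a hump around in-degrees 3-4 instead of a power law.
splog_like = [k for k, c in zip(range(1, 7), [10, 40, 90, 90, 40, 10])
              for _ in range(c)]

print(loglog_fit(blog_like))   # slope close to -2, r_squared close to 1
print(loglog_fit(splog_like))  # poor straight-line fit, low r_squared
```

A straight-line fit with r_squared near 1 is consistent with a power law; the hump-shaped toy distribution fits poorly, which is the kind of difference a splog classifier could use as one feature.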

Good data sets are hard to come by in this area, and using different data sets can lead to different results. Data quality is also an obvious concern, since splogs can contaminate the data.

Krisztian Balog presented some interesting work on analyzing mood data from tagged LiveJournal postings. The system is called MoodViews. The data shows interesting correlations and cyclic trends. For example, it probably won't surprise you that stress increases before Christmas and then takes a big dip after it. Because LiveJournal users also have profiles, there's potential for correlating mood with location, profession, and so on. This seems like it would be a direct marketer's dream.
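
The kind of aggregation behind mood trends can be sketched in a few lines of Python. The record layout (`day`, `mood`, and `location` fields) and the sample posts are hypothetical, not MoodViews' actual data or schema. The same function groups by date to expose cyclic trends or by a profile field for the marketing-style correlations.

```python
from collections import Counter
from datetime import date


def mood_share(posts, mood, key):
    """Return {group: fraction of that group's posts tagged `mood`}.

    `key` names the field to group by, e.g. "day" for trends over time
    or "location" for a profile attribute.
    """
    totals, hits = Counter(), Counter()
    for post in posts:
        group = post[key]
        totals[group] += 1
        if post["mood"] == mood:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in sorted(totals)}


# Hypothetical sample records; not real LiveJournal or MoodViews data.
posts = [
    {"day": date(2005, 12, 23), "mood": "stressed", "location": "US"},
    {"day": date(2005, 12, 23), "mood": "stressed", "location": "UK"},
    {"day": date(2005, 12, 23), "mood": "happy", "location": "US"},
    {"day": date(2005, 12, 26), "mood": "relaxed", "location": "UK"},
    {"day": date(2005, 12, 26), "mood": "stressed", "location": "US"},
]

print(mood_share(posts, "stressed", "day"))       # Dec 23: 2/3, Dec 26: 1/2
print(mood_share(posts, "stressed", "location"))  # UK: 1/2, US: 2/3
```

Normalizing by each group's total post count matters: a raw count of "stressed" tags would mostly track how much people post on a given day.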

Ko Fujimura presented research on blog-specific search. The goals of blog search differ from those of general Web search. When people search blogs, they have one of three goals: topic search, blogger search, or reputation (review) search. The experimental search engine, called BlogRanger (warning: it's in Japanese), lets users search in each of these categories and suggests refining keywords for topics, blogger names for blogger search, and adjectives for reviews.
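
As a rough illustration of mode-specific refinement suggestions, here is a Python sketch that counts what co-occurs with the query in matching posts. It is an assumption, not how BlogRanger works: posts are hypothetical (author, text) pairs, tokens are split on whitespace, and the adjective list for reputation search is supplied by hand.

```python
from collections import Counter

MODES = {"topic", "blogger", "reputation"}


def suggest(query, posts, mode, adjectives=frozenset(), limit=3):
    """Suggest refinements for `query` from the posts that mention it.

    posts: list of (author, text) pairs.
    mode:  "topic"      -> words that co-occur with the query
           "blogger"    -> authors of matching posts
           "reputation" -> co-occurring words found in `adjectives`
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    q = query.lower()
    counts = Counter()
    for author, text in posts:
        words = set(text.lower().split())
        if q not in words:
            continue  # post doesn't mention the query
        if mode == "blogger":
            counts[author] += 1
            continue
        for w in words - {q}:
            if mode == "topic" or w in adjectives:
                counts[w] += 1
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical posts and adjective list, for illustration only.
posts = [
    ("kenji", "new camera review great lens"),
    ("yumi", "camera battery terrible"),
    ("kenji", "camera lens great"),
    ("taro", "lunch was great"),
]
adjectives = {"great", "terrible"}

print(suggest("camera", posts, "topic", limit=2))         # ['great', 'lens']
print(suggest("camera", posts, "blogger"))                # ['kenji', 'yumi']
print(suggest("camera", posts, "reputation", adjectives))  # ['great', 'terrible']
```

Japanese text doesn't put spaces between words, so a real system for Japanese blogs would need proper word segmentation rather than `split()`.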

I wonder what impact this research has on people building businesses on the Web. I suspect most of these findings are either unknown to them or rediscovered time and again.
