So where's my smart search, dude?

Using style within an internet search engine's classification makes sense

I once wrote a semi-serious spoof on some Mac user attitudes under the title Are Mac users smarter than PC users?

The central shtick there was to use style to compare the complexity and correctness of the sentences found in typical Mac and PC discussion forums.

Here's a key bit of the explanation:


By the early eighties most Unix releases, whether BSD or AT&T derived, came with the AT&T Writer's Workbench - a collection of useful text processing utilities.

One of those was a thing called style. Style is somewhat out of style these days but is on many Linux "bonus" CDs and downloadable as part of the diction package.

Style produces readability metrics on text. Forget for the moment what the ratings mean and look at the numbers. For comparison here's what style says about the first 1,000 words in what is arguably the finest novel ever published in English: The Golden Bowl:

readability grades:
Kincaid: 18.2
ARI: 22.2
Coleman-Liau: 9.8
Flesch Index: 46.7
Fog Index: 21.7
Lix: 64.4 = higher than school year 11
SMOG-Grading: 13.5

Of course that's Henry James at the top of his form. For a more realistic, and interesting, baseline I collected about 2,800 lines of Slashdot discussion contributions and ran style against them to get the following ratings summary, along with a lot of detail data omitted here:

readability grades:
Kincaid: 7.7
ARI: 8.0
Coleman-Liau: 9.7
Flesch Index: 72.4
Fog Index: 10.7
Lix: 37.1 = school year 5
SMOG-Grading: 9.8
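
For readers who want a feel for where numbers like these come from, here is a rough Python sketch of three of the measures using their standard published formulas (Flesch-Kincaid grade, Flesch reading ease, and Lix). It is not style's code: the function names are made up for illustration, and the syllable counter is a crude vowel-group heuristic, so its numbers should not be expected to match style's output.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    # Drop a trailing silent 'e' ("made"), but keep endings like "table".
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return Kincaid grade, Flesch reading ease, and Lix for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    long_words = sum(1 for w in words if len(w) > 6)

    return {
        # Higher = harder to read (roughly a US school grade).
        "kincaid": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Higher = easier to read.
        "flesch": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Higher = harder; "long" means more than six letters.
        "lix": words_per_sentence + 100.0 * long_words / len(words),
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran to the cat."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

Note that the measures point in different directions: a dense text pushes the Kincaid and Lix figures up and the Flesch Index down, which is the pattern in the two summaries above.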



I then compared a few thousand entries from Mac discussion sites with stuff from PC forums to discover that the Mac discussions obtained significantly higher scores on measures of complexity and grammatical correctness - from which I cheerfully concluded that Mac users are smarter.

Now in reality I wouldn't argue that someone's ability to structure a sentence provides a sufficient guide to that person's relative intelligence, but the argument probably applies quite well to Internet documents. That means using style within an Internet search engine's classification makes sense, with better constructed, better expressed materials always earning a higher page rank than poorly constructed, poorly expressed ones.

Do a search, for example, on "Amish history" and Google now places some drek written by a government bureaucrat at the top of the first listings page and buries the Wikipedia entry near the middle of page two. Adopt a style-based metric, however, and both entries end up where they belong - the Wikipedia entry at the top and the bureaucrat buried.
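
To make the idea concrete, here is a minimal Python sketch of what folding a writing-quality score into a ranking might look like. Everything in it is an assumption for illustration: the Result fields, the 0 to 1 scales, the style_weight parameter, the simple linear blend, and the example URLs and scores. It says nothing about how any real search engine ranks pages, and it leaves open how style's output would be turned into a single quality score.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float    # hypothetical engine relevance score, 0..1
    style_score: float  # hypothetical writing-quality score, 0..1


def rerank(results, style_weight=0.3):
    """Sort results by a weighted blend of relevance and writing quality."""
    if not 0.0 <= style_weight <= 1.0:
        raise ValueError("style_weight must be between 0 and 1")

    def combined(result):
        return ((1.0 - style_weight) * result.relevance
                + style_weight * result.style_score)

    return sorted(results, key=combined, reverse=True)


if __name__ == "__main__":
    # Made-up scores: a slightly more "relevant" but badly written page
    # against a slightly less "relevant" but well written one.
    hits = [
        Result("example.org/bureaucrat-report", relevance=0.9, style_score=0.1),
        Result("example.org/encyclopedia-entry", relevance=0.8, style_score=0.9),
    ]
    for hit in rerank(hits):
        print(hit.url)
```

With style_weight set to 0 the order is whatever relevance alone dictates; raising it lets the better written page overtake the other, which is the whole argument in miniature.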