Google's PageRank may help Microsoft weed out Skype spammers

Summary: Microsoft may have a way to detect most 'stealth' Skype fraudsters, but those that evade detection for more than 30 months largely remain invisible.

Machine learning and a little reverse PageRank could be Microsoft's new weapon to keep fraudsters, spammers and malware peddlers off its VoIP service, Skype.

Microsoft already employs anti-fraud systems to keep scammers away from its roughly 300 million Skype users and, according to data scientists from Microsoft Research, the VoIP company detects the bulk of fraudulent accounts within a day of their creation.

Still, an unknown number of "stealthy" fraudsters manage to bypass detection and, from there, may remain on Skype for years, potentially propagating credit card or online payment fraud, and instant message spam.

A new anti-fraud system described in the Microsoft Research paper "Early Security Classification of Skype Users via Machine Learning" (PDF) could cut the time needed to find these 'stealthy' users down to four months. The report's authors claim the methods used were able to detect 68 percent of stealthy accounts with a five percent false positive rate.
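To make the two figures concrete: the detection rate is the share of fraudulent accounts that get flagged, and the false positive rate is the share of legitimate accounts flagged by mistake. The sketch below computes both for a score threshold. The function name, the labels, the scores and the threshold are made-up toy values for illustration, not data or code from the paper.

```python
def detection_rates(labels, scores, threshold):
    """Return (detection_rate, false_positive_rate) for a score threshold.

    labels: True if the account is fraudulent, False if legitimate.
    scores: higher means more suspicious (hypothetical classifier output).
    """
    tp = fp = fn = tn = 0
    for is_fraud, score in zip(labels, scores):
        flagged = score >= threshold
        if is_fraud and flagged:
            tp += 1  # fraudulent account correctly flagged
        elif is_fraud:
            fn += 1  # fraudulent account missed
        elif flagged:
            fp += 1  # legitimate account wrongly flagged
        else:
            tn += 1  # legitimate account correctly left alone
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Toy data: three fraudulent and four legitimate accounts.
labels = [True, True, True, False, False, False, False]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
detection, false_positives = detection_rates(labels, scores, threshold=0.5)
print(f"detection rate: {detection:.2f}, false positive rate: {false_positives:.2f}")
```

Lowering the threshold would catch more fraudulent accounts but flag more legitimate ones, which is the trade-off behind quoting the two numbers together.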

The anti-fraud system borrows techniques from a previously developed method for predicting failure in datacentre disks, but in the case of Skype relies on a snapshot of a large number of varied data sets based on real Skype account activity to produce the odds of fraudulent versus normal activity.

Included in the sample of 100,000 Skype accounts were hashed Skype IDs, profile data such as age and gender, the number of days an ID used a Skype feature such as video calls, the number of friend requests and how many times a user was deleted.

Interestingly, the researchers used a reversal of Google's PageRank search algorithm as one of the pre-processing methods to help produce the odds of an account being fraudulent. They note this method has also been used previously to detect spammers.

"A spammer that sends many spam emails but receives very few emails will have a high number of incoming edges in the reversed email graph, hence will likely have a high PageRank score," the report says.

"In our work, we adopt a similar approach and compute PageRank scores on the reversed Skype user contact graph…. Thus, users with higher PageRank scores are likely to be those that sent out a large number of friend requests."

While the anti-fraud system seemed to work well for accounts that had remained undetected for less than 10 months, it "missed most" of the fraudulent accounts that stayed active for over 30 months.
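The reversed-graph idea quoted above can be illustrated with a minimal sketch. It assumes friend requests are available as (sender, recipient) pairs and uses a plain power-iteration PageRank with a damping factor of 0.85. The function name, the toy account names and the parameters are illustrative assumptions, not details taken from the Microsoft Research paper.

```python
from collections import defaultdict


def reverse_pagerank(edges, damping=0.85, iterations=50):
    """PageRank on the reversed graph of (sender, recipient) edges."""
    # Flip every edge so each request becomes an incoming edge for its sender.
    reversed_edges = [(dst, src) for src, dst in edges]
    nodes = sorted({node for edge in reversed_edges for node in edge})
    if not nodes:
        return {}

    out_links = defaultdict(list)
    for src, dst in reversed_edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy contact graph: one account sends four requests, two others befriend each other.
requests = [
    ("spammer", "a"), ("spammer", "b"), ("spammer", "c"), ("spammer", "d"),
    ("a", "b"), ("b", "a"),
]
scores = reverse_pagerank(requests)
top_account = max(scores, key=scores.get)
print(top_account, round(scores[top_account], 3))
```

In this toy graph the account that sent the most requests ends up with the highest score, which is the effect the quoted passage describes. In the system the article covers, such a score is only one pre-processing input among many, not a verdict on its own.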


Topics: Security, Microsoft

About

Liam Tung is an Australian business technology journalist living a few too many Swedish miles north of Stockholm for his liking. He gained a bachelor's degree in economics and arts (cultural studies) at Sydney's Macquarie University, but hacked (without Norse or malicious code for that matter) his way into a career as an enterprise tech, s...
