Machine learning and a little reverse PageRank could be Microsoft's new weapon for keeping fraudsters and spammers off its VoIP service, Skype.
Microsoft already uses anti-fraud systems to keep scammers away from the roughly 300 million people on Skype, and, according to data scientists at Microsoft Research, the service detects most fraudulent accounts within a day of their creation.
Still, an unknown number of "stealthy" fraudsters evade detection and may then remain on Skype for years, potentially propagating credit card or online payment fraud and instant-message spam.
A new anti-fraud system described in the Microsoft Research paper Early Security Classification of Skype Users via Machine Learning could cut the time needed to find these stealthy users to four months. The report's authors say their methods detected 68 percent of stealthy accounts at a five percent false positive rate.
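To make the headline figures concrete: a 68 percent detection rate at a five percent false positive rate means the classifier flagged 68 of every 100 genuinely stealthy accounts while wrongly flagging 5 of every 100 legitimate ones. The short Python sketch below computes both rates from labelled predictions. The function name and the example counts are hypothetical, chosen only to reproduce those two percentages; they are not data from the paper.

```python
def detection_rates(labels, predictions):
    """Return (true_positive_rate, false_positive_rate).

    labels[i] is True if account i is actually fraudulent (hypothetical data);
    predictions[i] is True if the classifier flagged account i.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # fraud correctly flagged
    fn = sum(1 for y, p in pairs if y and not p)      # fraud missed
    fp = sum(1 for y, p in pairs if not y and p)      # legitimate wrongly flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # legitimate left alone
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical sample: 100 stealthy accounts, 1,000 legitimate ones.
labels = [True] * 100 + [False] * 1000
predictions = [True] * 68 + [False] * 32 + [True] * 50 + [False] * 950
print(detection_rates(labels, predictions))  # (0.68, 0.05)
```

The two rates trade off against each other: lowering the classifier's flagging threshold catches more stealthy accounts but also flags more legitimate users, so a detection rate is only meaningful alongside the false positive rate it was measured at.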
The anti-fraud system borrows techniques from a method previously developed to predict failures in datacentre disks; for Skype, it relies on a snapshot of many varied data sets drawn from real Skype account activity to estimate the odds that an account is fraudulent rather than normal.
The sample of 100,000 Skype accounts included hashed Skype IDs, profile data such as age and gender, the number of days an ID used a Skype feature such as video calling, the number of friend requests, and how many times a user had been deleted.
Interestingly, the researchers used a reversed variant of Google's PageRank algorithm as one of the pre-processing steps that feed into an account's fraud odds. They note that this approach has previously been used to detect spammers.
"A spammer that sends many spam emails but receives very few emails will have a high number of incoming edges in the reversed email graph, hence will likely have a high PageRank score," the report says.
"In our work, we adopt a similar approach and compute PageRank scores on the reversed Skype user contact graph…. Thus, users with higher PageRank scores are likely to be those that sent out a large number of friend requests."
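The reversed-graph idea can be illustrated with a short Python sketch. It assumes a friend-request graph in which an edge (u, v) means u sent v a request, reverses every edge, and runs standard PageRank by power iteration with the conventional damping factor of 0.85. The function name `reverse_pagerank`, the damping value and the toy graph are illustrative assumptions, not details taken from the Microsoft Research paper.

```python
def reverse_pagerank(edges, damping=0.85, iterations=200, tol=1e-12):
    """PageRank on the reversed graph of `edges` (hypothetical helper).

    `edges` is a list of (sender, recipient) friend requests. After
    reversal each recipient links back to the sender, so accounts that
    send many requests collect many incoming links and a high score.
    """
    nodes = set()
    out_links = {}  # reversed adjacency: recipient -> set of senders
    for sender, recipient in edges:
        nodes.update((sender, recipient))
        out_links.setdefault(recipient, set()).add(sender)
    nodes = list(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Nodes that never received a request have no outgoing links after
        # reversal (dangling nodes); spread their rank evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node splits its damped rank equally among its reversed links.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once the scores have settled
            break
    return rank


# Toy graph: "s" sends requests to four users; "a" and "b" only swap requests.
requests = [("s", "a"), ("s", "b"), ("s", "c"), ("s", "d"), ("a", "b"), ("b", "a")]
scores = reverse_pagerank(requests)
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 3))
```

In the toy graph, every request "s" sent becomes a link pointing at it once the edges are reversed, so "s" ends up with the highest score (roughly 0.43, against about 0.18 for "a" and "b"), matching the report's intuition that high scores mark accounts sending out many friend requests.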
While the anti-fraud system worked well for accounts that had gone undetected for less than 10 months, it "missed most" fraudulent accounts that had stayed active for more than 30 months.