Australia's securities and investment watchdogs are turning to document-classification technology employing the latest linguistic techniques in their hunt for Web-based fraudsters.
The Australian Securities and Investments Commission (ASIC) on Monday unveiled a joint research project with the Capital Markets Cooperative Research Centre (CMCRC), the University of Sydney and Macquarie University to develop an automatic Internet document classification system called 'Scamseek'. The system is expected to use linguistic techniques developed by Macquarie University researchers to uncover Web sites promoting scams, even those that use language designed to disguise their true intentions.
The AU$1m-plus project would, if successful, build an "eye that never sleeps, constantly seeking out sites that we can take action against," ASIC director of electronic enforcement, Keith Inman, said in a statement.
He said Scamseek would have the potential to gauge risk by checking entities against public and private databases; assess and aggregate the risk associated with information on a Web site; identify people and companies mentioned on a Web site; and mark sites that are above the acceptable risk threshold for further analysis.
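The steps Inman lists (check named entities against databases, aggregate the risk, flag sites above a threshold) can be illustrated with a minimal Python sketch. Nothing here comes from ASIC or the research team: the watchlist contents, the 0.5 threshold, the way the two risk scores are combined and every name in the code are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for the "public and private databases":
# maps a lower-cased entity name to an assumed risk score in [0, 1].
WATCHLIST: Dict[str, float] = {
    "acme offshore holdings": 0.6,
    "john example": 0.3,
}

# Assumed cut-off; the article does not say what the acceptable level is.
RISK_THRESHOLD = 0.5


@dataclass
class SiteAssessment:
    url: str
    entities: List[str]   # people and companies identified on the site
    content_risk: float   # assumed risk score for the site's wording, in [0, 1]


def entity_risk(entities: List[str]) -> float:
    """Return the highest watchlist score among the named entities."""
    return max((WATCHLIST.get(name.lower(), 0.0) for name in entities), default=0.0)


def aggregate_risk(site: SiteAssessment) -> float:
    """Combine content and entity risk so that either one can raise the total.

    This is one possible combination rule, chosen only for the example.
    """
    return 1 - (1 - site.content_risk) * (1 - entity_risk(site.entities))


def flag_for_review(sites: List[SiteAssessment],
                    threshold: float = RISK_THRESHOLD) -> List[str]:
    """Return the URLs of sites whose aggregated risk exceeds the threshold."""
    return [site.url for site in sites if aggregate_risk(site) > threshold]


sites = [
    SiteAssessment("http://a.example", ["Acme Offshore Holdings"], 0.2),
    SiteAssessment("http://b.example", [], 0.1),
]
flagged = flag_for_review(sites)
```

In this sketch only the first site is flagged, because a watchlisted company name lifts its combined score above the assumed threshold even though its wording alone scores low.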
Inman told ZDNet Australia that it would take at least six months to complete the research required to assess the viability of the project. "We are led very much by our research partners," he said. However, he said Professor Jon Patrick, the team leader for the CMCRC and the University of Sydney, was "reasonably confident" based on concept work undertaken to date.
Inman acknowledged the problem of illegal offerings had risen sharply over recent years. "Clearly the [advent of the] Internet as a medium to disseminate and make illegal offerings has empowered large numbers of people to do so."
This rise had imposed an increasing burden on ASIC's enforcement resources, Inman said, citing the outcomes of ASIC "Surfdays", in which staff logged on to the Internet for several hours over a couple of days in a bid to track down fraudsters and scammers. He said that for every suspect site located, staff had to trawl through 1,000 legitimate sites. Consequently, ASIC was looking to automate as much of the process as it could.
"This is about being proactive," he said, with ASIC planning to have sites located and their authors dealt with before they shifted from fledgling to fully fledged operations.
Patrick said the system would operate using the most up-to-date research in document classification, as well as analytical methods for identifying the meaning of words.
"Scams that are run through Web sites tend to use certain words, in certain ways, with certain characteristics, but they can be cleverly disguised as well," Patrick said. He said he and colleagues specialising in linguistics were using new theories designed to expose even those sites that heavily disguised their intentions.
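Patrick's point that scam sites "tend to use certain words" is the basis of the simplest kind of document classifier. The Python sketch below is a generic word-count (naive Bayes) classifier, not the Scamseek method, which the article does not describe; the four training sentences and the labels are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Word-count classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Start from the log prior, then add one log likelihood per token.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Invented training examples, for illustration only.
model = NaiveBayes()
model.train("guaranteed returns risk free investment act now", "scam")
model.train("double your money guaranteed no risk offshore", "scam")
model.train("annual report and audited financial statements", "legitimate")
model.train("prospectus lodged with the regulator read the risks", "legitimate")

label = model.classify("guaranteed risk free returns")
```

A classifier of this kind keys on surface vocabulary alone, so a site that avoids the tell-tale words would slip past it. That is the gap the linguistic techniques Patrick describes are meant to close.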
The project will also apply a specialist 'Web spider' to search out potential Web sites, using technology developed by one of the CMCRC industry members, SMARTS (Security Markets Automated Research Training and Surveillance).