Plagiarism-detection-style software identifies authors of terrorist propaganda

Universities around the world have been using plagiarism software for a good few years now to crack down on copied work. Copying, after all, amounts to defrauding a university in order to gain a qualification, which can be, and at least once has been, classed as a criminal offence.

Some bright sparks at the University of Arizona have taken what are essentially the intelligent properties of plagiarism software and used them to work out who writes terrorist propaganda on extremist websites, so that the authors can then be brought to justice. This revolutionary software has been dubbed, quite appropriately, "Dark Web".

Plagiarism software is "intelligent" in that it can see things the average human cannot, using algorithms and protocols to match words and their synonyms. Give someone two essays and ask them to spot the differences, and it's certainly not easy. A computer program, however, can draw on millions of pre-loaded books, pieces of academic work, web pages and other submitted content, and analyse which parts are different, which are similar, and which have been blatantly copied.
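
To make the word-matching idea concrete, here is a minimal Python sketch of one common textbook technique: break each text into overlapping three-word sequences ("n-grams"), map a few words to a canonical synonym first, and measure how many sequences two texts share (the Jaccard similarity). This is an illustration under stated assumptions, not how Turnitin or Dark Web actually work; the synonym table, the function names and the sample sentences are all invented for the example.

```python
import re

# Hypothetical, hand-made synonym table; real tools rely on much larger thesauri.
SYNONYMS = {"fast": "quick", "rapid": "quick", "leaps": "jumps", "idle": "lazy"}


def normalise(text: str) -> list[str]:
    """Lower-case the text, split it into words and map known synonyms to one form."""
    words = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(word, word) for word in words]


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in the normalised text."""
    words = normalise(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared n-grams divided by all distinct n-grams (0.0 = nothing shared, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def copied_passages(submission: str, source: str, n: int = 3) -> set[tuple[str, ...]]:
    """n-word sequences that occur in both texts: candidate copied passages."""
    return word_ngrams(submission, n) & word_ngrams(source, n)


# Invented sample texts: the submission rewords the source using synonyms.
source = "The quick brown fox jumps over the lazy dog near the river bank."
submission = "As everyone knows, the fast brown fox leaps over the idle dog every day."

score = jaccard(word_ngrams(submission), word_ngrams(source))
print(f"similarity: {score:.2f}")
for gram in sorted(copied_passages(submission, source)):
    print("  matched:", " ".join(gram))
```

Once "fast", "leaps" and "idle" are mapped back to "quick", "jumps" and "lazy", 7 of the 16 distinct three-word sequences found across the two texts are shared, even though the wording was changed. A real checker would compare each submission against an indexed database of millions of documents rather than against one source at a time.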

Every university in the United Kingdom has access to the Turnitin software, which runs through a web interface that allows the submitter to hand in their work electronically and have it analysed for plagiarism before it is finally submitted. Many universities around the world have a similar scheme in place, often customised for their own purposes. Automating this process saves lecturers and markers a great deal of time, and it also deters students who might be tempted to copy someone else's work. The software is very hard to fool, although it is not infallible: it can still give false negatives.

Some of these tools can also analyse writing style (where nouns and verbs meet, added hyperbole, and the overall discourse of the text) to work out who actually wrote the piece being submitted, based on the text of past submissions. From this, the UoA has:

"...developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in [their] research."

The Associated Press covered a story on Dark Web and how it works; however, due to the recent state-media running of the corporation, I can't quote anything they've written without incurring a fine. Here's the article, should you wish to read it, although it only seems to work properly in Windows Internet Explorer.

One of the key features of Dark Web is its near-ability to learn as it goes. One of the sub-projects involved searching, reading, and studying how terrorists learn to build certain improvised explosive devices; by learning how that information spreads and which demographics read the articles, the software can improve the intelligence used to catch the people involved.

Hsinchun Chen, the director of the artificial intelligence lab at UoA, spoke to The National Student about Dark Web and explained that "analysts cannot effectively analyse writing styles in cyberspace, especially multilingual writings. But using our tool, we can get about 95% accuracy, because [the project is] utilising a lot of things your naked eye cannot see."
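
Writing-style (authorship) analysis of the kind Chen describes is often illustrated with "function words" (the, of, must, we and so on), which writers tend to use at habitual rates regardless of topic. The Python sketch below is an assumption-laden toy, not Dark Web's actual method: it builds a frequency profile over a small, hand-picked word list, compares an unknown text with each candidate author's past writing using cosine similarity, and picks the closest match. The word list, author names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative, hand-picked function words (an assumption); real studies use far more features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but",
                  "however", "which", "very", "really", "must", "we", "you"]


def style_profile(text: str) -> list[float]:
    """Relative frequency of each function word, in FUNCTION_WORDS order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two profiles: 1.0 means identical proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown: str, past_writing: dict[str, str]) -> tuple[str, float]:
    """Return the candidate author whose past writing is stylistically closest."""
    target = style_profile(unknown)
    scores = {author: cosine(target, style_profile(text))
              for author, text in past_writing.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidates and texts, invented purely for illustration.
past_writing = {
    "author_a": "We must act now. We must be ready, and you must be ready too. It is very clear.",
    "author_b": "However, the history of the region is complex, and the origins of "
                "the conflict are in the past.",
}
unknown = "We must be strong. You must be ready, and it is very clear that we must act."

author, score = attribute(unknown, past_writing)
print(author, round(score, 3))
```

In this toy the unknown text is attributed to author_a, whose heavy use of "we", "must" and "you" it shares. Production systems combine many more features (character sequences, syntax, vocabulary richness) with trained classifiers, and need much longer samples than these one-liners to be reliable.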

Sounds like something James Bond could be interested in, when he's not blowing something up, of course.

Edit: some formatting was out of place - just tweaked it slightly.

Talkback

  • ...

    "[B]allowing the [U]submitterto [/U]electronically hand in their work,[/B]"

    Another edit for you... See the underlined. ]:)
    Linux User 147560
    • RE: Linux User

      Ahh, thanks for this. I use Firefox 3 with the backend system we have here, and for some reason, when you press backspace to delete a word, the cursor shows a space there, but when you carry on typing the space disappears. It keeps happening - it drives me mad, but it's certainly not intentional.

      I blame the technology, not me ;-) I've updated it.

      [Edit] I've just realised I positioned the picture in a really odd place, between the words "submitter" and "to", which is why they were concatenated. Very peculiar...
      zwhittaker
  • Don't the source databases violate copyrights?

    If they are comparing to millions of pre-input texts, they probably are violating the copyrights of the authors of those texts. Storing the text in electronic format is still copying it without permission. And if they are charging to compare the user's material to their database, then they are also profiting from their copyright violations.

    And what about re-used student papers? It is well known that many fraternities and sororities maintain files of term papers that their members can submit for courses like Freshman English or World History. At least under American law, an author has an automatic copyright, which means that if the school is storing the papers or making them available to other universities, the school is violating the copyrights of the authors of those term papers.
    Rick_R
    • RE: Don't the source databases violate copyrights?

      I know this applies for the UK and the US. Every academic article submitted as a university student or academic, the author remains the author, yet the copyright automatically falls to that University, allowing them to do anything they want with it.

      In the UK, every academic article gets automatically submitted to the British Library, through some act passed in Parliament some umpteen years ago. This means that it's freely available to the public, therefore anyone can access it.

      In response to your question about it violating copyright - who gives a damn if it's helping to find terrorists and protect our respective national securities? :-)
      zwhittaker
      • RE: Don't the source databases violate copyrights?

        "I know this applies for the UK and the US. Every academic article submitted as a university student or academic, the author remains the author, yet the copyright automatically falls to that University, allowing them to do anything they want with it."

        Nope, you know wrong. For anything produced "for hire" by a student or faculty member, such as a brochure, TV/radio/print ad, training courses which are requested by an administration official and are paid for specifically, etc., the U owns the copyright.

        However, academic articles (journal articles, meeting presentations and/or abstracts), books (scholarly or otherwise, when not commissioned by the U), course materials (print or electronic), etc., are all copyright the author - faculty, student, whatever.
        cd2_z
        • I should have said: US. I don't know about the UK.

          nt
          cd2_z
          • RE: cd2_z

            You're right - I apologise - I must have got carried away with myself :-) I was talking about the UK, not the US - I must have just thrown it in without thinking.
            zwhittaker