Kaspersky: Duqu Trojan uses 'unknown programming language'

Summary: Security firm Kaspersky says the Duqu Trojan was partly written in a programming language it does not recognize. The company is asking the programming community for help to identify the code.

Update: Duqu Trojan programming language identified

The Duqu Trojan, referred to by some as Stuxnet 2.0, was partly written using an unknown programming language. The payload DLL, which communicates exclusively with the Command and Control (C&C) server so that the worm knows what to do once it has infected a system, has code that doesn't resemble anything seen before. While security researchers have worked out what the mystery code does, they aren't sure about the syntax.

Some parts of it, including those for downloading and executing additional modules, were written in standard C++, but a big chunk of it was not. This particular section contains no references to any standard or user-written C++ functions, and may have been created by a different programming team. Security firm Kaspersky says the unusual code is unique to Duqu: many parts are directly borrowed from Stuxnet, but this one is new. The company has named it the Duqu Framework, and has noted that it is not written in C++, Objective C, Java, Python, Ada, Lua, or any of the many other languages it checked. Unlike the rest of Duqu, it also wasn't compiled with Microsoft's Visual C++ 2008. All we know is that it's object-oriented.

The sophistication of the worm is one thing, but the fact that an entirely new programming language may have been created for it points to some seriously deep pockets backing the project. Security experts have suggested that a state must have been involved in its development, and Kaspersky CEO Eugene Kaspersky supports this speculation on Twitter:

The mystery of #Duqu framework http://bit.ly/w5BrzP <- seems the state behind #Duqu sponsored the development of a new progr language

Here is what Kaspersky was able to conclude in its analysis:

  • The Duqu Framework appears to have been written in an unknown programming language.
  • Unlike the rest of the Duqu body, it's not C++ and it's not compiled with Microsoft's Visual C++ 2008.
  • The highly event-driven architecture points to code which was designed to be used in pretty much any kind of conditions, including asynchronous communications.
  • Given the size of the Duqu project, it is possible that the framework was written by a different team than the one which created the drivers and wrote the system infection and exploits.
  • The mysterious programming language is definitively NOT C++, Objective C, Java, Python, Ada, Lua and many other languages we have checked.
  • Compared to Stuxnet (entirely written in MSVC++), this is one of the defining particularities of the Duqu framework.

"After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++," writes Kaspersky Lab Expert Igor Soumenkov. "It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language. We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or the programming language that can generate similar code constructions, to contact us or drop us a comment in this blogpost. We are confident that with your help we can solve this deep mystery in the Duqu story."

The blog post from Kaspersky Lab is already filled with comments guessing at the possible programming language used. Check them out for yourself: The Mystery of the Duqu Framework. One comment from user As400tech, who registered his account with Kaspersky Lab today, looks particularly promising:

That code looks familiar The code your referring to .. the unknown c++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36.

The C++ code was used to write the tcp/ip stack for the operating system and all of the communications. The protocols used were the following x.21(async) all modes, Sync SDLC, x.25 Vbiss5 10 15 and 25. CICS. RSR232. This was a very small and powerful communications framework. The IBM system 36 had only 300MB hard drive and one megabyte of memory, the operating system came on diskettes.

This would be very useful in this virus. It can track and monitor all types of communications. It can connect to everything and anything.

Duqu was first detected in September 2011, but Kaspersky Lab believes it has seen the first pieces of Duqu-related malware dating back to August 2007. The Russian security firm also notes Duqu, like Stuxnet before it, is highly targeted and related to Iran's nuclear program.

Topics: Software Development, Security

Emil Protalinski

About Emil Protalinski

Emil is a freelance journalist writing for CNET and ZDNet. Over the years, he has covered the tech industry for multiple publications, including Ars Technica, Neowin, and TechSpot.

Talkback

42 comments
  • Duqu Programming Language

    When I was a lad! I have always wanted to say that.
    It's Assembly Language, I'd recognize it anywhere. Looks like it is using an inline assembler, like the old Borland C, Delphi or similar.
    Very low language, just one step away from machine language. Working directly with the registers,
    'push' instruction 'esi' (Source Index Register) contents onto stack (A LIFO type Array)
    'call' to other functions or procs
    'pop', pulls it off
    'mov'(e) from one to another, to the left
    'lea' load effective address

    Given a little more info could probably work out exactly what it is up to, but you probably already know.

    Found my old MASM (Microsoft ASseMbler) Assembly Language guide. Thought: Maybe this was Assembled in Linux, the CODE and XREF looks more like Unix

    jz and jnz are conditional jumps: jz jumps if zero, jnz if not zero.

    EAX is the Extended AX register, 32-bit; AX is 16-bit, I think.

    MIBovrd from CQRITE
    MIBovrd
    • re:

      yeah that's what the unknown language was compiled into, assuming that it is duqu code and not some random google image search for 'assembly'.
      sefsfse
      • And I wonder

        About the tech knowledge/skills that people say they have these days. No wonder.

        Assembly needs to, well, at least for me it doesn't do it itself, get assembled into machine language. Kind of like what them that compiler things do for other types of language. Either that, or I should call Friday a bust, go get a haircut and a pint, not necessarily in that order.
        ego.sum.stig
    • I dunno man...

      If it were that simple, why wouldn't they recognize it?
      slickjim
    • Of course it is assembly.

      The question is what toolset created it. Each language and compiler have specific finger prints on how they compile source to assembly language. MSVC looks different than Borland C looks different than gcc. But all C compiled code looks similar. Objective-C looks different than C. Smalltalk has its own fingerprints. Even after these languages go through the compiler each has a unique look.

      Think of languages as dog breeds. Each different dog is like a different compiler. German Shepherds, while looking similar, are each unique. Shepherds look really different from collies, but both are dogs.

      You answered the simple question. You identified "it is a dog". Now figure out the specific breed and the unique dog.
      Bruizer
    • asm

      It's not inline assembly as this is the constructor, it's returning so the constructor isn't inline.

      The only interesting thing to me is the short jumps, i.e. JNZ short somewhere. This is 32-bit assembly, and I've never seen MSVC compilers produce the short jump. I don't think a short jump really saves anything in a flat 32-bit memory model.

      Maybe a very old compiler?

      Of course the disassembler may be just putting that in there.
      TGGR
      • It will save a memory fetch.

        Both gcc and MSVC will employ short branches when possible.
        Bruizer
      • @Bruizer

        With optimizations enabled
        Tea.Rollins
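The size difference behind the short-jump exchange above can be sketched in Python. This is a minimal illustration and is not taken from the Duqu code: `encode_jnz` is a hypothetical helper, and it assumes the displacement has already been computed relative to the end of the instruction. It relies only on the standard x86 encodings `75 cb` (JNZ rel8) and `0F 85 cd` (JNZ rel32).

```python
import struct


def encode_jnz(rel):
    """Encode a 32-bit-mode JNZ for a displacement `rel`.

    Assumption: `rel` is already measured from the end of the
    instruction being emitted (hypothetical helper for illustration).
    """
    if -128 <= rel <= 127:
        # Short form: opcode 0x75 followed by a signed 8-bit displacement.
        return b"\x75" + struct.pack("<b", rel)
    # Near form: opcode 0x0F 0x85 followed by a signed 32-bit displacement.
    return b"\x0f\x85" + struct.pack("<i", rel)


# Compare the encoded size of a nearby target with a distant one.
for displacement in (5, -2, 1000):
    encoded = encode_jnz(displacement)
    print(displacement, encoded.hex(), len(encoded), "bytes")
```

The short form takes two bytes and the near form six, so a compiler or assembler that picks the short form whenever the target is within range produces smaller code.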
    • You're dumb.

      1. That's the decompilation, of course it's assembler.
      2. Since you obviously don't know, decompilers convert from the raw instructions back into human-readable code, usually assembler, sometimes C or C# (see JustDecompile).
      3. The 16-bit allocation you're spotting is just the CPU spitting out SSE optimizations.
      4. That's not the droids they're referring to (did you really think they'd post even a margin of the real code on a blog?)
      Tea.Rollins
  • Of course it is assembly.

    Then why not say so? Thanks Bruizer, so got any positive ideas, thoughts? helpful comments, No?

    It's pretty standard assembler, as I mentioned. Could be GAS (Don't see dot notations before directives) more likely NASM but I don't see a _start but that could be elsewhere. Certainly not AT&T, definitely Intel, but which one? Inline gcc?
    MIBovrd
    • Not inline.

      What you start with is machine code. You then disassemble the code and what you get are just memory fetches and calls and compares and branches. You start assigning labels to things you know like append, clear, count and such based on what the call does. Given the code is heavily event driven you may even get a branch table with the method names, but that depends on the compiler and language. For example, Obj-C will give you the method names but C will not.

      So we are looking at code that has been disassembled with labels applied to make it easier to understand. Chances are a basic MASM syntax is used for simplification.

      I do lots of low level embedded work (like device drivers) with various mixes of C and C++ on Borland C (legacy code), gcc and MSVC on PPC and Intel. For fun, I do Arm and gcc with Obj-C. You learn what gcc looks like and how it compiles. Same with the others. This has the fingerprints of none of these. The calling conventions are non-conventional, with parameters being assigned to different registers. Almost like hand coded assembly with object based programming techniques.

      The article was pretty clear we were looking at compiled code. If you follow the links through, the discussions are quite keen.
      Bruizer
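The disassemble-then-label workflow described above can be sketched in Python. This is a toy illustration only: it assumes 32-bit code and decodes just a handful of x86 opcodes (push/pop r32, ret, jz/jnz short, call rel32). The function name, the sample bytes, and the label `sub_append` are made up for the example and do not come from the Duqu listing.

```python
import struct

# 32-bit general-purpose registers in x86 encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code, base=0, labels=None):
    """Decode a tiny subset of x86 into (address, text) pairs.

    `labels` maps target addresses to analyst-chosen names (hypothetical).
    Truncated input raises struct.error; a real tool would handle that.
    """
    labels = labels or {}
    out = []
    pc = 0
    while pc < len(code):
        addr = base + pc
        op = code[pc]
        if 0x50 <= op <= 0x57:          # push r32
            text, size = f"push {REGS[op - 0x50]}", 1
        elif 0x58 <= op <= 0x5F:        # pop r32
            text, size = f"pop {REGS[op - 0x58]}", 1
        elif op == 0xC3:                # near return
            text, size = "retn", 1
        elif op in (0x74, 0x75):        # jz / jnz with 8-bit displacement
            (rel,) = struct.unpack_from("<b", code, pc + 1)
            target = addr + 2 + rel     # relative to the next instruction
            mnem = "jz" if op == 0x74 else "jnz"
            text, size = f"{mnem} short {labels.get(target, hex(target))}", 2
        elif op == 0xE8:                # call with 32-bit displacement
            (rel,) = struct.unpack_from("<i", code, pc + 1)
            target = addr + 5 + rel
            text, size = f"call {labels.get(target, hex(target))}", 5
        else:                           # anything else: emit a raw byte
            text, size = f"db {op:#04x}", 1
        out.append((addr, text))
        pc += size
    return out


# Made-up sample bytes: push esi / call +10 / jnz +1 / pop esi / retn.
sample = bytes([0x56, 0xE8, 0x0A, 0x00, 0x00, 0x00, 0x75, 0x01, 0x5E, 0xC3])
for address, line in disassemble(sample, base=0x1000, labels={0x1010: "sub_append"}):
    print(f"{address:#x}  {line}")
```

Without the `labels` mapping the call target prints as a bare address. The analyst supplies names such as `append` or `count` after working out what each routine does.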
      • Over here, the crowd went wild (with delight)

        An actual, technical, thought through and pretty darned accurate post.

        You are now my one and only official favourite zdnet poster who I think genuinely has a brain and can write too.
        ego.sum.stig
      • obv is obv

        Excuse my rudeness but it obviously isn't inline because of two things: 1) the assembly listing is named class2_ctor and 2) the first instruction is an XREF or cross reference, meaning the procedure is always called and never part of a basic block.
        chazzeromus
  • Bruizer Thank you

    Now I feel enlightened, thank you, just +'d you. Want to see more. Figure it out. I know YOU can!
    MIBovrd
  • I believe it was carefully crafted in DOS Batch and Edlin

    Two very obscure dialects. :/
    Dietrich T. Schmitz *Your
    • Sure it wasn't...

      Permissive Vbscript and Powershell backended by Ruby in a 3-way with LaTex as the interpretive medium?
      Tea.Rollins
      • Errm. I think you are onto something. :/

        nt
        Dietrich T. Schmitz *Your
    • For the correct answer you should contact Loverock Davidson

      He's always ahead of everyone else.... just ask him :-)
      Over and Out
    • You are close ...

      Does Dos4Shell bring back any memories? I know for me it does, but nothing like the tried-and-true dos-hack-codes available through an edlin ... Other than this, I believe we are looking at a start-up process of GIMP Subsets-Routines, taking place in a Mint Environment Shell. It looks to be a TD_HDK process, which is similar to an RTOR_CHD or a CTOR_RHD. This AGaiN-Style level of programming is invoking a simple type of StuxnetDropper that is used to subdue the hard drive's MBA. Other language is a part of ATK+ and of course AIX melded into one. For beginners this is just a fix. But for a seasoned programmer, this would require a completely secure entry to an internal process handler InP ... but on a level of NeXT or a malformed-yet-working version of Ruby driven Puppy-linux. This is how this script was created. But, it is not a trojan, it uses a trojan code from a key-term known as Duqu. In ISOLINUX, the term "duqu" refers to a hard-drive drive assembly code re-written to simple terms ...
      crcgraphix
  • Stuxnet 2.x and Iran = Israel ?

    Almost have to think that Israel is behind this if it is that advanced (i.e. "State") and if it is likely to be disruptive technology aimed at Iran. Don't have any proof of that, but it would not surprise me. Let's hope it is primarily used for Good, not Evil ! ;-)
    jkohut