Working out who is behind a cyber attack is one of the hardest parts of dealing with any security incident - and it's getting a lot harder.
While it might be all but impossible to bring hackers to justice, especially if they are in another country or even working for a foreign government, understanding who the attack is coming from is often the key to defending against it.
That's why organisations under attack have to care about identifying the intruders, says Mike Oppenheim, senior threat intelligence analyst at security company FireEye, even if that identification never leads to an arrest.
"It's good for them to know who is targeting them, because it helps them in business decisions and it also helps them tailor their net defence," he says.
Fortunately for investigators, even the stealthiest of hackers will leave behind at least some traces that can be used to identify them.
In this way, a hacking attack is just like any other crime scene: investigators look for entry points, victims, and the fingerprints of the criminals - and try to work out who has the most to gain from the incident.
Who they target first, the tools they choose, and the language they use can all help investigators to slowly piece together a picture of who is responsible.
First, the victims. Some attacks rely on infecting an innocent website with malware, which then infects the PCs of people visiting it: these people are the real target of the attackers. Others start with a phishing email, which tricks recipients into clicking on a malicious link. Through the types of websites infected (for example, medical research journals), investigators can work out who it is the hackers want to focus on - such as managers in pharmaceuticals companies. Similarly the content of the phishing emails can provide a clue as to who the attackers are after.
Next, the weapons. Researchers can learn plenty from the malware used, and whether it can be linked to any other attacks or to a particular group. "If they are developing custom tools, and they are sophisticated and modular, it would indicate they might have a professional group behind them and they may even be state-sponsored," said Alan Neville, threat intelligence analyst at security company Symantec.
The code itself can also include some clues: the writers may leave comments in their code which can give an indication of the language they speak. Security companies also look at the time stamps, which show when a particular piece of code was compiled, which can give them a good indication of the working day of the writer. Between the language and the work schedule, researchers can get an idea of where the attackers may be based in the world.
For example, when investigating the 'Dragonfly' hacking campaign against energy companies in the US and Europe, researchers at Symantec could work out the times of day when the malware was being developed. "We were able to see it fitted into a 9-to-5 workday schedule for the UTC +4 Eastern Europe time zone, which is an indication of where it may be from," said Neville.
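The kind of analysis Neville describes can be sketched in a few lines of Python. This is only an illustration, not Symantec's method: it assumes the compile timestamps have already been extracted from the samples as UTC datetimes, and the function name `best_utc_offset` and the sample data are invented for the example. It tries each whole-hour offset and keeps the one whose local 9-to-5 window covers the most timestamps.

```python
from collections import Counter
from datetime import datetime, timezone


def best_utc_offset(compile_times_utc, start_hour=9, end_hour=17):
    """Return (offset, hits): the whole-hour UTC offset whose local
    working window [start_hour, end_hour) covers the most timestamps."""
    hours = Counter(t.hour for t in compile_times_utc)
    best_offset, best_hits = 0, -1
    # -11..+12 covers each of the 24 possible hour shifts exactly once.
    for offset in range(-11, 13):
        hits = sum(
            count
            for hour, count in hours.items()
            if start_hour <= (hour + offset) % 24 < end_hour
        )
        if hits > best_hits:
            best_offset, best_hits = offset, hits
    return best_offset, best_hits


# Hypothetical sample: eight builds stamped between 05:00 and 12:00 UTC.
samples = [
    datetime(2014, 3, 3 + i, 5 + i, 0, tzinfo=timezone.utc) for i in range(8)
]
print(best_utc_offset(samples))  # (4, 8)
```

Timestamps can be forged, so a result like this is a hint rather than proof.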
Then, there's the crime scene itself: another area to look at is how the attacker behaves on the network, for example, whether they have any habits like storing malware in the same directory of an infected PC every single time.
FireEye's Oppenheim says that several hacking groups his company tracks will always run the same three or four basic Windows commands in a specific order "once they land on a victim". He also watches for other habits that can identify an attacker: "What tools do they use to crack passwords on a particular system? How do they exfiltrate data? How do they move internally on a network? How do they search for the information they want on the network?"
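A habit like "the same commands in the same order" can be checked mechanically. The sketch below is a simplified illustration rather than anything FireEye has described: the group labels and their command sequences are made up, and the helper names are hypothetical. It asks whether a group's signature appears, in order, within the commands observed on a victim machine, allowing other commands in between.

```python
def contains_in_order(observed, signature):
    """True if every command in `signature` appears in `observed`
    in the same relative order (other commands may sit in between)."""
    position = 0
    for command in observed:
        if position < len(signature) and command == signature[position]:
            position += 1
    return position == len(signature)


def matching_groups(observed, profiles):
    """Return the names of all profiles whose signature matches."""
    return [
        name
        for name, signature in profiles.items()
        if contains_in_order(observed, signature)
    ]


# Hypothetical profiles: these sequences are invented for illustration.
profiles = {
    "group_a": ["whoami", "ipconfig /all", "net view"],
    "group_b": ["net view", "whoami", "tasklist"],
}

observed = ["whoami", "dir", "ipconfig /all", "net view", "tasklist"]
print(matching_groups(observed, profiles))  # ['group_a']
```

A match on three or four common commands is weak evidence on its own; it becomes useful when combined with the other indicators investigators collect.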
And finally, there's the getaway: nearly all malware has to call home to a command and control server at some point, either to get more instructions or updates or to smuggle out stolen data, which can provide another lead. "If you have a server on the internet, somebody has to pay for it or register a domain and in the process you leave traces," says Bob McArdle, manager of Trend Micro's forward-looking threat research team.
Put all of this together and you can make a case for attribution. For example, when the FBI set out evidence tying North Korea to the hacking attack on Sony Pictures Entertainment, it pointed to links to malware previously developed by North Korea, including "similarities in specific lines of code, encryption algorithms, data deletion methods, and compromised networks", and links between the infrastructure used in this attack and other "malicious cyber activity" linked by the US government to North Korea. It also noted that IP addresses associated with "known North Korean infrastructure" communicated with IP addresses that were hardcoded into the data deletion malware used in the attack.
McArdle said that while attackers will try to hide traces, they often leave enough behind to help investigators follow a trail.
"Bad guys will try and cover their tracks but when you investigate the network side, you can expand that out and find all their servers and hopefully they messed up somewhere, like on one of the servers they're hosting a personal site or something along those lines. You have to get a bit lucky, but if you do, it can give you those hints like an email address that you can start trying to link back to a person."
Sometimes attackers can get sloppy, he says. "People do silly things like use their own email address to register a site - then the investigation is quite quick."
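The "expand that out" step McArdle describes amounts to following shared attributes from one server to the next. Here is a minimal sketch, assuming the investigator already holds a table of domain records; the field names, the helper `expand_infrastructure` and every value in the sample data are hypothetical (the domains and IP addresses are reserved documentation ranges).

```python
from collections import defaultdict


def expand_infrastructure(records, seed_domain):
    """Starting from one known domain, collect every domain reachable
    through a shared registrant email or a shared IP address."""
    domains_by_attr = defaultdict(set)
    attrs_by_domain = defaultdict(set)
    for record in records:
        for key in ("registrant_email", "ip"):
            value = record.get(key)
            if value:
                domains_by_attr[(key, value)].add(record["domain"])
                attrs_by_domain[record["domain"]].add((key, value))

    seen, queue = {seed_domain}, [seed_domain]
    while queue:
        domain = queue.pop()
        for attr in attrs_by_domain[domain]:
            for other in domains_by_attr[attr] - seen:
                seen.add(other)
                queue.append(other)
    return seen


# Hypothetical records, invented for illustration.
records = [
    {"domain": "c2-one.example", "registrant_email": "a@example.com", "ip": "192.0.2.10"},
    {"domain": "c2-two.example", "registrant_email": None, "ip": "192.0.2.10"},
    {"domain": "personal-blog.example", "registrant_email": "a@example.com", "ip": "198.51.100.7"},
    {"domain": "unrelated.example", "registrant_email": "b@example.com", "ip": "203.0.113.5"},
]

print(sorted(expand_infrastructure(records, "c2-two.example")))
```

In this toy data the seed server shares an IP address with a second server, which in turn shares a registrant email with a personal site: the sort of slip McArdle is hoping for. Shared hosting means a common IP address alone proves little, so real investigations weigh each link before following it.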
But it's not always the case. Part of the problem is that many of these common indicators are easily changed by hackers who want to throw investigators off the scent. And increasingly they are doing just that.
"It's likely that a lot of groups would take the general indicators that most security companies look for and modify them. We've seen them change time stamps or remove them altogether so we can't identify when the malware was created or compiled, or to modify them so they look ridiculous," said Symantec's Neville.
As organisations get more serious about tracing hackers, so hackers are deliberately messing up the crime scene to throw suspicion on others.
Some groups attempt to disguise their advanced malware as a more common virus to prevent closer inspection, while others throw in red herrings to divert attention: for example, one piece of malware was found to include the Hindi word for 'error' plus Chinese and Farsi words - and almost certainly had nothing to do with hackers speaking any of those languages.
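Spotting strings like these is straightforward, which is also why planting them is cheap. A rough sketch, assuming the strings have already been pulled out of a sample: Python's standard `unicodedata` module gives each character a name, and the first word of that name is used here as a crude stand-in for the writing system. The helper name `scripts_in` and the sample strings are illustrative only.

```python
import unicodedata
from collections import Counter


def scripts_in(strings):
    """Count letters per writing system, using the first word of each
    character's Unicode name (e.g. 'CJK', 'ARABIC') as a rough label."""
    counts = Counter()
    for text in strings:
        for char in text:
            if char.isalpha():
                name = unicodedata.name(char, "")
                if name:
                    counts[name.split()[0]] += 1
    return counts


# Illustrative strings: the word 'error' in Hindi, Chinese, Farsi and English.
strings = ["त्रुटि", "错误", "خطا", "error"]
print(sorted(scripts_in(strings)))
```

The output says which scripts are present, not who wrote the code: Farsi, for instance, is written in Arabic script, and anyone can paste in a word from a dictionary.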
"People can plant false flags: a Russian coder could put a whole lot of Chinese strings in just to make you think, 'If there is Chinese language in here, it's probably a Chinese author'. It's very easy for them to do that stuff as well, which they're starting to do more and more, especially in the targeted attack area," says Trend Micro's McArdle.
It's not just in the code itself that hackers are tightening security, but also in their command and control infrastructure.
"It's definitely getting harder - attackers are trying to stay one step ahead. They are well aware of what security vendors do," says Symantec's Neville. He gives the example of a command and control server that his team were able to examine after working with law enforcement to seize it. The attackers had removed all traces of their access to the machine apart from one encrypted and thus inaccessible archive containing the information they had stolen. "That's an example of good operational security from an attacker's point of view. It would suggest a professional keen to hide the tracks of where they were coming from," he says.
A recent attack on security company Kaspersky Lab shows how serious intruders are about protecting their own secrets.
Kaspersky was hit with an updated version of the 'Duqu' malware first seen in 2011. The company discovered a months-long attack on its systems, aimed at spying on its most advanced research, when it tested a new antivirus product on its own network.
But even for a security company it's still hard to be sure who is behind the breach. "Of course I was curious about attribution, and of course we did our best to find out who these guys are - very professional - so I can't point a finger," the company's CEO Eugene Kaspersky told ZDNet. "We are not police. We are not secret service. Of course we have some ideas, but we cannot prove [it]," he said.
The company estimates it could have cost $50m to build and support the 'Duqu 2' malware used, and its analysis shows how complicated the malware was, and how hard it is in practice to work out where such an attack is really originating from.
When connecting with the command and control servers, the Duqu 2 malware hides the traffic as encrypted data appended to a harmless JPEG or GIF image file, making it harder to track where data is being sent. It also uses multiple proxies and jumping points to mask where the controllers are really based. "This makes tracking an extremely complex problem," said the company's analysis notes.
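Data tacked on to the end of an image is at least detectable in principle, because a JPEG file declares where it ends. The following is a simplified sketch, not Kaspersky's tooling: the function names are invented, it handles only the common JPEG layout, and the test image is a synthetic stub rather than a real picture. It walks the file's segments to the end-of-image marker and reports how many bytes follow it.

```python
import struct


def _skip_scan_data(data, pos):
    """Advance past entropy-coded scan data to the next real marker."""
    while True:
        idx = data.find(b"\xff", pos)
        if idx == -1 or idx + 1 >= len(data):
            return len(data)
        nxt = data[idx + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos = idx + 2  # stuffed zero byte or restart marker: still scan data
        elif nxt == 0xFF:
            pos = idx + 1  # fill byte before a marker
        else:
            return idx     # a genuine marker starts here


def jpeg_trailing_bytes(data):
    """Return the number of bytes that follow the JPEG end-of-image marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG start-of-image marker")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xD9:  # end of image
            return len(data) - (pos + 2)
        if pos + 4 > len(data):
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length   # segment length includes its own two bytes
        if marker == 0xDA:  # start of scan: compressed data follows
            pos = _skip_scan_data(data, pos)
    raise ValueError("no end-of-image marker found")


# Synthetic stub: SOI, one APP0 segment, a tiny scan, EOI, then appended bytes.
stub = (b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xda\x00\x02"
        + b"\x12\xff\x00\x34" + b"\xff\xd9")
print(jpeg_trailing_bytes(stub))                       # 0
print(jpeg_trailing_bytes(stub + b"SECRET\xff\xd9X"))  # 9
```

Walking the segments, rather than searching for the last end-of-image marker, avoids being fooled by marker-like bytes inside the appended data. Some legitimate tools also leave trailing bytes, and encrypted content looks like noise, so a non-zero count is a reason to look closer rather than a verdict.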
On top of this, the attackers have included false flags throughout the code, designed to send researchers in the wrong direction.
For instance, one of the drivers contains the string "ugly.gorilla", likely to be a reference to an alleged Chinese hacker. At a less obvious technical level, the company said the use of a particular cipher previously seen in malware associated with a China-based group was another red herring planted by the attackers. Other attempts to throw investigators off the scent include a reference to "romanian.antihacker", which may be trying to point the finger at eastern Europeans.
The company's analysis notes that the original Duqu attackers seem to work in the GMT+2 or GMT+3 time zone and adds that "logs collected from some of the proxies indicated the attackers appear to work less on Fridays and didn't appear to work at all on Saturdays, with their regular work week starting on Sunday," implying malware writers based in Israel.
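A work-week pattern of this kind falls out of a simple tally of log entries by local weekday. This sketch assumes the proxy log timestamps are available as UTC datetimes and that a candidate time zone has already been chosen; the function names and the sample week of activity are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def weekday_activity(timestamps_utc, utc_offset_hours):
    """Count events per local weekday for the given whole-hour UTC offset."""
    local = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter(DAYS[t.astimezone(local).weekday()] for t in timestamps_utc)
    return {day: counts.get(day, 0) for day in DAYS}


def quietest_days(activity, n=2):
    """Return the n weekdays with the least activity, quietest first."""
    return sorted(activity, key=activity.get)[:n]


# Hypothetical week starting Monday 1 June 2015: ten events on most days,
# three on Friday and none on Saturday.
events_per_day = {1: 10, 2: 10, 3: 10, 4: 10, 5: 3, 6: 0, 7: 10}
log = [
    datetime(2015, 6, day, 10, 0, tzinfo=timezone.utc)
    for day, count in events_per_day.items()
    for _ in range(count)
]

activity = weekday_activity(log, utc_offset_hours=2)
print(quietest_days(activity))  # ['Saturday', 'Friday']
```

A quiet Friday and Saturday narrows the field of candidate regions but does not identify one, and the pattern is only as trustworthy as the logs it comes from.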
But in contrast, the new attack - based on an updated version of Duqu - leaves much less for investigators to work with, as the attackers faked all the timestamps and removed the debug paths and internal module names for all plugins, making attribution much less clear cut.
Because the same malware was found on the networks of hotels which had hosted negotiations with Iran about a nuclear deal, it would suggest that it is an Israel-backed group again - but that's far from certain.
"Most probably it was made by the same people or they shared the code with other organisations or nations," said Kaspersky.
Chasing hackers makes for good stories, and security companies like to publicise their work. The behaviour of a group after its actions have been publicly revealed can be instructive: some groups dismantle their operations immediately, others disengage slowly and others ignore it altogether. As the study Attributing Cyber Attacks by Thomas Rid and Ben Buchanan notes: "Intrusions from China, if often less advanced technically, tend to be unusually persistent, even after an attribution report uncovered sensitive details about an operation."
But what is certain is that gangs monitor the information that security companies publish and switch methods in response to publicity.
When Symantec published details of the malware used by the Waterbug group that had been targeting government agencies and embassies, the group switched to a different sort of malware, "so they're keeping an eye on what security vendors are posting about them," Neville said.
Oppenheim has a similar story: after FireEye outed a Chinese hacker group it calls APT 12, "within days, this group changed their malware and changed the command and control infrastructure," he said.
He adds: "There is always going to be a back and forth between the operators - those who are conducting these attacks and those on the defence side. The offence is always going to have the upper hand because they are going to know the play they are going to run. On the defensive side, you are always going to be reacting."
Placing a few scraps of language inside a piece of malware to point investigators in the wrong direction is one thing, but recently there has been yet another escalation where hackers are apparently deliberately and explicitly pointing the finger at another group.
An attack on French TV station TV5 Monde took its channels off the air for hours in April while the group which claimed responsibility - calling itself CyberCaliphate - posted pro-Islamic State messages to the broadcaster's social-media accounts.
However, researchers now believe that it was a Russian state-sponsored group that was behind the attack, and that it's not the first time they've behaved in this way.
"We attribute that to APT 28 which is Russian. We've seen this group do this before, this false flag type thing, where they try and operate as if it is - in this instance ISIS - to throw people off the scent, but also it hides the true intention and the reasons for being there," said Oppenheim.
"We've seen this before in other places and the TV5Monde stuff is one example that's come out publically. It's a regular tactic of theirs in the last couple of years."
Why a Russia-backed group would behave in such a way is not clear - but such attacks are likely to become more common in future, making it even more vital to work out who is really behind them.
What is certain is that attribution remains difficult, if essential. "It's always hard to tell, especially with nation state actors. It's hard to get to that full attribution point and even once you've laid out your evidence, you're never going to have 100 percent certainty," conceded Oppenheim.
It's not a problem that the private sector is tackling alone: the US Department of Defence has focused on attribution as an important element of deterring cyber attacks and has invested in this area, seeing it as an element of broad cyber defence. Other US agencies have made similar investments: for example, when the FBI said the attacks on Sony had come from North Korea, it said it had used "sensitive sources and methods" to come to that conclusion.
But making the leap from technical attribution - what the server logs tell you - to naming an individual is still incredibly hard, as Eugene Kaspersky points out. "You have to surround the person who is behind the keyboard," he says - no easy task when the attacks could be coming from anywhere in the world.
And while it may become harder to point the finger at an attacker - or at least take a lot longer - even the best trip up sooner or later, says Trend Micro's McArdle: "In our experience people do eventually make mistakes. It can take a long time. Sometimes we might monitor their setup for years even, and every time you see them create a new malware or a new domain you keep monitoring it - and hopefully they make one mistake."