Linux kernel head Linus Torvalds has trashed a patch from Amazon Web Services (AWS) engineers that was aimed at mitigating the Snoop attack on Intel CPUs discovered by an AWS engineer earlier this year.
The so-called 'Snoop-assisted L1 Data Sampling', or Snoop (CVE-2020-0550), attacks affecting a range of Intel Xeon and Core CPUs were disclosed in March.
AWS engineer Pawel Wieczorkiewicz discovered a way to leak data from an Intel CPU's memory via its L1D cache, which sits in CPU cores, through 'bus snooping' – the cache-updating operation that happens when data is modified in L1D.
In the wake of the disclosure, AWS engineer Balbir Singh proposed a Linux kernel patch that would let applications opt in to flushing the L1D cache when a task is switched out.
"This protects their data from being snooped or leaked via side channels after the task has context switched out," Singh explained in April. The patch was intended to ship with Linux kernel version 5.8.
Under the proposal, an application would call prctl(2) to request that the L1D cache be flushed whenever its task leaves the CPU, assuming the hardware supports it.
But, as spotted by Phoronix, Torvalds believes the patch would let applications that opt in degrade CPU performance for other applications.
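For illustration, the opt-in call might look like the Python sketch below. It rests on an assumption: the constants are those of the speculation-control prctl interface found in later mainline kernels (PR_SET_SPECULATION_CTRL with PR_SPEC_L1D_FLUSH), which may differ from what the April patch proposed. The helper name request_l1d_flush is hypothetical. On most systems the kernel will simply refuse the request, and the function reports that rather than raising.

```python
import ctypes
import os

# Assumption: values as defined in <linux/prctl.h> on later mainline kernels.
# The patch discussed in the article may have used a different interface.
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_L1D_FLUSH = 2
PR_SPEC_ENABLE = 1 << 1


def request_l1d_flush():
    """Ask the kernel to flush L1D when the calling task is switched out.

    Returns (ok, message). Never raises on systems without support.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return False, "prctl is not available on this platform"

    rc = prctl(
        ctypes.c_int(PR_SET_SPECULATION_CTRL),
        ctypes.c_ulong(PR_SPEC_L1D_FLUSH),
        ctypes.c_ulong(PR_SPEC_ENABLE),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if rc == 0:
        return True, "L1D flush on context switch requested for this task"
    # The kernel declined; report its reason without interpreting it.
    return False, "kernel refused: " + os.strerror(ctypes.get_errno())


if __name__ == "__main__":
    ok, message = request_l1d_flush()
    print(message)
```

The call affects only the task that makes it, which is what makes the feature opt-in. The cost Torvalds objects to falls on whatever runs next on the same core with a cold L1D.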
"Because it looks to me like this basically exports cache flushing instructions to user space, and gives processes a way to just say 'slow down anybody else I schedule with too'," wrote Torvalds yesterday.
"In other words, from what I can tell, this takes the crazy 'Intel ships buggy CPU's and it causes problems for virtualization' code (which I didn't much care about), and turns it into 'anybody can opt in to this disease, and now it affects even people and CPU's that don't need it and configurations where it's completely pointless'.
"I don't want some application to go 'Oh, I'm _soo_ special and pretty and such a delicate flower, that I want to flush the L1D on every task switch, regardless of what CPU I am on, and regardless of whether there are errata or not. Because that app isn't just slowing down itself, it's slowing down others too."
Torvalds' reference to virtualization was directed at AWS which, like other cloud providers, sells virtual CPUs, often with simultaneous multithreading (SMT) enabled.
He goes on to argue that the mitigation achieves nothing when SMT is switched on.
"At a _minimum_, SMT being enabled should disable this kind of crazy pseudo-security entirely, since it is completely pointless in that situation. Scheduling simply isn't a synchronization point with SMT on, so saying 'sure, I'll flush the L1 at context switch' is beyond stupid," he said.
In a discussion with Singh, Torvalds notes that to "'flush L1D$ on SMT' is crazy, since an attacker would just sit on a sibling core and attack the L1 contents *before* the task switch happens".
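Torvalds' objection suggests that any opt-in should first check whether SMT is active. A minimal sketch of such a check follows, assuming a kernel recent enough to expose the SMT control files under sysfs. The helper name smt_active is hypothetical and is not part of the patch.

```python
from pathlib import Path

# Assumption: the kernel exposes SMT state at this sysfs path.
SMT_ACTIVE_PATH = "/sys/devices/system/cpu/smt/active"


def smt_active(path=SMT_ACTIVE_PATH):
    """Return True if SMT is active, False if not, None if unknown."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: older kernel or non-Linux system.
        return None
    if text == "1":
        return True
    if text == "0":
        return False
    return None


if __name__ == "__main__":
    state = smt_active()
    if state is True:
        print("SMT is active: a sibling thread shares the L1D, "
              "so flushing at context switch offers little protection")
    elif state is False:
        print("SMT is not active")
    else:
        print("SMT state could not be determined")
```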
AWS engineer Benjamin Herrenschmidt conceded that the patch is pointless with SMT, but urged kernel developers not to "throw the baby out with the bath water" and contested the suggestion that the patch exists because AWS wants to sell hyperthreads as virtual CPUs.
"Not necessarily and not in every circumstances," Herrenschmidt wrote. "Yes, VMs will typically have SMT enabled. This isn't targeted at them though. One example that was given during the discussions was containers pertaining to different users.
"Another example would be a process that handles more critical data such as payment information, than the rest of the system and wants to protect itself (or the admin wants that process protected) against possible data leaks to less trusted processes.
"AWS has more than just VMs for rent :-) There are a whole pile of higher level 'services' that our users can use and not all of them necessarily run on VMs charged per vCPU."
Herrenschmidt said the patches aren't trying to solve problems inside a customer VM running SMT, nor are they about protecting VMs against other VMs on the same system.