A Google Chrome experiment went badly wrong this week, crashing browsers on thousands of enterprise networks, if not more, for nearly two days.
The issue first appeared on Wednesday, November 13. It didn't affect all Chrome users, only Chrome browsers running on Windows Server "terminal server" setups -- a very common setup in enterprise networks.
According to hundreds of reports, users said that Chrome tabs were going blank, all of a sudden, in what's called a "White Screen of Death" (WSOD) error.
The issue was no joke. System administrators at many companies reported that hundreds or even thousands of employees couldn't use Chrome to access the internet, as the active browser tab kept going blank while they were working.
In tightly controlled enterprise environments, many employees didn't have the option to change browsers and were left unable to do their jobs. Similarly, system administrators couldn't just replace Chrome with another browser right away.
"This has had a huge impact for all our Call Center agents and not being able to chat with our members," someone with a Costco email address said in a bug report. "We spent the last day and a half trying to figure this out."
"Our organization with multiple large retail brands had 1000 call center agents and many IT people affected for 2 days. This had a very large financial impact," said another user.
"Like many others, this has had significant impact on our organization with our entire Operations (over 500 employees) working in a RDS environment with Google Chrome as the primary browser," said another system administrator.
"4000 impacted in my environment. Working on trying to fix it for 12 hours," said another.
"Medium sized call center for a local medical office lost a day and a half of work for 40-60 employees," added another.
"Same issue experienced, hundreds of users impacted - hours spent attempting to isolate the cause," said another user.
Hundreds of complaints poured in via Google's support forum, Chrome bug tracker, and Reddit [1, 2]. One impacted sysadmin told ZDNet that they initially mistook the blank Chrome tabs for a sign of malware and reacted accordingly, starting network-wide security audits.
Eventually, the root cause of the bug was found and traced back to a feature called "WebContents Occlusion."
According to a Google Chrome design document, this is an experimental feature that suspends Chrome tabs when users move other app windows on top of Chrome, treating the active Chrome tab as a background tab.
The feature, meant to improve Chrome's resource usage when not in active use, had been under testing in Chrome Canary and Chrome Beta releases all year.
However, this week, Google decided to test it in the main Stable release, so it could get more feedback on how it behaved.
That it behaved badly is an understatement.
"The experiment/flag has been on in beta for ~5 months," said David Bienvenu, a Google Chrome engineer. "It was turned on for stable (e.g., M77, M78) via an experiment that was pushed to released Chrome Tuesday morning."
"Prior to that, it had been on for about 1% of M77 and M78 users for a month with no reports of issues, unfortunately," he added.
However, when rolled out to a broader audience -- such as Windows users on terminal server setups -- an unexpected bug surfaced: instead of suspending Chrome tabs when users switched to another app, the feature unloaded the tab entirely, leaving a blank page behind.
Users could refresh the Chrome tab to access their sites again, but in some cases, this also meant they lost previous work.
The Chrome team said they pushed a new Chrome configuration file to all Chrome users and disabled the experiment.
Chrome engineers operate a system called Finch that lets them push updated Chrome settings to active installs, such as enabling or disabling experimental flags.
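To make the idea of a remotely controlled experiment more concrete, here is a generic sketch of a percentage-based rollout. It is not Finch's actual implementation, which the article does not describe; the function name `in_rollout`, the hashing scheme, and the client IDs are all hypothetical and exist only for illustration.

```python
import hashlib


def in_rollout(client_id: str, feature: str, percent: float) -> bool:
    """Decide whether a client falls inside a rollout of `percent` (0-100).

    Hypothetical illustration only: each client is hashed into one of
    10,000 stable buckets, so the same client always gets the same answer
    for a given feature and percentage.
    """
    digest = hashlib.sha256(f"{feature}:{client_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


# Hypothetical client IDs, for illustration only.
clients = [f"client-{n}" for n in range(1000)]

# A small rollout reaches only a fraction of clients...
early = [c for c in clients if in_rollout(c, "occlusion", 1.0)]
# ...a full rollout reaches everyone...
full = [c for c in clients if in_rollout(c, "occlusion", 100.0)]
# ...and setting the percentage to zero acts as a kill switch.
disabled = [c for c in clients if in_rollout(c, "occlusion", 0.0)]
```

In a scheme like this, a small sample can easily miss an uncommon configuration entirely, which is one plausible reason a limited test could run for a month without reports of problems.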
If the fix has not reached all impacted users, and they still have problems, they can disable the following two experimental flags by hand:
An alternative fix is to start Google Chrome with the following command-line argument: --disable-backgrounding-occluded-windows
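For administrators who launch Chrome from a script, the command-line workaround might be wrapped roughly as follows. This is a minimal sketch: the flag is the one quoted above, but the install path and the helper names `build_chrome_command` and `launch_chrome` are assumptions and should be adapted to the local environment.

```python
import subprocess

# Flag quoted in the article.
OCCLUSION_WORKAROUND_FLAG = "--disable-backgrounding-occluded-windows"

# Assumption: a typical Windows install location; adjust for your systems.
ASSUMED_CHROME_PATH = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"


def build_chrome_command(chrome_path=ASSUMED_CHROME_PATH, extra_args=()):
    """Return the argument list for starting Chrome with the workaround flag."""
    command = [chrome_path, OCCLUSION_WORKAROUND_FLAG]
    # Pass through any other arguments, without repeating the workaround flag.
    command.extend(arg for arg in extra_args if arg != OCCLUSION_WORKAROUND_FLAG)
    return command


def launch_chrome(chrome_path=ASSUMED_CHROME_PATH, extra_args=()):
    """Start Chrome as a child process and return the process handle."""
    return subprocess.Popen(build_chrome_command(chrome_path, extra_args))


# Example (not run here): launch_chrome(extra_args=["https://example.com"])
```

Building the argument list in a separate function keeps the command easy to inspect or log before anything is actually launched.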
However, fixing the problem actually made system administrators even angrier. Many didn't know that Chrome engineers could run experiments on their tightly-controlled Chrome installations, let alone that Google engineers could just ship changes to everyone's browsers without any prior approval.
"Do you see the impact you created for thousands of us without any warning or explanation? We are not your test subjects," said an angry sysadmin. "We are running professional services for multi million dollar programs. Do you understand how many hours of resources were wasted by your 'experiment'?"
"How many tens of thousands of dollars has this oops cost everyone? This is starting to look like a pretty massive mistake on Googles part," added another disgruntled sysadmin.
"We take great care in rolling our changes out in a very controlled manner to avoid this type of scenario and we spent the better part of yesterday trying to determine if an internal change had occurred in our environment without our knowledge. We did not realize this type of event could occur on Chrome unbeknownst to us. We are already discussing alternative options, none of them are great, but this is untenable," said another, hinting at a browser change across their organization.
Although it lasted just two days, this entire incident is panning out to be one of the Chrome team's biggest bungles. Many impacted users demanded an official apology from Google, and by the looks of the financial impact it may have caused some companies, they are entitled to it.