
NASA is fixing a computer glitch on a giant telescope in space. That's just as hard as it sounds

The space agency has finally found what is playing up on its space observatory and is getting ready to fix it.
Written by Daphne Leprince-Ringuet, Contributor

For the past month, Hubble's instruments have been offline and the telescope hasn't been taking in any data.   

Image: NASA / The Space Telescope Science Institute

After more than a month of testing, NASA's engineers have finally diagnosed the source of an ongoing glitch on Hubble, the 31-year-old space telescope that orbits almost 600 kilometers above Earth.

On 13th June, Hubble automatically placed all of its science instruments on standby as a safety measure following the failure of the telescope's payload computer, one of the central systems that controls and coordinates the instruments onboard the spacecraft and transmits science and engineering data to the ground.


This means that for the past month, Hubble's instruments have been offline and the telescope hasn't been taking in any data. The space observatory was launched in 1990 to look at the universe's stars and galaxies, and in normal times it transmits about 150 gigabits of raw science data every week. 

NASA's engineers have therefore been working at pace to restore the telescope's instruments, but remotely identifying the exact source of the problem has proven something of a challenge.

The payload computer sits in a specific unit within Hubble, called the Science Instrument Command and Data Handling module (SI C&D), which is responsible for synchronizing all of the science systems on Hubble, as well as processing, formatting, storing and transmitting data to NASA's Earth-based team. 

The engineers determined early on that the problem was not directly tied to the payload computer, but was rather caused by one of the components in the wider SI C&D. They have now found that the culprit is likely to be the Power Control Unit (PCU), which ensures a steady voltage supply to the payload computer's hardware. 

The PCU consists of a power regulator that provides a constant five volts of electricity to the payload computer, and a secondary system that checks the voltage levels leaving the power regulator and tells the computer to stop operating if the voltage falls outside acceptable levels.

Two scenarios have now emerged: either the power regulator is sending the wrong voltage levels, causing the secondary system to close down the payload computer; or the secondary system itself is glitching and unnecessarily keeping the computer offline. 
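The two scenarios can be illustrated with a minimal sketch of such a protection check. This is only an illustration, not Hubble's actual logic: the five-volt figure comes from the description above, while the tolerance value, the `sensor_offset` parameter and all class and method names are assumptions made for the example.

```python
from dataclasses import dataclass

NOMINAL_VOLTS = 5.0      # the "constant five volts" described above
TOLERANCE_VOLTS = 0.25   # assumed threshold; not a figure from NASA


@dataclass
class ProtectionCircuit:
    """Hypothetical model of the secondary system that monitors the regulator."""
    nominal: float = NOMINAL_VOLTS
    tolerance: float = TOLERANCE_VOLTS
    sensor_offset: float = 0.0  # a non-zero value models a faulty monitor

    def computer_may_run(self, regulator_output: float) -> bool:
        # The circuit only sees its own measurement, which may be skewed.
        measured = regulator_output + self.sensor_offset
        return abs(measured - self.nominal) <= self.tolerance


healthy = ProtectionCircuit()
faulty_monitor = ProtectionCircuit(sensor_offset=-1.0)

# Normal operation: the regulator is on target and the monitor agrees.
print(healthy.computer_may_run(5.0))

# Scenario 1: the regulator drifts, so a healthy monitor halts the computer.
print(healthy.computer_may_run(4.2))

# Scenario 2: the regulator is fine, but a faulty monitor halts it anyway.
print(faulty_monitor.computer_may_run(5.0))
```

From the payload computer's point of view, both fault scenarios look identical: it is simply told to stop. That is part of why the two cannot easily be told apart from the ground.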

It took NASA some time to reach this conclusion. The organization's scientists initially thought the issue was to do with the payload computer itself, and tried switching on the backup for the device, only to be faced with the exact same symptoms. 

This is when the team realized that the source of the glitch would have to be found in another hardware component in the SI C&D. But discovering exactly which component, at a distance of several hundred kilometers, amounted to finding a needle in a haystack. In space.


Had Hubble been sitting in a lab, NASA's engineers could have swapped components in and out to test different hypotheses. Instead, they had to patiently work through every possible root cause, sending specific commands to the telescope to check whether it was responding normally, until they found the problematic component.
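That process of elimination can be sketched in a few lines. This is a simplified illustration under stated assumptions, not NASA's procedure: the function name, the list of components and the hard-coded check results are all hypothetical, and each real check would involve sending commands to the spacecraft and studying the telemetry that comes back.

```python
from typing import Callable, Dict, Optional


def isolate_fault(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run each check in order; return the first component that fails."""
    for component, responds_normally in checks.items():
        if not responds_normally():
            return component
    return None  # every candidate responded normally


# Hypothetical results, hard-coded for the example. In practice each entry
# would stand for a command sequence sent from the ground.
checks = {
    "payload computer": lambda: True,
    "backup payload computer": lambda: True,
    "other SI C&D hardware": lambda: True,
    "power control unit": lambda: False,
}

suspect = isolate_fault(checks)
print(suspect)
```

Working through the candidates one at a time is slow, but it avoids changing several things at once on hardware that nobody can physically reach.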

Although Hubble's team has now identified the PCU as the source of the glitch, that is only the start of the fix. It is impossible to reset the component using ground commands, meaning that NASA's team will have to switch to the backup side of the SI C&D module, which contains the backup PCU.

This is a complex and risky operation, because it is likely to impact several other hardware boxes on the spacecraft that are also connected to this particular side of the SI C&D module. 

"Every time we swap components in the operational chain, we treat it as a big deal," Paul Hertz, NASA's astrophysics division director, told ZDNet. "We want to make sure we do it correctly, that we think of all the possible consequences of making that change, that we send the correct commands up to the spacecraft, so that it does the swap over in the correct and safe manner." 


Since the start of July, Hubble's team has been preparing for the swap. The process has involved preparing a test of the switchover procedures, several days of running those tests, and a review to assess all the risks related to switching to backup hardware.

"We have well-established processes where the team first looks at the procedures, makes sure they are correct and don't require updates, then develops the upload, which is reviewed by an independent team that gives the go or no-go decision," said Hertz. "No matter how simple or complicated the change is, we have a thorough process to make sure we are careful." 

The switch will be carried out over the next few days, and NASA's engineers are hopeful that the operation will allow Hubble to resume its normal scientific observations as soon as possible. 

The team's confidence partly stems from the fact that this is not the first time the space observatory has required a long-distance fix. Back in 2008, the telescope's SI C&D experienced a failure related to a different component, which stopped the system from transmitting information to Earth.

The backup for one of the SI C&D's sides was successfully switched on and Hubble rebooted, and the first picture the observatory released after a month-long pause showed one galaxy passing through the heart of another, 400 million light-years from Earth. Something to look forward to in the next few weeks, once Hubble gets up and running again. 
