Are SATA port multipliers safe?

Summary: SATA port multipliers are cheap and popular for low-cost storage arrays. But are they safe for your data? ZDNet reader experience can help us size the problem.

SATA disk drives are normally connected one drive per SATA controller port. In recent years, however, a new approach has appeared: the port multiplier, which lets a single controller port connect to multiple drives.

Researchers Peng Li and David J. Lilja of the University of Minnesota, and James Hughes and John Plocher of FutureWei Technologies, reported on SATA port multiplier behavior in a poster presented at FAST '13. They conclude that port multipliers work well when the disks are working well, but not so well when a drive fails.

Inducing disk drive failure

Their first problem was figuring out how to induce a disk drive failure that would look like a normal one. Simply disconnecting or powering down a disk drive is too abrupt to mimic a real failure.

Their solution was to remove the cover from the disk drive while it was under load. This typically caused the drive to fail within 3 to 4 minutes. They tested both Seagate and Western Digital hard drives, in both enterprise and consumer versions.

The researchers tested drive failures on a system running Linux with two SATA controllers. In the first testbed, there was one drive connected to each SATA controller. In the second testbed, there was one drive connected to one SATA controller and a port multiplier with two drives on the other SATA controller.

Results

In the first setup, with no port multiplier, the failure of one drive had no impact on the other drive in the system. The test workload, generated with the fio I/O tester, always completed.

In stark contrast, when a drive on the port multiplier was made to fail, the second drive on the same port multiplier also failed before completing the fio workload. This was true of both the Seagate enterprise and the Western Digital consumer drives.

The Storage Bits take

This research is not conclusive, and the authors hope to do more. Only a small number of drives on a single Linux platform were tested.

But it suggests that caution is in order. Using RAID software across a port multiplier array may result in an unrecoverable failure when a single drive fails.
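Here is a minimal Python sketch of that risk, under stated assumptions. The drive names and topology are hypothetical, though they mirror the second testbed. The array is assumed to be a single-parity layout such as RAID 5, which tolerates one missing member. The researchers did not run RAID, so that layer is purely illustrative. Following the poster's observation, a failure of any drive behind a port multiplier is assumed to take every drive on that multiplier offline. "Offline" here means unavailable to the array, not necessarily lost for good.

```python
# Hypothetical topology mirroring the second testbed: each drive maps to a
# failure domain, either a direct controller port or a shared port
# multiplier (PM).
TOPOLOGY = {
    "disk0": "direct-A",
    "disk1": "pm-B",
    "disk2": "pm-B",
}


def drives_lost(failed_drive, topology, pm_fails_together=True):
    """Return the drives that drop out of the array when one drive fails.

    Pessimistic assumption: if the failed drive sits behind a port
    multiplier, every drive on that multiplier goes offline with it.
    """
    domain = topology[failed_drive]
    if pm_fails_together and domain.startswith("pm-"):
        return {drive for drive, dom in topology.items() if dom == domain}
    return {failed_drive}


def array_survives(failed_drive, topology, parity_drives, pm_fails_together=True):
    """A parity array survives if no more than `parity_drives` members drop out."""
    lost = drives_lost(failed_drive, topology, pm_fails_together)
    return len(lost) <= parity_drives


if __name__ == "__main__":
    for drive in sorted(TOPOLOGY):
        ok = array_survives(drive, TOPOLOGY, parity_drives=1)
        print(f"{drive} fails -> array {'survives' if ok else 'fails'}")
```

With one parity drive, losing disk0 is survivable. Losing disk1 or disk2 removes two members at once, which is more than single parity can absorb. Setting `pm_fails_together=False` models drives that fail independently, as they did in the first testbed.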

It is possible, using advanced erasure coding or a high-end file system like Gluster, to use a large number of disk drives on port multipliers in such a way that even several failures will not compromise data integrity or availability. But this is not something the average SOHO user could implement.
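One way to reason about such a design is to treat each port multiplier as a failure domain and check a data layout against it. The Python sketch below is a simplified illustration, not how Gluster or any particular erasure-coding system places data. It makes three assumptions:

- A hypothetical k + m code can rebuild a stripe from any k of its k + m shards, so it tolerates m missing shards.
- Shards are spread round-robin across domains.
- Pessimistically, each drive failure takes its whole port multiplier offline.

```python
from collections import Counter


def place_round_robin(num_shards, domains):
    """Assign shard i of a stripe to failure domain i mod len(domains)."""
    return [domains[i % len(domains)] for i in range(num_shards)]


def worst_case_shard_loss(placement, failures):
    """Shards lost if `failures` drive failures each take down a whole,
    different failure domain, choosing the domains holding the most shards."""
    per_domain = Counter(placement)
    return sum(count for _, count in per_domain.most_common(failures))


def tolerates(placement, parity_shards, failures):
    """True if the stripe is still decodable in the worst case above."""
    return worst_case_shard_loss(placement, failures) <= parity_shards


if __name__ == "__main__":
    # Hypothetical 8 + 4 code: 12 shards per stripe.
    k, m = 8, 4
    wide = place_round_robin(k + m, [f"pm{i}" for i in range(6)])
    narrow = place_round_robin(k + m, ["pm0", "pm1"])
    for name, layout in (("6 PMs", wide), ("2 PMs", narrow)):
        print(name, [tolerates(layout, m, f) for f in (1, 2, 3)])
```

In this model, spreading 12 shards over six port multipliers survives any two drive failures. Packing them onto two multipliers cannot survive even one, since a single failure takes six shards offline.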

Because disks are marvels of engineering and precision manufacturing, many people will have a port multiplier where no drive fails for years. But when one does, it could be brutal.

This points to a larger issue in IT: We have few independent sources of underlying technology evaluation. We are all guinea pigs.

Comments welcome, as always. Have you experienced a disk failure on a port multiplier? Please share what you learned.

Update: Below is a video of a running drive being taken apart. A rough process, but how else can you create a head crash on demand?

Topics: Storage, Servers, Disaster Recovery

Talkback

  • Lots of missing data

    This all looks like a made-up "test". Were different models tested? Were different controllers tested? Were different drivers and/or operating systems tested?

    Only one OS, controller, driver, model... So much for science.
    danbi
  • The presenters were clear on the limits of their work

    Science has to start somewhere - and it needs to be funded. When this was presented at FAST 13, they said that this was just a start. Much more work could be done to understand the root cause of the observed failures.

    But since anti-education, anti-science conservatives continue to cut education and research funding at the state and federal levels, we don't have the financial support for research that we need for many topics, SATA port multipliers included.

    Robin
    R Harris
    • It depends on... I developed a port multiplier-aware driver and...

      ... and while what was described can happen, a lot depends on the underlying driver and the port multiplier box design.

      I am somewhat familiar with CERTAIN (by far not all!) Linux implementations of port multiplier-aware drivers. The ones in the open-source repository tend to be less robust than closed-source ones or ones written for Mac OS or Windows. A command sent to a drive attached to a port multiplier is very much like a command sent to a drive attached to a "regular" port. The only difference is a tiny bit of information in the so-called Frame Information Structure, or FIS. That information is merely the number of the target port. So, for instance, a FIS for a drive attached to port zero of a port multiplier is EXACTLY the same as a FIS for a drive attached to a "regular" port. That's why even controllers and drivers that are not port multiplier-aware can address port zero of the PM: it's 100% transparent.
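The commenter's point can be made concrete. In the SATA Register Host-to-Device FIS, the first byte is the FIS type (27h). In the second byte, the low four bits carry the PM Port field and bit 7 marks a command. The Python sketch below builds only the first four bytes: type, flags, command and features. The LBA, sector count and remaining fields are omitted, and the helper name is hypothetical. It shows that a FIS addressed to port 0 of a port multiplier is byte-for-byte the same as one a non-PM-aware host sends to a directly attached drive.

```python
FIS_TYPE_REG_H2D = 0x27   # Register FIS, host to device
CMD_READ_DMA_EXT = 0x25   # ATA READ DMA EXT, used here only as an example


def h2d_fis_header(command, pm_port=0, features=0, is_command=True):
    """Return the first four bytes (DW0) of a Register H2D FIS."""
    if not 0 <= pm_port <= 0x0F:
        raise ValueError("PM Port is a 4-bit field (0-15)")
    # Bit 7 is the C (command) bit; bits 3:0 carry the PM Port number.
    flags = (0x80 if is_command else 0x00) | pm_port
    return bytes([FIS_TYPE_REG_H2D, flags, command & 0xFF, features & 0xFF])


if __name__ == "__main__":
    legacy = h2d_fis_header(CMD_READ_DMA_EXT)            # host unaware of PMs
    port0 = h2d_fis_header(CMD_READ_DMA_EXT, pm_port=0)  # PM-aware, port 0
    port3 = h2d_fis_header(CMD_READ_DMA_EXT, pm_port=3)  # PM-aware, port 3
    print(legacy.hex(), port0.hex(), port3.hex())
```

The first two headers are identical (27802500). Only the PM Port nibble changes for port 3 (27832500).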

      Problems do start, however, if:

      - a drive fails
      - more drives are attached to the PM
      - the recovery / failure path of the driver wasn't tested / worked out properly

      Here one does not even need to destroy a drive. For a driver with recovery issues, it is often enough to force-remove one drive from the port multiplier, and the rest will stop working normally.
      On the other hand, well-known controller + port multiplier + commercial driver bundles do not display that problem. These tests ran under Linux on a Marvell 88F6282, with an undisclosed port multiplier and presumably the standard open-source Linux SATA stack. They are largely irrelevant to a G5 or Mac Pro with a well-known port multiplier-aware controller + driver "combo" inside and a well-known port multiplier box outside, powered by a Silicon Image 3726 port multiplier ASIC.
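The commenter's distinction between drivers can be illustrated with a deliberately simplified model. Both recovery functions below are hypothetical and do not describe the Linux libata code or any vendor driver. The first escalates any device error into a reset of the whole host link and takes every port offline. The second disables only the failing device port.

```python
def recover_link_wide(port_states):
    """Naive strategy: one failing port brings down the shared link,
    so every port behind the port multiplier goes offline."""
    if any(state == "failed" for state in port_states.values()):
        return set()
    return set(port_states)


def recover_per_port(port_states):
    """Careful strategy: isolate the failing port, keep healthy ports online."""
    return {port for port, state in port_states.items() if state == "ok"}


if __name__ == "__main__":
    # Hypothetical PM with three drives, one of them force-removed or failed.
    ports = {0: "ok", 1: "failed", 2: "ok"}
    print("link-wide reset:", sorted(recover_link_wide(ports)))
    print("per-port reset: ", sorted(recover_per_port(ports)))
```

Under the first strategy, losing one drive is enough to take the others offline, which matches the symptom described above. Under the second, the healthy drives keep working.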

      I wonder why that setup wasn't tested THAT way, or, say, under Windows 7 with Windows-logo drivers verified by Microsoft.

      By the same logic, we could say that computers in general can harm your hard drive because there is SOME computer out there, powered by SOME OS and SOME controller... and it failed somehow. That sentence omits the most critical data.
      gyft
  • Great Article

    Real tech information. This is the ZDNet I could not get enough of back in the '90s (ZDTV).
    happyharry_z
  • Linux, The Hacker's Toolkit

    Only on Linux could you mess about with stuff like this. And not just investigating failures, but techniques for coping with them.
    ldo17
  • permanent failure or a hiccup?

    "when a drive was failed on the port multiplier, the second drive on the port multiplier would also fail without completing the fio workload"

    Does the second drive fail permanently, with permanent data loss? Or does it just go offline until it is reconnected to a direct SATA port?

    If it's the second scenario, what are the real-world chances of data loss on a parity array whose members include these disks?

    (That's the real question the article is trying to answer, isn't it?)
    Alex Gerulaitis