For more than 20 years, there has been a significant market for fault-tolerant servers. When businesses started to automate, it soon became clear that some applications were so important that companies could not afford to have them fall over.
If you have to reboot a Las Vegas casino, it can cost $10m. If you have to reboot a stock exchange, it can cost $100m.
Typically, fault-tolerant systems use twice as many processors as the application needs, but run them in lock-step. The same answers should come out of both halves of the system at all times. Any disagreement, and it is obvious that one board has failed. A quick look at some checksums, and the system kills the board that failed and alerts the IT manager with a request for a replacement.
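The lock-step scheme described above can be sketched in miniature: two boards run the same deterministic computation step by step, and a comparator flags the first disagreement. Everything here is a hypothetical illustration, not any vendor's actual design: the `Board` class, the injected fault, and the reference recomputation standing in for the hardware's checksum diagnostic are all assumptions made for the sketch.

```python
class Board:
    """One half of a lock-stepped pair (hypothetical model)."""

    def __init__(self, name, compute, fault_at=None):
        self.name = name
        self.compute = compute
        self.fault_at = fault_at  # step at which this board starts giving wrong answers

    def step(self, step_no, data):
        result = self.compute(data)
        if self.fault_at is not None and step_no >= self.fault_at:
            result += 1  # simulated silent hardware fault
        return result


def run_lockstep(a, b, inputs, reference):
    """Drive both boards in lock-step and compare answers each cycle.

    On the first disagreement, recompute the step with a known-good
    reference function (a stand-in for the checksum diagnostic) to
    decide which board failed. Returns (step_no, failed_board_name),
    or (None, None) if the boards agreed throughout.
    """
    for step_no, data in enumerate(inputs):
        ra = a.step(step_no, data)
        rb = b.step(step_no, data)
        if ra != rb:
            good = reference(data)
            failed = a if ra != good else b
            return step_no, failed.name
    return None, None
```

A usage example: pair a healthy board with one that develops a fault at step 3, and the comparator isolates the bad board at exactly that step, which is the point of the design: failure detection is immediate and unambiguous.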
This made for expensive systems (think twice the price of a regular one), which could only be justified by very valuable applications. In the early days, the industry was prepared to support a variety of proprietary hardware and software. Suppliers such as Tandem (founded 23 years ago) and Stratus (founded 25 years ago) chose the processors they wanted and created their own hardware and software for the purpose. The systems sold well for jobs like running banks and telecoms services.
Things are different now. Hardware has become more uniform: there are a limited number of RISC processors with a future, and the majority of servers use Intel processors and standard operating systems. At the same time, conventional servers have become more reliable -- think of IBM's eLiza initiative, and the growth of clusters with failover.
Users would have to be very convinced of the merits of a fault-tolerant system to pay over the odds for a non-standard one. The server might run non-stop, but what if the supplier went out of business, or the range got canned? Whether the systems came from niche vendors or from outposts of big companies, their future has looked uncertain, and the market for fault-tolerant systems has become less obvious.