When supercomputing and AI meet the cloud

Ireland is getting a new supercomputer; Penguin Computing CEO Tom Coull on how supercomputing is taking on some big computing challenges.
Written by Colin Barker, Contributor

The Irish Centre for High-End Computing (ICHEC) is preparing for the installation of a new national supercomputer which will also be accessible via the cloud to researchers across Ireland.

Linux-based cloud and high-performance computing company Penguin Computing is one of the companies involved with the project along with Intel. ZDNet talked to Penguin CEO Tom Coull to find out more.

ZDNet: Tell me a little about your company.

Coull: We are a platform and performance scale-out company and we have been around for almost 20 years. We're based in Fremont, California, and we have manufacturing here.

We got into performance computing in 2003, with the acquisition of a software company that gave us the software to be able to support Beowulf-style clusters. That was Scyld Software, and, in fact, our software is under the brand name of Scyld.

We then went into the high-performance computing (HPC) market. We were initially a Linux server company and had really gone into business to provide LAMP [Linux, Apache, MySQL, PHP] servers back in 1999 when that was really a growing part of the IT landscape.

You can think of our business as two-thirds high-performance computing and one-third datacentre scale-out. We also have a high-performance computing cloud called POD (Penguin Computing On Demand).

POD plays an important role here and it's good to know we have this special, high-performance cloud.

Then we were acquired about three months ago by SMART Global Holdings, which is a public company.

More specifically, we've partnered with Intel to provide a supercomputer for ICHEC. It's just in the process of installation -- our team flew over to Ireland and we started installation last week.

It's an interesting system. You can think of it as a mirror image of our current platform. It is an Intel Xeon-based topology, but it also runs a virtualised front-end, which is the same thing that we do for our POD.

It provides an on-demand user space where people can have their own log-in nodes that can run virtual desktops to be able to do per-input processing. That is interfaced back to the actual supercomputer, which runs on bare metal. There is a job scheduler that runs the jobs out onto that platform.

The aim is to be able to provide a really broad offering for ICHEC's users and customers. The cloud platform that we provide -- both as a service and as a product -- allows them to do that.

The product is called Scyld Cloud Manager. It uses a virtual desktop technology called Scyld Cloud Workstation, which gives you either a Windows or a Linux interface over the internet, depending on the devices that you happen to have, and it links back into the supercomputer.

You must have all kinds of customers from all kinds of industry running these systems?

That's exactly right. The most common are customers that generally have applications that scale out. They run at scale and they need a hardware platform to run at scale. You can't really run these applications at any meaningful scale in a virtualised environment like the more typical public clouds.

That means a lot of weather modelling, structural analysis, computational fluid dynamics and the other typical modelling outfits that are out there.

A lot of custom code is run on these systems, along with a lot of AI, which is a really fast-growing area of computing.

AI must be becoming a very important area?

It really is. It is still in the early stages; a lot of our customers are really in research mode. But we have a lot of federal business as well, not only in the US but around the world, including the UK, and in a lot of other areas where there are intelligence programs going on and where they have been using AI for quite a while.

Those are continuing to grow but we are seeing a lot going on with autonomous vehicles -- a lot to do with feature extraction from images and from video.

You mentioned autonomous vehicles, where do they come into this?

Well, autonomous vehicles have to do feature extraction on the fly. So, basically, the cameras are recording data and sending it to a GPU-based environment that pulls out images, tracks them and tries to see where the car is and what might be approaching -- whether it's a person, a piece of paper, a signpost, etc.

And then you have this whole secondary area. As you are driving the car around, it's using algorithms to help it navigate, and those can always be improved. The cars are making mistakes. To improve them, you really want to get the video off the car, sent to some collection area and then sent back to your datacentre. There you are running your deep-learning machines, and those are the things that are looking at the car's performance, bringing out algorithms and asking: how can we make it safer?

That's kind of where AI kicks in, and it's a massive problem because the amount of data a self-driving car generates is very, very large.

Getting that data, in some form, back over to the datacentre is a challenge for companies.

And if you give that problem to a mainframe, that's got memory and performance capacity to spare, right?

Exactly. The ICHEC environment has both Intel processors and Nvidia V100 GPUs. So it is a nice environment to be able to do research in the area of deep learning using Nvidia's GPUs.

Could you take me through the ICHEC implementation?

To go through the details, Penguin worked there in collaboration with Intel -- Intel is the prime contractor and most of the gear is provided by Intel.

Some of the hardware is Penguin servers, particularly in the area of the GPU accelerator servers. We built the system in our factory to test it, using Intel's Omni-Path technology, and we partner with DDN (DataDirect Networks).

Once we had made sure that it was running, we put some of our software and systems on it, packed it in large shipping crates and shipped it over to ICHEC, where we are providing support from people that we have locally.

And this is all part of our expanding footprint in the UK and Ireland.

Many people say that the mainframe and HPC in general is a declining market, do you think it might still be a growing market?

It is. Worldwide it continues to grow and, you know, you can think of it as characterised into two main groups.

As a bit of background, we have a contract with the Department of Energy here in the States. They had this characterisation of two types of machines which I think is useful.

At the extreme high end are the big supercomputers that you read about in the news -- one minute China has the fastest, and then the US has the fastest and so on. Those are really advanced supercomputers. They're specialised, with specialised hardware interconnects. They are custom-built to solve very difficult problems extremely fast. They have very large memory and often they are liquid-cooled -- we have a liquid-cooled version of our computers also.

So those sorts of advanced systems -- supercomputers -- are one category. And then there is another category, which you can think of more as commodity technology. They are more in the area of the democratisation of HPC, where you want to be able to provide that technology out to the broadest group of people possible. Many of the next tier down of supercomputers are of that category.

That's where there are a whole bunch of standard Intel Xeon or other processor servers that are connected together with fast interconnects, like InfiniBand or Omni-Path, and hooked up to a parallel file system.

How many mainframes do you have running?

That's a big number. In any given year we have about 350 customers running and that ranges from small orders up to very, very large orders.

There are a lot of Penguins. Most of them are in the US. We do have them in Ireland, the UK, other parts of the EU and, in fact, we are running them in 50 different countries all over the world.
