Tilera, a chip designer spawned from research at the Massachusetts Institute of Technology, has its sights on Intel's lead in cloud-computing hardware.
Tilera hopes its novel chip architecture, which puts up to 100 Risc-based cores on a processor, will push it ahead by avoiding the latency bottlenecks that plague Intel's x86 designs.
Founded in 2004, the Silicon Valley-based company has brought out three generations of processors. These have built on MIT alumnus and chief technology officer Anant Agarwal's work, which includes a chip with 16 cores on one die and a mesh networking architecture that avoids bus bottlenecks. The current generation of Tilera processors, the Tile-GX family, scales between 16 and 100 cores and is tailored to cloud-computing applications.
The company feels it has an edge on Intel thanks to its iMesh on-chip networking architecture, which allows for low on-chip latency in passing messages between the cores and promises better power efficiency. Tilera's head of marketing, Ihab Bishara, sat down with ZDNet UK to talk about what Tilera is doing to step its chips up from embedded applications, such as networking and video compression, into cloud computing.
Q: What kind of tasks best fit processors with so many cores?
A: We provide anywhere between 16 and 100 cores. If you look at the markets we're after — networking, multimedia and servers — for networking and multimedia there are applications with thousands of transactions in parallel, thousands of flows that you're processing, thousands of streams of multimedia-type requests, so [the application] has to be very parallel in nature to start with.
A lot of the angst against multicore/many-core in general is people think, "I've got my application that's single-threaded and I need to run it in a hundred threads." If you have an application that runs on a single thread, you have no hope of running it on 100 cores. You need an application that's inherently parallel.
If you look at routers, services and switches, security boxes, media gateways — these are the top design wins, I'd say. And then on the multimedia side, there's videoconferencing through multipoint control units (MCUs), where you have many streams going on all at the same time.
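Bishara's point about single-threaded applications can be illustrated with Amdahl's law, which he does not name but which is the standard way to estimate the ceiling on speedup from extra cores. The sketch below is an idealised model only: the function name and the sample parallel fractions are assumptions chosen for illustration, and it ignores memory, interconnect and scheduling overheads.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction is the share of run time that can be spread
    across cores (0.0 = fully serial, 1.0 = fully parallel).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; they are not measurements of any workload.
    for fraction in (0.0, 0.5, 0.95, 1.0):
        for cores in (16, 100):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores:3d} speedup={speedup:6.2f}x")
```

A fully serial program (fraction 0.0) gains nothing from 100 cores, and even a half-parallel one stays below a 2x speedup, which is why the workloads listed above are ones made of thousands of independent flows or streams.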
Don't some people already run those tasks with application-specific integrated circuits (Asics) and field-programmable gate arrays (FPGAs), as they can be cheap and reasonably power efficient?
There is an option there. [But the choice is] "I have x86 [architecture] and I can develop software very easily, or I have Asics and FPGAs, which are not general purpose, will take a lot of time to develop, but they give me better energy, power."
What Tilera provides is flexibility and power at the same time. To give an example, the [yet-to-launch] GX3000 series will be equivalent to an [Intel] Sandy Bridge eight-core when it comes to video processing and it will be at 25W. Sandy Bridge does it around 130/150W. Also, it's still [programmable in] C and C++ — you don't have to do special programming or GPU programming, and you get the benefit of the lower power and space as well.
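Taking the quoted figures at face value, the implied power advantage is simple arithmetic. The sketch below assumes, as Bishara claims, that both chips deliver the same video-processing throughput; the constant and function names are hypothetical, and the wattages are the ones quoted in the interview rather than independent measurements.

```python
# Figures quoted in the interview, not independent measurements.
GX3000_WATTS = 25.0
SANDY_BRIDGE_WATTS = (130.0, 150.0)  # quoted as "around 130/150W"


def power_ratio(baseline_watts: float, candidate_watts: float) -> float:
    """How many times more power the baseline draws for the same work."""
    if candidate_watts <= 0:
        raise ValueError("candidate_watts must be positive")
    return baseline_watts / candidate_watts


if __name__ == "__main__":
    low, high = (power_ratio(w, GX3000_WATTS) for w in SANDY_BRIDGE_WATTS)
    print(f"Claimed advantage at equal throughput: {low:.1f}x to {high:.1f}x")
```

On those numbers the claim works out to roughly a five- to six-fold reduction in power for the same video workload.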
What applications exist that could make use of such a large number of cores?
I think the parallel applications are there in the embedded market and networking and multimedia. On the cloud side, the applications are already there. With the Facebooks, Googles, Zyngas, there are so many parallel applications and they need power efficiency, so that's where we fit.
Power is the biggest thing because [the big web 2.0] companies have optimised the rest out of it. The biggest chunk of power consumption is the processor now. That's the entry way for ARM into servers.
Is big data [the growing practice in the enterprise of pulling together and analysing large datasets from a variety of different sources] an area of opportunity for chips like this?
In general, when it comes to Web 2.0, it's very small tasks: you have a request for some data, you need to do a few analyses on it and send it back out. Very small tasks but thousands and thousands of them — that's the nature of Web 2.0 servers today.
If you think about Facebook or Microsoft, from a datacentre point of view, you'll see that the majority of those servers are running five, six or seven applications. [Large web companies] have web applications upfront, [along with] in-memory databases, database and data mining.
When you open up your Facebook page, it goes to a web app server that has all the PHP on it and then it sends tens or hundreds of requests to other servers, each one handling a piece of it. It then aggregates all this data and sends it back to you. All of this happens in milliseconds.
To them, the place for Tilera or ARM is: "How can I make this server cheaper, lower power and even more disposable? If one fails, no problem; I can keep marching."
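The fan-out pattern Bishara describes, in which one front-end request spawns many small back-end requests whose results are then aggregated, is often called scatter-gather. The sketch below shows the general shape using Python's standard thread pool; `fetch_fragment` and `render_page` are hypothetical names, the back-end call is a stand-in, and nothing here reflects how Facebook's servers are actually built.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_fragment(shard_id: int) -> dict:
    """Hypothetical back-end call; a real one would query a cache or database."""
    return {"shard": shard_id, "payload": f"fragment-{shard_id}"}


def render_page(shard_ids: list[int], max_workers: int = 32) -> list[dict]:
    """Scatter one request per shard, then gather the replies in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls concurrently but yields results in input order.
        return list(pool.map(fetch_fragment, shard_ids))


if __name__ == "__main__":
    fragments = render_page(list(range(100)))
    print(len(fragments), "fragments aggregated")
```

Each sub-request is small and independent of the others, which is the property that lets this kind of workload spread across many modest cores.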
What are the distinguishing features of Tilera's iMesh architecture in terms of efficiency?
If you look at the bus, to get the bandwidth on there you have to make [the bus] very wide. Then you need to drive [your data] very fast and drive [it through] long wires. That gives you more power consumption than if you have a mesh of very short wires, point-to-point. iMesh is the wires, which is very cheap. Plus, you don't have to drive them very fast, because you have so many of them.
The other thing about iMesh is distribution. We have distributed everything on the chip. For example, we don't have the L3 cache [like Intel]. We have distributed the cache along the whole chip, so you don't have to light up a big area of the cache when you want to access it, as it's [made of] smaller pieces.
Think of yourself in New York and imagine there's only one big grocery store in Midtown, versus having a hundred different grocery stores spread throughout the city. Think of how much gas you will take to access that, versus the other.
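The grocery-store analogy can be put into rough numbers with a toy model. The sketch below treats the city as a square grid of blocks and compares the average trip to one central store against the trip to the nearest of a hundred evenly spread stores. The grid size, store layout and function names are all assumptions made for illustration; the code models the analogy only, not the iMesh hardware or its cache behaviour.

```python
def store_positions(city_size: int, stores_per_side: int) -> list[tuple[int, int]]:
    """Place stores_per_side**2 stores at the centres of equal square districts."""
    district = city_size // stores_per_side
    centres = [i * district + district // 2 for i in range(stores_per_side)]
    return [(x, y) for x in centres for y in centres]


def average_trip(city_size: int, stores_per_side: int) -> float:
    """Mean Manhattan distance, in blocks, from every block to its nearest store."""
    stores = store_positions(city_size, stores_per_side)
    total = 0
    for x in range(city_size):
        for y in range(city_size):
            total += min(abs(x - sx) + abs(y - sy) for sx, sy in stores)
    return total / (city_size * city_size)


if __name__ == "__main__":
    one_store = average_trip(100, 1)      # a single store in the middle
    many_stores = average_trip(100, 10)   # 100 stores on a 10 x 10 lattice
    print(f"one store: {one_store} blocks, 100 stores: {many_stores} blocks")
```

In this toy city the average trip falls from 50 blocks to 5, which mirrors the argument that many small, nearby cache slices cost less to reach than one large central one.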
Why hasn't Intel made a similar product?
To be able to put so many cores on a single chip that will be an x86-compliant core would be very power consumptive — you saw Larrabee, right? Larrabee was [Intel's attempt] to win in the graphics market, and they cancelled it because they were late.
Intel serves the whole market with almost one product. You have one core, one architecture — Sandy Bridge or Nehalem or Westmere or whatever. Take [that] core, put that by any cache, add QPI [Intel QuickPath Interconnect], make that chip and then it's a server chip. Take the QPI out, reduce the cache — now it's a desktop chip. Completely slash the cache out, now it's a Celeron chip.
But it's the same exact core with varying frequencies and maybe some features like virtualisation enabled or disabled. So for them to say, "We're going to abandon this tick-tock [development process] and now we're going to have a parallel architecture that will be a different core and we're going to do 50 cores on a chip and we'll target these markets and we will be competing with the same products that we're offering here..." — they'll have to see a very, very important reason to do that.
On the technical side, we've spent the last 15 years — not Tilera but through our researchers at MIT — to solve a lot of these problems, and we have a lot of the IP in the area. Intel's pretty smart, and I'm sure they can develop something on their own. But there are a lot of pitfalls they have to overcome to get to where we're at right now. We'll have some advantage over them.