Nvidia outlines Tegra K1, 192-core chip to put supercomputing everywhere

Summary: Nvidia's master plan is to take its supercomputing knowhow and meld it with mobile platforms as they converge in various markets.


Nvidia CEO Jen-Hsun Huang on Sunday outlined the Tegra K1, a 192-core "super chip" that aims to bridge the gap between mobile computing and supercomputing. Nvidia's plan is to create an architecture that pairs with Android to disrupt industries ranging from gaming to automobiles and cloud computing.

Speaking at CES 2014 in Las Vegas, Huang said the Tegra K1 will meld the company's supercomputing momentum (Nvidia has gained a lot of traction in high-performance computing) with cloud game development on Android.

Also see CNET's live blog of Nvidia's press conference from CES

The gist of the Tegra K1 goes like this:

  • 192 CUDA cores that are parallel and programmable. 
  • It's based on Nvidia's Kepler architecture. 
  • A single architecture designed for computing on phones to supercomputers. 
  • Epic Games will bring its Unreal Engine 4 to Tegra K1 for game development. 
  • New game consoles such as the Xbox One and Sony PlayStation 4 are essentially PCs.
  • The Tegra K1 will come in two versions: one with a quad-core CPU based on ARM's Cortex-A15, the other built on Nvidia's own Denver CPU cores. Silicon for the Denver version came back from the fab only a few days earlier, and Huang showed a demo. "When you've worked on a chip for 5 years and it has come back and it's not a brick you're so happy," said Huang. 
  • No word on availability, but Huang said that the outline of K1 wasn't a "PowerPoint launch."
  • Nvidia talked about 64-bit computing years ago, but now has to play catch-up to Apple's A7 and Qualcomm. 
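The first bullet — many small cores running the same operation over independent data — can be sketched without any GPU at all. A minimal illustration (plain Python, not Nvidia's CUDA API): a SAXPY-style kernel in which every output element is computed independently, which is exactly the shape of work that maps onto a sea of parallel cores.

```python
def saxpy(a, x, y):
    # out[i] = a * x[i] + y[i]: each element depends only on index i,
    # so conceptually every element could run on its own core at once.
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [0, 1, 2, 3], [1, 1, 1, 1]))  # [1.0, 3.0, 5.0, 7.0]
```

On a GPU the per-element body would be the kernel and the loop would be implicit across the cores; the point here is only that no element waits on any other.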
Credit for all photos: James Martin, CNET.


While gaming got a lot of play from Nvidia, the company has wider ambitions: the Tegra K1 is its entry into a bevy of new markets. Huang hinted that Nvidia plans to tag along with Android into any market the OS enters, and outlined how Nvidia will make a big push with automakers that goes beyond just the design process. 

[Slide: Tegra K1 versions]


[Slide: "giant leap"]


"It's simply a matter of time before Android starts to disrupt new markets," said Huang, who noted that Android will play a key role in 4K televisions, video game consoles and cars. "We happen to believe the car will be your most important mobile computer," said Huang.

Taken together with its other efforts, the Tegra K1 positions Nvidia as the enabler bringing photorealism and graphics heft to a bevy of industries. 

Huang went on to outline Nvidia's cloud efforts and how the company is tackling issues like latency and synchronization in gaming. The same technologies could also be used to synchronize streaming of applications in the enterprise.


Huang noted you can get PC games on your television via Nvidia's Shield game console and a technology called GameStream, which streams games wirelessly. However, there have been latency issues that Nvidia has been looking to fix. Tests in the field by CNET have noted that Shield game streams can lag.

Nvidia did a demo showing a game stream connected to a server in France. "The GRID virtualized GPU server game-streaming across the ocean in 30 milliseconds back to the Shield device," said Huang.

Huang also outlined G-Sync, a technology that is designed to limit latency and lag in visual computing. G-Sync can deliver game frame rates as fast or as slow as needed because it synchronizes with the source of the application. G-Sync will be available in the second quarter from Nvidia.

G-Sync accomplishes its feat by adding more buffers for latency, updating a frame as soon as the GPU is ready and delivering frames at a variable rate.
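A toy model (an illustration, not Nvidia's implementation) makes the variable-rate idea concrete: with a fixed 60 Hz display, a frame that misses a refresh tick waits for the next one; with G-Sync-style variable refresh, the display updates the moment the GPU finishes.

```python
import math

REFRESH_MS = 1000 / 60  # fixed 60 Hz display tick, ~16.7 ms

def fixed_vsync(frame_ready_ms):
    # The frame is shown at the first refresh tick on or after it's ready.
    return math.ceil(frame_ready_ms / REFRESH_MS) * REFRESH_MS

def variable_refresh(frame_ready_ms):
    # The display refreshes as soon as the GPU signals the frame is done.
    return frame_ready_ms

# A frame that takes 20 ms to render:
print(round(fixed_vsync(20), 1))  # 33.3 -- it waits most of an extra tick
print(variable_refresh(20))       # 20
```

The gap between the two numbers is the stutter a fixed-rate display adds whenever the GPU's frame time and the refresh interval don't line up.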

Topics: Processors, Big Data, Cloud, Data Centers



  • Latency is indeed an issue.

    One that will improve over time, but never *quite* go away, due to the speed of light. If you plan on conferencing or gaming with somebody on the other side of the globe - even the speed of light with no overhead would be a noticeable delay (1/14th of a second each way, I believe - which is enough for a gamer to notice).
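The commenter's figure holds up as a back-of-envelope calculation (assuming an antipodal path of roughly half Earth's ~40,000 km circumference and light in vacuum; real fiber is about a third slower and never runs in a straight line — it lands nearer 1/15 s one way than 1/14):

```python
SPEED_OF_LIGHT_KM_S = 299_792   # in vacuum; fiber is roughly a third slower
ANTIPODAL_PATH_KM = 20_000      # about half of Earth's circumference

one_way_ms = ANTIPODAL_PATH_KM / SPEED_OF_LIGHT_KM_S * 1000
print(f"{one_way_ms:.1f} ms one way")        # ~66.7 ms, about 1/15 s
print(f"{2 * one_way_ms:.1f} ms round trip") # ~133 ms before any overhead
```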

    Of course, that's across the globe; a worst case scenario (at least until we start colonizing other planets). If we're talking about in the living room or in the same city, it should be able to be made unnoticeable.

    There is a *lot* of room for improvement. So I do expect that over time things will get better.

    Latency is also why I question a "cloud only" approach to computing. Most people want their apps to be snappy and fast, and "cloud only" might never be able to deliver on that promise.

    Local resources always have less latency than the cloud - that's just the nature of things. Not sure why this fact slips past bloggers.

    So I do expect that for the foreseeable future, we won't ever be seeing truly "dumb" machines. Seriously - look at this - it's not transitioning to a "dumb display." It's got nearly 200 cores. You don't really need 200 cores for a "dumb display." If you're merely streaming video, this is overkill.

    But you *would* need it for a game that stores assets locally and renders them on the fly.

    So that's what I'm seeing here - I don't think they're seeing the future as "just stream the video of the game." I'm pretty sure they're looking into streaming the assets and allowing the device to render the game in real time.

    Local computing isn't dying - indeed, it's the future. Sure, "the cloud" will always play a role in the future as well (although I'd like a better name than "cloud," thanks), but it will be a future where they work together, not a future where one excludes the other.

    "Huang also outlined G-Sync . . ."

    Which, unfortunately, requires you to buy a new monitor.

    . . . and it doesn't get rid of network latency. It actually gets rid of another type of latency caused by dropping frames when the monitor and GPU are a bit mismatched in frame rate. The "source of the application" in this case isn't a mystery: It's the local GPU that's connected to the display.


    "G-Sync accomplishes its feat by adding more buffers for latency . . ."

    Eh, you're confusing G-Sync with triple buffering, a band-aid solution designed to deal with the problems G-Sync is meant to solve.

    Classic double/triple buffering use extra buffers in order to overcome the fact that the GPU can't control the refresh timing of the monitor.

    G-sync will probably require *less* buffers, because in this case the GPU controls when the monitor is refreshed. You could probably do it with only a single buffer, no need for double/triple buffering.

    ". . . updating a frame as soon as the GPU is ready and delivering frames at a variable rate."

    . . . but you got this part right. Instead of the monitor updating at a fixed rate, it simply waits for the video card to tell it a new frame is ready, eliminating the issues with screen tearing or frame drops, which should also eliminate the need for more buffers.
    • problems known for decades still are major issues

      All we are getting are a bunch of incompatible proprietary point solutions rather than the redesign of networking that's needed to address even the two fundamental issues of latency and security... maddening.
      • Networks designed for Parallel Processing exist

        Networks such as Myrinet exist which provide optimized networking for low-latency parallel processing. The problem isn't the networking technology as much as people wanting to distribute the processing across wide area networks and the internet. Those networks are not designed for that purpose, so any parallel processing that relies heavily on message passing between nodes will never translate to that type of networking efficiently. For best performance, you need to keep as much of your message passing on the same network segment as possible.

        What some distributed computing solutions will do is pass data to individual clusters across a WAN or the internet, while the message passing required for processing stays contained in each cluster's own network.
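The locality argument can be put in rough numbers (illustrative latencies, not benchmarks): a job that exchanges many small messages pays the per-message latency every time, so the fabric those messages cross dominates the total.

```python
N_MESSAGES = 10_000
CLUSTER_LATENCY_MS = 0.005  # microsecond-class fabric (Myrinet/InfiniBand territory)
WAN_LATENCY_MS = 50.0       # a plausible wide-area round trip

cluster_s = N_MESSAGES * CLUSTER_LATENCY_MS / 1000
wan_s = N_MESSAGES * WAN_LATENCY_MS / 1000
print(f"{cluster_s} s of latency on the cluster fabric")  # 0.05 s
print(f"{wan_s} s of latency over the WAN")               # 500.0 s
```

Four orders of magnitude per message is why message-heavy parallel codes stay on one segment and only bulk data crosses the WAN.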
        • Network not the answer

          Myrinet is pretty much obsolete in the face of infiniband; neither makes much difference when the best supercomputing use of these is to load stuff once in-memory, then apply all the cores to the problem. This works IFF the CUDA cores can get at the main memory, rather than special purpose memory, as typical with GPUs hung on PCI-E.
        • Low latency is a very relative term in computing

          While these specialized networks are low latency relative to normal networks, most supercomputers do tasks that are highly parallel and independent, such that latency isn't as big a factor. They attempt to dodge the latency issue: avoid hitting the network unless necessary.

          But that "low latency" is not that low when compared to memory access, for example. And while many tasks can be made highly parallel, not all can.
          • Networking is not the answer

            Most of the latency is in the switching and the conversions back and forth between optical and electronic transmission. This will not be cured until on-chip silicon photonics makes optical chip-to-chip interconnects possible. At the same time, it will make all-optical switching cheap and feasible. That takes care of latency, for the most part. But what is really needed is a bus, not a network. We need to be able to establish chip-to-chip interconnects at a distance - in other words, dedicated point-to-point interconnects. Think of it as wide-area PCI-E.
    • Latency *isn't* the issue...

      How many people here get a stable and uninterrupted data connection on their mobile phones these days? Who also gets a guaranteed connection to their ISP?
      Latency isn't the killer here when most people don't even have a fiber connection to their homes. How are you meant to stream intensive applications from a remote source to your device when your connection is flaky at best? Most of us will know first hand that streaming live video has its issues today and that's even when several seconds are buffered into a cache on the local device - which you can't do with a live application.
      We are a long way from realtime cloud applications as described above. Until connectivity to the net improves on a global scale this isn't going to happen, even at that stage only people in big cities with excellent infrastructure and low contention will be able to use these services, so you can rule out most of the world's population!
  • Parallel processing just not there.

    With the exception of task specific computing (and very specialized software to go with it) I don't see parallel processing as being viable until well into the future.
    • Exception Noted...

      ... but that's a helluva market chunk these days, nonetheless.

      There's all those recordings we need to analyze.
    • NoAxToGrind stick to subject matter you know something about.

      I have years of supercomputing experience, and yes, some problems cannot be broken up such that parallel processing is a viable solution - but a lot can. Even if an individual solution is serial, it often needs to be run on multiple sets of data, which can be done in parallel. Parallel computing is far from limited to 'task specific computing'.

      Multi-core machines are still not fully utilized in consumer computing due to the wealth of serial programming in existence, but in research and development parallel processing is king. It's used for simulating nuclear explosions, mapping the human genome and meteorology, just to name a few fields. A Beowulf cluster was used to render Lord of the Rings.

      Parallel Processing is hardly the niche solution you are implying it is. But then again you always have to have your say don't you.
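The "multiple sets of data" point above is the easiest kind of parallelism to demonstrate. A minimal sketch (standard-library Python; `serial_analysis` is a hypothetical stand-in): the function itself is strictly serial, but independent data sets can be processed concurrently.

```python
from concurrent.futures import ProcessPoolExecutor

def serial_analysis(dataset):
    # Stand-in for a strictly serial computation (here: a plain sum).
    return sum(dataset)

datasets = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

if __name__ == "__main__":
    # Each data set is independent, so the runs can proceed in parallel
    # across however many cores the pool has.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(serial_analysis, datasets)))  # [6, 15, 24]
```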
      • I know right!?


        Thank you for helping NoAxToGrind understand the everyday value of parallel processing. It's hard to believe that people still think it's such a niche solution.

        I often simulate nuclear explosions on my computer before I've even had my first cup of coffee in the morning. And usually my computer is cranking out weather patterns and mapping the genome between emails. Lately I find that I'm tired after a long day of these everyday tasks so I've been slacking in the evenings, but as soon as I get back on the horse you better believe I'm going to be rendering the next Hobbit movie.
      • As I said

        With the exception of task specific computing (and very specialized software to go with it)
    • Graphics is highly parallel and many math concepts like matrix manipulation

      That said, even your average PC takes advantage of multi-core as there are dozens of processes and hundreds of threads. The OS alone utilizes the cores (an examination of core loading will show that cores are fairly evenly used).

      The oft-spoken myth about OSes and applications not taking advantage of multi-core and hyperthreading is just that: a myth. Using process examination tools will show you how many threads are running - it's a lot on even the simplest system.
  • I don't give a crap if my smart phone has a 192 core processor

    I want that capability on my x86 desktop or laptop. I didn't see anything in the article that suggests that that will even be possible.
    • You're kidding

      I have a quad core PC with 384 cuda cores. It doesn't feel like a supercomputer and transcoding software can't make use of all those cores. Not sure how swapping a Tegra arm suddenly makes it a supercomputer.
      • Depends...

        Your PC will never be a supercomputer by itself, but it could be a node in a supercomputer cluster.

        The CUDA cores in your GPU on your PC aren't on the main memory, which severely compromises their use except for very special applications. One of which, if I understand correctly, would have been mining Bitcoins a year or two ago.

        The CUDA cores will be more useful when on main memory.
    • Yeah well...

      I mean, it isn't like my R9-280x has 2048 of these Stream Processing Units or nothing!
  • What are you talking about?

    Not to be rude, but unless you are targeting a small population that shares the same jargon, the language used needs clarification.

    What is the meaning of "The plan for Nvidia is to create an architecture that will play along with Android to disrupt industries ranging from gaming to automobiles and cloud computing"?

    Why would Nvidia want to "disrupt" industries and why would this make them popular in any way? "Disrupt" has a lot of meanings, almost all of them pejorative; and all of them are hostile to some extent. I can't see that Nvidia would gain from being perceived as hostile.

    And what is this "bridge the gap between mobile computing and supercomputers"? The biggest slowdown on device is the connection between my device and who or what it is communicating with. Is this an attempt to improve communications? It doesn't sound like it.
    Or is this an attempt to make laptops faster (they could sure use that, but my desktops could use it, too). However, the problem with laptops has always been and will probably continue to be an issue of speed versus weight versus size versus price. Telling us that this device (chip?) is going to somehow have an impact on computation speed doesn't at all convey how it integrates in total: Will the price increase? Will it run hot and require more cooling? Will it require more space or more weight in the hardware?

    I'm sorry, but quoting a press release without translating/interpreting it and expanding upon it is poor journalism.
  • Why do I get the image of...

    a guy in a cheap suit and a toothy grin saying "This is going to cost yuh."
  • Does it mine Litecoin?

    I wonder when will Nvidia and AMD talk about mining rigs.