In my recent post about Neural Processors I noted that just about everyone who can - Google, Apple, IBM, Intel & more - has built neural processors to accelerate neural networks. They are mostly deployed as co-processors to run AI models, but as demand for intelligent applications grows, systems will have to do more to adapt.
Why? Because AI systems have unique I/O requirements. That's why neural processors have no caches and don't need floating point numbers, or the I/O overhead that comes with them.
In the common application of computer vision, moving many frames of high-res video also stresses the I/O subsystem. Recurrent neural networks focus on streaming data, another bandwidth intensive application.
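To put a rough number on the video case, here is a back-of-the-envelope sketch. The resolution, pixel format and frame rate are my own illustrative assumptions, not figures from any particular system, and the function name is hypothetical.

```python
def raw_video_bandwidth(width, height, bytes_per_pixel, fps):
    """Return the bandwidth of an uncompressed video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Assumed stream: 1920x1080 frames, 3 bytes per pixel (24-bit RGB), 30 frames/s.
bytes_per_second = raw_video_bandwidth(1920, 1080, 3, 30)
print(f"{bytes_per_second / 1e6:.1f} MB/s per uncompressed stream")
```

Under those assumptions a single uncompressed stream needs about 187 MB/s before the network touches a single weight, and the figure scales linearly with the number of cameras.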
We're able to limp along today because these are AI's early days - much as 8-bit processors worked fine in '70s PCs - but as capacity and performance requirements grow, system architectures will have to change.
Architects and computer scientists are still learning how to optimize data structures and data representation for performance. Even so, it is painfully clear that standard x86 architectures will never become preferred AI platforms.
For example, as noted in "Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces," today's Deep Neural Networks (DNNs)
. . . assume the availability of many narrow read and write ports (8 or 16 bits), each with independent DRAM access. . . . As such, a memory interconnect must be used to multiplex the wide DRAM controller interface to a large number of narrow read and write ports, while maintaining maximum bandwidth efficiency.
As a result, in contrast to how memory buses are used today, a DNN is most efficient when DRAM bandwidth is evenly partitioned across the DRAM ports. That means other logic designed for wide memory buses, such as multiplexers, is not required either.
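A toy software model may help picture what the quoted passage means by fanning a wide DRAM interface out to many narrow ports. This is only a sketch and not the Medusa design itself: the 512-bit interface width is my assumption (the paper's quote specifies only the 8- or 16-bit ports), and both function names are hypothetical.

```python
def split_wide_word(word, wide_bits=512, narrow_bits=16):
    """Split one wide DRAM word into narrow port-sized slices, lowest bits first."""
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide_bits must be a multiple of narrow_bits")
    mask = (1 << narrow_bits) - 1
    ports = wide_bits // narrow_bits
    return [(word >> (i * narrow_bits)) & mask for i in range(ports)]


def join_narrow_words(slices, narrow_bits=16):
    """Reassemble a wide word from narrow slices (the write direction)."""
    word = 0
    for i, value in enumerate(slices):
        word |= value << (i * narrow_bits)
    return word


# One assumed 512-bit DRAM word, built from 64 distinct bytes.
wide_word = int.from_bytes(bytes(range(64)), "little")
slices = split_wide_word(wide_word)
print(len(slices), "narrow ports, one 16-bit slice each")
```

Every wide word yields exactly one slice per port, so in this model each port gets an equal share of the DRAM bandwidth, which is the even partitioning described above.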
Since DRAM can account for as much as 90 percent of energy consumption, minimizing memory logic, and using memory efficiently, can be a major cost saving for mobile devices - or for a warehouse-scale computer.
Memory accesses aren't the only, or even always the most important, difference between traditional and AI workloads. But there is no doubt that as AI applications grow in sophistication, current architectures - x86 and ARM - will be less and less relevant.
The Storage Bits take
In the next post, I'll discuss further the implications that the widespread use of AI applications will have on CPU and server architectures. Suffice it to say that if AI applications become widespread - and I believe they will - a new generation of CPUs will be required to run them efficiently and quickly.
Courteous comments welcome, of course.