Nervana Engine Delivers Deep Learning at Ludicrous Speed!


February 16, 2018

Nervana is currently developing the Nervana Engine, an application-specific integrated circuit (ASIC) that is custom-designed and optimized for deep learning.

Training a deep neural network involves many compute-intensive operations, including tensor matrix multiplication and convolution. Graphics processing units (GPUs) are better suited to these operations than CPUs, since GPUs were originally designed for video games, in which the movement of on-screen objects is governed by vectors and linear algebra. As a result, GPUs have become the go-to computing platform for deep learning. But there is much room for improvement, because the numeric precision, control logic, caches, and other architectural elements of GPUs were optimized for video games, not deep learning.
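To make "compute-intensive" concrete, here is a minimal sketch in plain Python of the two operations named above, each counting the multiply-accumulate (MAC) steps it performs. The function names `matmul` and `conv2d` are illustrative, not taken from any Nervana software, and real kernels are vectorized and heavily tuned rather than written as nested loops.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix.

    Returns (product, number of multiply-accumulate steps).
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
                macs += 1
            out[i][j] = acc
    return out, macs


def conv2d(image, kernel):
    """'Valid' 2D convolution as used in deep learning (stride 1, no kernel flip).

    Returns (feature map, number of multiply-accumulate steps).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
                    macs += 1
            out[y][x] = acc
    return out, macs


if __name__ == "__main__":
    product, macs = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, macs)
```

The MAC count grows as m·k·n for the matrix product and as (output height)·(output width)·(kernel height)·(kernel width) for the convolution, which is why hardware built around dense multiply-accumulate units pays off for these workloads.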

As authors of the world’s fastest GPU kernels for deep learning, Nervana understands these limitations better than anyone, and knows how to address them most effectively. When designing the Nervana Engine, we threw out the GPU paradigm and started fresh. We analyzed the most popular deep neural networks and determined the best architecture for their key operations. We even analyzed and optimized our core numerical format and created FlexPoint™, which maximizes the precision that can be stored within 16 bits, enabling the perfect combination of high memory bandwidth and algorithmic performance. Then we added enough flexibility to ensure that our architecture is “future proof.” The Nervana Engine includes everything needed for deep learning and nothing more, ensuring that Nervana will remain the world’s fastest deep learning platform. So … are you ready for deep learning at ludicrous speed?!
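The article does not describe how FlexPoint works internally, so the following is only an illustration of one general way to get more usable precision out of 16 bits: store every value in a tensor as a 16-bit signed integer and keep a single power-of-two exponent for the whole tensor. This shared-exponent scheme is an assumption made for the sake of the example, and the names `quantize_shared_exponent` and `dequantize` are hypothetical.

```python
import math


def quantize_shared_exponent(values, bits=16):
    """Encode floats as signed `bits`-bit integers sharing one power-of-two scale.

    Illustrative only: not a description of the actual FlexPoint format.
    Returns (integer mantissas, shared exponent).
    """
    limit = 2 ** (bits - 1) - 1  # largest signed mantissa (32767 for 16 bits)
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest exponent for which the largest value still fits in the mantissa.
    exponent = math.ceil(math.log2(max_abs / limit))
    scale = 2.0 ** exponent
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, exponent


def dequantize(mantissas, exponent):
    """Recover approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125]
    mantissas, exponent = quantize_shared_exponent(weights)
    print(mantissas, exponent, dequantize(mantissas, exponent))
```

Because the exponent is stored once per tensor rather than once per value, all 16 bits of each element go to the mantissa, and each element occupies half the memory of a 32-bit float. The trade-off in this sketch is that values much smaller than the largest one in the tensor lose relative precision.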