Thursday, May 07, 2020

Efficient Computing for Deep Learning, Robotics, and AI

Although deep learning has made significant progress in recent years, one of the key differences between the human brain and a computer is energy efficiency. OpenAI has shown that, over the past few years, the amount of compute used in the largest training runs has grown exponentially, by more than 300,000 times. From an environmental perspective, the carbon footprint of neural architecture search is an order of magnitude larger than that of a round-trip flight between New York and San Francisco. And the problem is not limited to deep learning in the cloud: even for machine learning at the edge, it has been reported that a self-driving car prototype uses approximately 2,500 watts of computing power just to process all of its sensor data.

For the past few decades, we have relied on advances in semiconductor technology to give us smaller, faster, and more energy-efficient chips. However, Moore's Law and Dennard scaling have slowed down in recent years, so transistors are no longer becoming efficient enough to drive deep neural network applications. In autonomous navigation, for example, the state-of-the-art approaches to depth estimation and object segmentation use deep neural networks that require up to several hundred million operations and weights to compute, roughly 100 times more complex than video compression. Therefore, we need specialized hardware to achieve significant improvements in speed and energy efficiency; that is, we need to redesign the computing hardware from the ground up.

If we look inside neural network processing, data movement turns out to be the most expensive part: fetching data from memory consumes far more energy than a floating-point multiplication. Techniques like weight pruning or efficient architecture redesign do reduce the number of MACs and weights, but from a system perspective that does not necessarily translate into energy savings or lower latency. The speaker and her team developed the Eyeriss deep neural network accelerator, which uses an on-chip buffer to achieve more than a 10x energy reduction compared to a mobile GPU.
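To make the data-movement point concrete, here is a minimal back-of-the-envelope sketch in Python. The per-operation energy numbers and the layer dimensions are my own assumptions for illustration (picked to be on the rough scale often quoted for 45 nm technology, where a DRAM access costs a couple of orders of magnitude more than a MAC); they are not figures from the talk.

```python
# Back-of-the-envelope energy estimate for one conv layer, illustrating why
# data movement (DRAM access) can dominate over arithmetic (MACs).
# The per-operation energies below are assumed values in picojoules.

E_MAC_PJ = 4.6      # ~one 32-bit multiply-accumulate
E_SRAM_PJ = 5.0     # ~one 32-bit read from a small on-chip SRAM buffer
E_DRAM_PJ = 640.0   # ~one 32-bit read from off-chip DRAM

def conv_layer_energy(h, w, c_in, c_out, k, reuse_from_sram=False):
    """Estimate compute vs. data-movement energy (in mJ) for a k x k conv layer."""
    macs = h * w * c_in * c_out * k * k     # multiply-accumulates
    weights = c_in * c_out * k * k          # weight values
    activations = h * w * c_in              # input activations

    compute_pj = macs * E_MAC_PJ
    if reuse_from_sram:
        # Fetch each weight/activation from DRAM once, then reuse it from SRAM.
        data_pj = (weights + activations) * E_DRAM_PJ + macs * E_SRAM_PJ
    else:
        # Worst case: every operand of every MAC comes from DRAM (no reuse).
        data_pj = 2 * macs * E_DRAM_PJ
    return compute_pj / 1e9, data_pj / 1e9  # pJ -> mJ

for reuse in (False, True):
    compute_mj, data_mj = conv_layer_energy(56, 56, 64, 64, 3, reuse)
    print(f"on-chip reuse={reuse}: compute {compute_mj:.2f} mJ, "
          f"data movement {data_mj:.2f} mJ")
```

Under these assumed numbers, the no-reuse case is dominated by DRAM traffic by two orders of magnitude, while adding an on-chip buffer brings data-movement energy down to roughly the same scale as the arithmetic itself, which is essentially the motivation behind accelerators like Eyeriss.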





lec-1 (2022-05-12) Accelerating deep learning computation & strategies

Although I have been training and running DNN models for quite a while, this week was the first time I really understood what cuDNN actually does. I had wondered before how tensorflow/pytorch implement convolution; isn't FFT supposed to be better? The reference below gives a very good explanation: Wh...
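As a side note on that question, here is a minimal numpy sketch (my own illustration, not the reference the post points to), assuming a 1-D signal for simplicity: a direct "valid" convolution, as DNN frameworks define it, and an FFT-based one give the same result. In practice cuDNN chooses among several algorithms (implicit GEMM, Winograd, FFT, ...) depending on filter and tensor shapes, so FFT is not always the fastest route.

```python
import numpy as np

def conv_direct(x, w):
    """Direct 'valid' correlation, the way DNN frameworks define convolution."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def conv_fft(x, w):
    """Same 'valid' correlation, computed by pointwise multiply in the FFT domain."""
    n = len(x) + len(w) - 1                          # zero-pad to full linear length
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w[::-1], n), n)
    return y[len(w) - 1:len(x)]                      # keep only the 'valid' part

x = np.random.randn(64)
w = np.random.randn(5)
print(np.allclose(conv_direct(x, w), conv_fft(x, w)))  # True
```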