Friday, May 13, 2022

lec-1 (2022-05-12) Accelerating deep learning computation & strategies

Although I have been training and predicting with DNN models for quite a while, this week was the first time I really figured out what cuDNN does.

I used to wonder how tensorflow/pytorch actually implement convolution. Isn't FFT better?

The references below give a very good explanation:

Why GEMM is at the heart of deep learning

Why GEMM works for Convolutions

Hopefully you can now see how you can express a convolutional layer as a matrix multiplication, but it’s still not obvious why you would do it. The short answer is that it turns out that the Fortran world of scientific programmers has spent decades optimizing code to perform large matrix to matrix multiplications, and the benefits from the very regular patterns of memory access outweigh the wasteful storage costs. This paper from Nvidia is a good introduction to some of the different approaches you can use, but they also describe why they ended up with a modified version of GEMM as their favored approach. There are also a lot of advantages to being able to batch up a lot of input images against the same kernels at once, and this paper on Caffe con troll uses those to very good effect. The main competitor to the GEMM approach is using Fourier transforms to do the operation in frequency space, but the use of strides in our convolutions makes it hard to be as efficient.

The good news is that having a single, well-understood function taking up most of our time gives a very clear path to optimizing for speed and power usage, both with better software implementations and by tailoring the hardware to run the operation well. 
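To make the quoted idea concrete, here is a minimal NumPy sketch of my own (not cuDNN's actual implementation, and with made-up layer sizes) that lowers a convolution to an im2col buffer followed by a single GEMM:

import numpy as np

def im2col(x, kh, kw, stride=1):
    # Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) patch matrix.
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, idx] = patch.reshape(-1)
            idx += 1
    return cols, out_h, out_w

def conv2d_gemm(x, weight, stride=1):
    # Convolution as one matrix multiply: (K, C*kh*kw) @ (C*kh*kw, out_h*out_w).
    k, c, kh, kw = weight.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    out = weight.reshape(k, -1) @ cols          # this is the GEMM
    return out.reshape(k, out_h, out_w)

x = np.random.randn(3, 32, 32).astype(np.float32)    # one CHW image
w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # 16 filters of size 3x3x3
y = conv2d_gemm(x, w)
print(y.shape)                                        # (16, 30, 30)

The im2col buffer duplicates every overlapping patch, which is the "wasteful storage cost" the quote mentions, but in exchange the whole layer becomes one big, cache-friendly matrix multiply that can be handed to a tuned BLAS/cuBLAS GEMM.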

It also finally made sense why, when the network architecture does not change, setting

torch.backends.cudnn.benchmark = True

can be about 15% faster on my machine.
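A minimal sketch of how I use it (the toy model below is just a placeholder of mine; the flag only pays off when input shapes stay fixed, because cuDNN benchmarks its convolution algorithms per input shape and caches the fastest one):

import torch
import torch.nn as nn

# Let cuDNN time several convolution algorithms on the first batch and reuse the winner.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1)).to(device)

x = torch.randn(32, 3, 224, 224, device=device)   # same shape every iteration
for _ in range(10):
    y = model(x)

With variable-sized inputs cuDNN has to re-benchmark for every new shape, which can make things slower instead.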

Because so many problems can be solved with AI computation, and GEMM is the key to AI performance, semiconductor advances can keep evolving in AI accelerators, getting around Amdahl's Law and resolving the two-decade predicament shown in the red part of the figure below.

That is why John Hennessy and David Patterson declare: A New Golden Age for Computer Architecture
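As a back-of-the-envelope illustration (my own numbers, not from the lecture), Amdahl's Law says the overall speedup from accelerating a fraction p of the work by a factor s is

S(p, s) = \frac{1}{(1 - p) + p / s},
\qquad S(0.9, 10) = \frac{1}{0.1 + 0.09} \approx 5.3\times,
\qquad \lim_{s \to \infty} S(0.5, s) = 2\times

so specialized hardware only pays off when the accelerated fraction dominates the workload, which is exactly the situation GEMM is in for DNN training and inference.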

lecture

paper


Thursday, November 26, 2020

Top Students and the Midterm Exam

Dragged my luggage to school first thing in the morning, finished the midterm, then dragged the luggage to Songshan Airport.

A friend recently asked me whether, having been away from school for so long, I would feel out of place facing exams. I had honestly never thought about it. But after going through four exams this year, a little of the old experience does seem to be gradually resurfacing; I had simply forgotten it all. People really do tend to forget painful memories like exams!

When I took the TA training course, I noticed that the TAs' common topic was: "NTU students care so much about grades; they will argue with you to the death over a single point!" Then in class I heard the students' topic: "The TA is so petty, deducting points even for this; I'm definitely going to fight to get them back!" Different positions lead to different ways of thinking; it really is the same everywhere. I just wonder whether students who have been TAs end up with a different mindset because of it.

Of course, forgetting aside, after a few more rounds I still know the exam techniques. But technique only gets you a better-looking score; in reality, nobody else pays attention to your grades, and the only person who cares about your grades is you. The people who score high are just being modest; they actually work extremely hard, reading the material three times after understanding it and then memorizing it, with their exam technique maxed out too, so it's only natural that they get full marks.

Life is hard and November is busy. After the exam I flew straight down to Taitung for a vacation. It was fun.

An open-book cheat sheet posted by a female NTU student

Saturday, July 11, 2020

On the importance of democratizing Artificial Intelligence

Internet: 1995 => 2000 => 2015

In 1995, the Internet had had little impact on society. If you went around telling people that the Internet was about to change the world, you would have been met with a lot of skepticism. Most people didn't see how this new thing was relevant to them, and they didn't think that "normal" people would ever find value in the Internet. It's the same today with AI. But things are about to change, on an even larger scale.
Think about it this way: when the Internet became mainstream, a lot of business models were disrupted and a lot of companies had to transform themselves or disappear, such as bookstores, retailers, DVD sellers. And the same is about to happen with AI in the next few decades, except for all business models. For all jobs. AI is not going to be a new industry. AI is going to be in every industry. It's going to be in every application, in every process in our society, in every aspect of our lives. Not just business and jobs, but also culture and art. Everything. AI is going to change what it means to be human.
Mobile/Smart Phone: 2008 => 2013 => 2028
AI: 2015 => 2020 => 2035

So a lot of the occupations that exist in the world today are going to disappear, because we are going to automate nearly all current jobs. That's the nature of AI: automating an ever-growing range of intellectual tasks. And at the same time, a new world of opportunities is going to open up. A more exciting and much broader world of opportunities. We'll free up people's time to do more meaningful things. We'll transition to a different economy altogether --and it will be for the better. We will enter a new era of prosperity.
The Internet has been a huge step forward for humanity as a whole and for each and every one of us. By automating a wide variety of intellectual tasks, AI has the potential to be just as beneficial, if not more. Now is the time to make sure that the transition into this brave new world goes as smoothly as the previous one. Everyone should be able to start using AI to solve their business problems, to answer the questions they have, in much the same way that every business today has a webpage and can leverage the Internet for everything from sales to marketing to inventory sourcing.

Sunday, May 24, 2020

10 Fears about the Future of the DBMS Field

Fear #1 - The hollow middle - the fraction of SIGMOD papers that deal with core database system topics has been slowly declining in recent years, from 100% in 1977 to 47% in 2017. The conclusion is that we are drifting into "applications" as the core becomes increasingly well understood. But the applications (NLP, autonomous vehicles, complex analytics) don't have anything to do with each other. We are moving to a world where nothing binds us together except 200 researchers looking favorably on each other's papers, from bifurcation to multi-furcation!
Fear #2 - We have been abandoned by our customers. The industry types have largely disappeared, presumably because DBMS research conferences are no longer relevant to their needs. We have become disconnected from the real world.
Fear #3 - Diarrhea of papers. A Ph.D. applicant who wants to get a good academic job should have ~10 papers. A tenure case should have ~40 papers. And there are around 10X the number of researchers, i.e. 100X the number of papers. Nobody can keep up with this deluge. Everybody divides their papers into least publishable units (LPUs). A student has to grind out ~10 papers in 3-4 years. Any serious implementation is impossible in this climate; Postgres would be impossible in this climate. Way too much work for too few publications.
Fear #4 - Reviewing is getting very random. Quality stinks, huge program committees, the overwhelming crush of papers. Leads to revise and resubmit (R&R) mentality. Which is killing both the submitters and the reviewers.
Fear #5 - Research taste has disappeared (the customers left and the whales replaced them) - witness the acceptance and subsequent rejection of MapReduce from Google. We are way too uncritical of the whales. People who don't understand history will be condemned to repeat it.
Fear #6 - We are polishing a round ball - papers that report a 10% improvement are not worth reading, let alone implementing.
Fear #7 - Irrelevant theory is taking over. It's very difficult to get systems papers accepted, seemingly because they have no theory. The real world doesn't share data, and neither do the whales, so systems papers are becoming more and more artificial.
Fear #8 - We are ignoring the most important problems. In favor of ones that are easy to solve. Driven by the need for quickies. A consequence of being detached from the real world. A field is defined to have "lost its way" when it forgets who the customer is.
Fear #9 - Research support is disappearing. The NSF success rate is down to 7%, and the number of mouths to feed keeps increasing. Industry is not picking up the slack. Without a healthy, aggressive, and innovative research community, we are "toast" in the long run.
Fear #10 - Student load. CS departments are overrun with students. 40%+ of MIT undergraduates are majoring in CS! This will likely make academic life less and less attractive into the future. The best people will depart for greener pastures and a lot more money.



Thursday, May 07, 2020

Efficient Computing for Deep Learning, Robotics, and AI

Although deep learning has made significant progress in recent years, one of the key differences between the human brain and the computer is energy efficiency. OpenAI has shown that over the past few years the amount of computing resources needed has grown exponentially, by more than 300,000 times. From an environmental perspective, the carbon footprint of neural architecture search is an order of magnitude larger than a round-trip flight between New York and San Francisco. And this is not only about deep learning in the cloud: even for machine learning at the edge, a self-driving car prototype is reported to use approximately 2,500 watts of computing power just to process all the sensor data.

For the past few decades we have relied on advances in semiconductor technology to give us smaller, faster, and more energy-efficient chips. However, Moore's Law and Dennard scaling have slowed down in recent years, and transistors are no longer becoming efficient enough to drive deep neural network applications. In autonomous navigation, the state-of-the-art approaches to depth estimation and object segmentation use deep neural networks that require up to several hundred million operations and weights to compute, roughly 100 times more complex than video compression. We therefore need specialized hardware to get significant improvements in speed and energy efficiency, that is, to redesign the computing hardware from the ground up.

Looking inside neural network processing, data movement turns out to be the most expensive part, consuming far more energy than a floating-point multiplication. Techniques like weight pruning or efficient architecture redesign do reduce the number of MACs and weights, but from a system perspective that does not necessarily translate into energy savings or lower latency. The speaker and her team developed the Eyeriss deep neural network accelerator, with an on-chip buffer, to achieve more than a 10x energy reduction compared to a mobile GPU.
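To get a feel for where "several hundred million operations and weights" comes from, here is a tiny back-of-the-envelope sketch of my own (the layer shape is hypothetical, not taken from the talk) that counts MACs and weights for a single convolutional layer:

# Every output pixel of every output channel needs C_in * K * K multiply-accumulates.
def conv_cost(c_in, c_out, k, h_out, w_out):
    macs = h_out * w_out * c_out * c_in * k * k
    weights = c_out * c_in * k * k
    return macs, weights

# A hypothetical 3x3, 64-in/64-out conv on a 56x56 feature map.
macs, weights = conv_cost(c_in=64, c_out=64, k=3, h_out=56, w_out=56)
print(f"{macs/1e6:.0f} M MACs, {weights/1e3:.0f} K weights")   # ~116 M MACs, ~37 K weights

Summing numbers like these over all the layers of a segmentation or depth-estimation network quickly lands in the hundreds of millions of MACs per frame.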




Monday, December 02, 2019

IQA - Image Quality Assessment



https://zhuanlan.zhihu.com/p/32553977




There are many image quality assessment databases, covering all sorts of distortion types on all sorts of images, but the most widely recognized are still the first four, namely LIVE, CSIQ, TID2008 and TID2013. These databases all provide a subjective mean opinion score (MOS) for each distorted image, i.e. the ground truth. The number of reference images is roughly the same across them. The first two databases mainly cover the common distortion types, namely additive white Gaussian noise, Gaussian blur, JPEG compression and JPEG2000 compression, while TID2013 contains 3000 distorted images scored by 917 subjects, so it is certainly the most authoritative, but with as many as 25 distortion types it is also the hardest. Results on LIVE and CSIQ are already very high, so the main battleground for FR IQA is now the two TID databases (BIQA can still only play around on LIVE and CSIQ; on TID the results are dreadful). The two tables below give an overview of the databases and the distortion types each one includes; the distortion types are mentioned here because they will be used later.
https://xialeiliu.github.io/RankIQA/

https://live.ece.utexas.edu/research/quality/subjective.htm

H.R. Sheikh, M.F. Sabir and A.C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms", IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, Nov. 2006.

SSIM

Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing , vol.13, no.4, pp. 600- 612, April 2004.
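As a quick reminder of what SSIM actually computes, here is a simplified NumPy sketch of the formula from the Wang et al. paper, using global image statistics instead of the 11x11 Gaussian-windowed local map of the reference implementation:

import numpy as np

def ssim_global(x, y, data_range=255.0):
    # Simplified SSIM: one luminance/contrast/structure term over the whole image.
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

ref = np.random.randint(0, 256, (64, 64))
noisy = np.clip(ref + np.random.normal(0, 20, ref.shape), 0, 255)
print(ssim_global(ref, ref))    # 1.0 for identical images
print(ssim_global(ref, noisy))  # < 1.0, and it drops as the distortion grows

The reference implementation averages this quantity over local windows (an SSIM map); libraries such as scikit-image ship that windowed version.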

Sunday, September 22, 2019

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding


This is a method for DNN model reduction (compression). In the first stage, connections whose weights fall below a certain threshold are removed (pruned) from the original model, and the network is retrained to make sure the error rate does not increase. In the second stage, the weights of each layer are clustered, and the cluster centers (or means) are used as a code book to represent that layer's weights (this step is very similar to vector quantization). In the third stage, the code words in the code book are compressed with Huffman coding according to their probability of occurrence.
Most DNNs can be compressed by about 20x and also run faster! (A 35x to 49x compression ratio was reported in the literature; as expected, this approach is very time- and compute-intensive in the training phase.)
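A minimal NumPy sketch of the first two stages (my own toy illustration of magnitude pruning plus k-means weight sharing, assuming scikit-learn is available; retraining and the Huffman stage are left out):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(256, 256))            # weights of one toy layer

# Stage 1: magnitude pruning - drop connections below a threshold (then retrain in practice).
threshold = np.percentile(np.abs(w), 90)           # keep only the largest 10% of weights
mask = np.abs(w) > threshold

# Stage 2: weight sharing - cluster the surviving weights into a small code book.
survivors = w[mask].reshape(-1, 1)
k = 32                                             # 32 centroids -> 5-bit index per weight
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(survivors)
codebook = km.cluster_centers_.ravel()
indices = km.labels_

# The layer is now a sparse mask + 5-bit indices + a 32-entry code book;
# stage 3 would Huffman-code the index stream.
w_shared = np.zeros_like(w)
w_shared[mask] = codebook[indices]
print(f"kept {mask.mean():.0%} of the weights, code book size {k}")

Huffman coding pays off because some centroids occur far more often than others, so their shorter codes shrink the index stream well below 5 bits per surviving weight on average.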

https://arxiv.org/abs/1510.00149

[Deep Neural Network Compression] Deep Compression (ICLR 2016 Best Paper)
https://zhuanlan.zhihu.com/p/21574328


Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce “deep compression”, a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35×, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49× from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU and mobile GPU, compressed network has 3× to 4× layerwise speedup and 3× to 7× better energy efficiency.


Friday, September 20, 2019

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework


The marriage of big data and deep learning leads to the great success of artificial intelligence, but it also raises new challenges in data communication, storage and computation [7] incurred by the growing amount of distributed data and the increasing DNN model size. For resource-constrained IoT applications, while recent researches have been conducted [8, 9] to handle the computation and memory-intensive DNN workloads in an energy efficient manner, there lack efficient solutions to reduce the power-hungry data offloading and storage on terminal devices like edge sensors, especially in face of the stringent constraints on communication bandwidth, energy and hardware resources. Recent studies show that the latencies to upload a JPEG-compressed input image (i.e. 152KB) for a single inference of a popular CNN–“AlexNet” via stable wireless connections with 3G (870ms), LTE (180ms) and Wi-Fi (95ms), can exceed that of DNN computation (6∼82ms) by a mobile or cloud-GPU [10]. Moreover, the communication energy is comparable with the associated DNN computation energy.

Existing image compression frameworks (such as JPEG) can compress data aggressively, but they are often optimized for the Human-Visual System (HVS) or human’s perceived image quality, which can lead to unacceptable DNN accuracy degradation at higher compression ratios (CR) and thus significantly harm the quality of intelligent services. As shown later, testing a well-trained AlexNet using CR =∼ 5× compressed JPEG images (w.r.t. CR = 1× high quality images), can lead to ∼ 9% image recognition accuracy reduction for the large scale dataset—ImageNet, almost offsetting the improvement brought by more complex DNN topology, i.e. from AlexNet to GoogLeNet (8 layers, 724M MACs v.s. 22 layers, 1.43G MACs) [11, 12]. This prompts the need of developing a DNN-favorable deep compression framework.
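A sketch of the kind of measurement behind that claim (my own, using a torchvision AlexNet and Pillow rather than the authors' setup; the image path is hypothetical and the weights argument assumes a recent torchvision):

import io
import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained AlexNet with the usual ImageNet preprocessing.
model = models.alexnet(weights="IMAGENET1K_V1").eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def predict_at_quality(img, quality):
    # Re-encode the image as JPEG at the given quality, then classify it.
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    x = preprocess(Image.open(buf)).unsqueeze(0)
    with torch.no_grad():
        return model(x).argmax(dim=1).item()

img = Image.open("example.jpg")            # hypothetical test image
for q in (95, 50, 20, 10):                 # lower quality -> higher compression ratio
    print(q, predict_at_quality(img, q))

Repeated over a whole validation set, this kind of sweep is what produces the roughly 9% top-1 drop at CR ≈ 5× that the paper quotes.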

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework
https://arxiv.org/pdf/1803.05788.pdf


Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
https://arxiv.org/pdf/1803.05787.pdf

Data Science and Engineering


Data Science : Challenges and Directions
Prof. Longbing Cao, Communications ACM, Aug. 2017
http://203.170.84.89/~idawis33/DataScienceLab/publication/DS_CACM.pdf




