Sunday, September 22, 2019

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding


This is a method for DNN model reduction (compression). The first stage removes
(prunes) the connections in the original model whose weights fall below some
threshold, then retrains the network (to make sure the error rate does not
increase). The second stage clusters the weights of each layer and uses each
cluster's centroid (or mean) as a code book entry to represent that layer's
weights (this step is very similar to vector quantization). The third stage
then compresses the result with a Huffman code, based on how frequently each
code word in the code book occurs.
Most DNNs can be compressed by about 20×, and they also run faster! (A 35× to
49× compression ratio was reported in the literature; as expected, this
approach consumes a great deal of time and computing resources in the training
phase.)
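To make the three stages concrete, here is a minimal numpy sketch. The toy weight matrix, the 0.5 threshold, and the four clusters are all illustrative assumptions, and the retraining passes that the paper interleaves between stages are omitted:

```python
# Minimal sketch of the three Deep Compression stages on a toy layer.
# Threshold, cluster count, and weights are illustrative assumptions.
import heapq
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)  # toy layer weights

# Stage 1: magnitude pruning -- drop connections below a threshold.
# (The paper retrains the surviving weights afterwards; omitted here.)
threshold = 0.5
mask = np.abs(W) > threshold
nz = W[mask]

# Stage 2: weight sharing -- k-means over the surviving weights; the k
# centroids form the code book and each weight stores only a cluster index.
k = 4
centroids = np.linspace(nz.min(), nz.max(), k)  # linear init, as in the paper
for _ in range(20):                             # plain Lloyd iterations
    idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
    for j in range(k):
        if np.any(idx == j):
            centroids[j] = nz[idx == j].mean()
W_hat = np.zeros_like(W)
W_hat[mask] = centroids[idx]                    # shared-weight reconstruction

# Stage 3: Huffman-code the cluster indices by frequency.
freq = Counter(idx.tolist())
heap = [[n, [sym, ""]] for sym, n in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
    lo, hi = heapq.heappop(heap), heapq.heappop(heap)
    for pair in lo[1:]:
        pair[1] = "0" + pair[1]
    for pair in hi[1:]:
        pair[1] = "1" + pair[1]
    heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
codes = {sym: code for sym, code in heap[0][1:]}
bits = sum(len(codes[s]) for s in idx.tolist())
print(f"kept {mask.sum()}/{W.size} weights, {k} centroids, {bits} Huffman bits")
```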

https://arxiv.org/abs/1510.00149

[Deep Neural Network Compression] Deep Compression (ICLR 2016 Best Paper)
https://zhuanlan.zhihu.com/p/21574328


Neural networks are both computationally intensive and memory intensive, making
them difficult to deploy on embedded systems with limited hardware resources. To
address this limitation, we introduce “deep compression”, a three-stage pipeline:
pruning, trained quantization and Huffman coding, which work together to reduce
the storage requirement of neural networks by 35× to 49× without affecting their
accuracy. Our method first prunes the network by learning only the important
connections. Next, we quantize the weights to enforce weight sharing; finally, we
apply Huffman coding. After the first two steps we retrain the network to
fine-tune the remaining connections and the quantized centroids. Pruning reduces
the number of connections by 9× to 13×; quantization then reduces the number of
bits that represent each connection from 32 to 5. On the ImageNet dataset, our
method reduced the storage required by AlexNet by 35×, from 240MB to 6.9MB,
without loss of accuracy. Our method reduced the size of VGG-16 by 49×, from
552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model
into on-chip SRAM cache rather than off-chip DRAM memory. Our compression
method also facilitates the use of complex neural networks in mobile applications
where application size and download bandwidth are constrained. Benchmarked on
CPU, GPU and mobile GPU, the compressed network has a 3× to 4× layerwise speedup
and 3× to 7× better energy efficiency.
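As a sanity check on how these ratios compose: pruning alone gives 9× to 13× fewer connections, and quantization shrinks each surviving weight from 32 to 5 bits. The sketch below uses an assumed per-weight cost for storing sparse positions (not a figure from the abstract) to show why the realized 35× to 49× sits between the naive product and the index-burdened estimate:

```python
# Back-of-envelope composition of the ratios quoted above. bits_index is an
# illustrative assumption for the cost of storing sparse positions, not a
# figure from the paper; the Huffman stage then claws back part of it.
prune_ratio = 9        # connections reduced 9x to 13x
bits_dense = 32        # original float32 weight
bits_quant = 5         # bits per shared-weight index after quantization
bits_index = 5         # assumed bits per surviving weight for its position

naive = prune_ratio * bits_dense / bits_quant
with_index = prune_ratio * bits_dense / (bits_quant + bits_index)
print(f"pruning x quantization, no index overhead: {naive:.0f}x")       # ~58x
print(f"with assumed position overhead:            {with_index:.0f}x")  # ~29x
```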


Friday, September 20, 2019

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework


The marriage of big data and deep learning has led to the great success of artificial intelligence, but it also raises new challenges in data communication, storage and computation [7] incurred by the growing amount of distributed data and the increasing DNN model size. For resource-constrained IoT applications, while recent research [8, 9] has addressed the computation- and memory-intensive DNN workloads in an energy-efficient manner, efficient solutions are still lacking for reducing the power-hungry data offloading and storage on terminal devices such as edge sensors, especially in the face of stringent constraints on communication bandwidth, energy and hardware resources. Recent studies show that the latency to upload a JPEG-compressed input image (i.e. 152KB) for a single inference of a popular CNN, “AlexNet”, via stable wireless connections with 3G (870ms), LTE (180ms) and Wi-Fi (95ms) can exceed that of the DNN computation itself (6∼82ms) on a mobile or cloud GPU [10]. Moreover, the communication energy is comparable to the associated DNN computation energy.

Existing image compression frameworks (such as JPEG) can compress data aggressively, but they are often optimized for the Human Visual System (HVS), i.e. humans' perceived image quality, which can lead to unacceptable DNN accuracy degradation at higher compression ratios (CR) and thus significantly harm the quality of intelligent services. As shown later, testing a well-trained AlexNet with JPEG images compressed at CR ≈ 5× (w.r.t. CR = 1× high-quality images) can reduce image recognition accuracy by ∼9% on the large-scale ImageNet dataset, almost offsetting the improvement brought by a more complex DNN topology, i.e. moving from AlexNet to GoogLeNet (8 layers, 724M MACs vs. 22 layers, 1.43G MACs) [11, 12]. This prompts the need to develop a DNN-favorable deep compression framework.
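The kind of measurement described here is easy to reproduce in miniature: re-encode a validation image at decreasing JPEG quality and watch a pretrained classifier's prediction. The sketch below assumes a local "val_image.jpg" (a placeholder path) and uses the pretrained AlexNet that ships with torchvision; the quality levels are illustrative, not the paper's CR settings:

```python
# Re-encode one image at several JPEG quality levels and run a pretrained
# AlexNet on each version. "val_image.jpg" is a placeholder path.
import io

import torch
from PIL import Image
from torchvision.models import alexnet, AlexNet_Weights

weights = AlexNet_Weights.IMAGENET1K_V1
model = alexnet(weights=weights).eval()
preprocess = weights.transforms()

original = Image.open("val_image.jpg").convert("RGB")
for quality in (95, 50, 10):              # compression ratio rises as quality drops
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    size_kb = buf.tell() / 1024
    buf.seek(0)
    degraded = Image.open(buf).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(degraded).unsqueeze(0))
    top1 = logits.argmax(dim=1).item()
    print(f"quality={quality:3d} size={size_kb:6.1f}KB top-1 class index={top1}")
```

Over a full validation set, tracking top-1 accuracy instead of a single prediction reproduces the accuracy-vs-CR trade-off the paper measures.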

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework
https://arxiv.org/pdf/1803.05788.pdf


Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
https://arxiv.org/pdf/1803.05787.pdf

Data Science and Engineering


Data Science : Challenges and Directions
Prof. Longbing Cao, Communications of the ACM, Aug. 2017
http://203.170.84.89/~idawis33/DataScienceLab/publication/DS_CACM.pdf





lec-1 (2022-05-12) Accelerating deep learning computation & strategies

Although I have been using DNNs to train and run models for quite a while, this week was the first time I really figured out what cuDNN does. I had wondered before how tensorflow/pytorch actually implement convolution; isn't FFT better? The reference below gives a very good explanation: Wh...
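One quick way to probe that question: for the small kernels typical of CNN layers, direct convolution often beats FFT-based convolution, which is one reason cuDNN selects among several algorithms (implicit GEMM, Winograd, FFT) per layer. A minimal scipy sketch with illustrative sizes:

```python
# Compare direct vs. FFT-based 2-D convolution; small kernels tend to favor
# the direct method, large kernels the FFT method. Sizes are illustrative.
import time

import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
image = rng.normal(size=(224, 224))

for ksize in (3, 31):
    kernel = rng.normal(size=(ksize, ksize))
    for method in ("direct", "fft"):
        t0 = time.perf_counter()
        for _ in range(10):
            convolve(image, kernel, mode="valid", method=method)
        dt = (time.perf_counter() - t0) / 10
        print(f"kernel {ksize:2d}x{ksize:<2d} {method:6s}: {dt * 1e3:7.2f} ms")
```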