Sunday, August 11, 2019

Convolution Theorem and Overlap Save/Add Method



https://github.com/chenhsiu/remagic/blob/master/convolution.ipynb

The overlap-add/overlap-save method shows us how to use the FFT to accelerate convolution. By construction it is generally much faster than the pairwise-multiplication convolution computed directly from the definition. But how much performance gain can we actually get from this kind of FFT-accelerated convolution? Let's run some experiments on 2D convolution and see the results.
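As a quick illustration before the experiments, here is a minimal 1D sketch of the overlap-add idea (the function name overlap_add_1d and the block length L are my own choices, not from the notebook): split the input into blocks, convolve each block with the filter via FFT, and add the overlapping tails back together.

import numpy as np
from scipy.fftpack import fft, ifft

def overlap_add_1d(x, h, L=256):
    """Linear convolution of x and h via the overlap-add method (minimal sketch)."""
    M = len(h)
    Nfft = L + M - 1                    # FFT size per block, large enough to avoid circular aliasing
    H = fft(h, Nfft)                    # filter spectrum, reused for every block
    y = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]
        yb = np.real(ifft(fft(block, Nfft) * H))            # block convolution via FFT
        y[start:start + len(block) + M - 1] += yb[:len(block) + M - 1]
    return y                            # tails of consecutive blocks overlap by M-1 samples and add up

x = np.random.randn(10000)
h = np.random.randn(31)
print(np.allclose(overlap_add_1d(x, h), np.convolve(x, h)))  # True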


Convolution Theorem

First of all, let's try to prove the convolution theorem with some Python code.
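Stated as an equation, the discrete convolution theorem says that circular convolution of two length-$N$ sequences is equivalent to pointwise multiplication of their DFTs:

$(x \circledast h)[n] = \sum_{m=0}^{N-1} x[m]\,h[(n-m) \bmod N] \quad\Longleftrightarrow\quad \mathrm{DFT}\{x \circledast h\}[k] = X[k]\,H[k]$

where $X = \mathrm{DFT}\{x\}$ and $H = \mathrm{DFT}\{h\}$; in code, ifft(fft(x) * fft(h)) equals the circular convolution of x and h.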

From time domain

In [32]:
import numpy as np
from scipy.fftpack import fft, ifft, fftn, ifftn

x = [0, 0, 3, -1, 0]
h = [1, 1, 2, 1, 1]

X = fft(x)
H = fft(h)
r1 = np.real(ifft(X*H))

# emulate circular convolution: prepend x[1:] so the filter wraps around,
# and keep only the fully-overlapped ('valid') outputs
r2 = np.convolve(np.hstack((x[1:], x)), h, mode='valid')
print('%s == %s ? %s' % (r1, r2, 'Yes' if np.allclose(r1, r2) else 'No'))
[1. 2. 2. 2. 5.] == [1 2 2 2 5] ? Yes

From frequency domain

In [31]:
# frequency domain
X = np.array([1.+1.j, 2.+3.j, 1.+2.j, 3.+2.j])
H = np.array([2.+3.j, 1.+1.j, 3.+3.j, 4.+5.j])

Y = X * H
r1 = ifft(Y)
print('r1 = %s' % r1)

x = ifft(X)
h = ifft(H)
x = np.hstack((x[1:], x)) # by default is linear convolution, make it circular
r2 = np.convolve(x,h, mode='valid')
print('r2 = %s' % r2)

print('r1 == r2 ? %s' % ('Yes' if np.allclose(r1, r2) else 'No'))
r1 = [-0.75+10.5j   5.   -1.75j -1.25 -3.5j  -4.   -0.25j]
r2 = [-0.75+10.5j   5.   -1.75j -1.25 -3.5j  -4.   -0.25j]
r1 == r2 ? Yes

Time domain with different size signal

In [16]:
x = [7, 2, 3, -1, 0, -3, 5, 6]
h = [1, 2, -1]

X = fft(x)
H = fft(h, len(x))
r1 = ifft(X * H)
print('r1 = %s' % r1)
r2 = np.real(np.convolve(np.hstack((x[1:],x)),h,mode='valid'))
r2 = r2[len(r2) - len(x):]
print('r2 = %s' % r2)

print('r1 == r2 ? %s' % ('Yes' if np.allclose(r1, r2) else 'No'))
r1 = [ 1.40000000e+01+0.0000000e+00j  1.00000000e+01-2.2977602e-16j
 -1.11022302e-15+0.0000000e+00j  3.00000000e+00+2.2977602e-16j
 -5.00000000e+00+0.0000000e+00j -2.00000000e+00+6.5840240e-16j
 -1.00000000e+00+0.0000000e+00j  1.90000000e+01-6.5840240e-16j]
r2 = [14 10  0  3 -5 -2 -1 19]
r1 == r2 ? Yes
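This zero-padded-filter trick is exactly the building block of the overlap-save method from the title: take overlapping input blocks, circularly convolve each block with the zero-padded filter via FFT, and discard the first len(h)-1 output samples of every block, which are corrupted by wrap-around. A minimal 1D sketch follows (the function name overlap_save_1d and the FFT size are my own choices, not from the notebook):

import numpy as np
from scipy.fftpack import fft, ifft

def overlap_save_1d(x, h, Nfft=512):
    """Linear convolution of x and h via the overlap-save method (minimal sketch)."""
    M = len(h)                          # Nfft must be larger than M - 1
    step = Nfft - (M - 1)               # new samples consumed per block
    H = fft(h, Nfft)
    xp = np.hstack((np.zeros(M - 1), x, np.zeros(Nfft)))    # prime the history with zeros
    out = []
    for start in range(0, len(x) + M - 1, step):
        block = xp[start:start + Nfft]
        yb = np.real(ifft(fft(block, Nfft) * H))
        out.append(yb[M - 1:])          # drop the M-1 samples corrupted by wrap-around
    return np.hstack(out)[:len(x) + M - 1]

x = np.random.randn(10000)
h = np.random.randn(31)
print(np.allclose(overlap_save_1d(x, h), np.convolve(x, h)))  # True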

Fast Convolution with FFT

Now let's see how the FFT can help us with fast convolution.
The convolve2d function in scipy.signal uses pairwise multiplication (see the source code). Meanwhile, scipy.signal also provides fftconvolve, which uses the FFT to compute the convolution (see the source code here). From its documentation:
This is generally much faster than convolve for large arrays (n > ~500), but can be slower when only a few output values are needed, and can only output float arrays (int or object array inputs will be cast to float).
For overlapadd2, we use a 2D overlap-add FFT implementation found on GitHub. Below is its description:
Fast two-dimensional linear convolution via the overlap-add method. The overlap-add method is well-suited to convolving a very large array, Amat, with a much smaller filter array, Hmat by breaking the large convolution into many smaller L-sized sub-convolutions, and evaluating these using the FFT. The computational savings over the straightforward two-dimensional convolution via, say, scipy.signal.convolve2d, can be substantial for large Amat and/or Hmat.
The performance comparison on my notebook is shown below:
method | convolve2d | fftconvolve | overlapadd2
speed  | 3040 ms    | 30.1 ms     | 94.8 ms
We can see that FFT-based convolution is generally much faster than the straightforward convolution, with roughly 30x to 100x acceleration. The result surprises me a bit: fftconvolve is still faster than overlapadd2. The overlap-add approach looks good on paper, but the user still needs to tune the block size L to get the best performance. Perhaps overlapadd2 only pays off when the input matrix is too big to fit into memory and we have to split the work into sub-convolutions.
One thing to note: when we use FFT convolution for image processing, there will be dark borders around the image due to the zero-padding beyond its boundaries. The convolve2d function supports other boundary handling options, but it is far slower.
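As a hedged illustration (this snippet is not in the original notebook), the boundary keyword of convolve2d avoids the dark frame by mirroring the image at its edges instead of padding with zeros, at the cost of the much slower direct algorithm:

import numpy as np
from scipy import misc
import scipy.signal as sp

A = misc.ascent().astype(float)
H = np.outer(sp.gaussian(64, 8), sp.gaussian(64, 8))

# 'symm' reflects the image at the borders, so the blurred result has no dark frame
B = sp.convolve2d(A, H, mode='same', boundary='symm')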

Reference

There is an article that benchmarks 2D convolution with various convolution libraries:
In [75]:
# before you run, eval the cell containing overlapadd2 at the end

from scipy import misc
import scipy.signal as sp
import matplotlib.pyplot as plt

A = misc.ascent()
A = A.astype(float)
print(A.shape)
H = np.outer(sp.gaussian(64, 8), sp.gaussian(64, 8))

print('==> using convolve2d')
%time B1 = sp.convolve2d(A, H, mode='same')
print('==> using fftconvolve')
%time B2 = sp.fftconvolve(A, H, mode='same')
print('==> using overlapadd2')
%time B3 = overlapadd2(A, H)

fig, (ax_orig, ax_conv, ax_fft2conv, ax_ovadd2) = plt.subplots(1, 4, figsize = (12, 8))
ax_orig.imshow(A, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_conv.imshow(B1, cmap='gray')
ax_conv.set_title('convolve2d')
ax_conv.set_axis_off()
ax_fft2conv.imshow(B2, cmap='gray')
ax_fft2conv.set_title('fftconvolve')
ax_fft2conv.set_axis_off()
ax_ovadd2.imshow(B3, cmap='gray')
ax_ovadd2.set_title('overlapadd2')
ax_ovadd2.set_axis_off()
fig.show()
(512, 512)
==> using convolve2d
CPU times: user 2.84 s, sys: 4.87 ms, total: 2.84 s
Wall time: 2.84 s
==> using fftconvolve
CPU times: user 28.4 ms, sys: 11 µs, total: 28.4 ms
Wall time: 28.8 ms
==> using overlapadd2
L = [288 288]
Nfft = [360 360]
CPU times: user 54 ms, sys: 1.89 ms, total: 55.9 ms
Wall time: 56.1 ms
