首页 > 代码库 > Convolution Neural Network (CNN) 原理与实现

Convolution Neural Network (CNN) 原理与实现

本文结合Deep learning的一个应用,Convolution Neural Network 进行一些基本应用,参考Lecun的Document 0.1进行部分拓展,与结果展示(in python)。

分为以下几部分:

1. Convolution(卷积)

2. Pooling(降采样过程)

3. CNN结构

4.  跑实验

下面分别介绍。


PS:本篇blog为ese机器学习短期班参考资料(20140516课程),本文只是简要讲最naive最simple的思想,重在实践部分,原理课上详述。


1. Convolution(卷积)

类似于高斯卷积,对imagebatch中的所有image进行卷积。对于一张图,其所有feature map用一个filter卷成一张feature map。 如下面的代码,对一个imagebatch(含两张图)进行操作,每个图初始有3张feature map(R,G,B), 用两个9*9的filter进行卷积,结果是,每张图得到两个feature map。

卷积操作由theano的conv.conv2d实现,这里我们用随机参数W,b。结果有点像edge detector是不是?

Code: (详见注释)


# -*- coding: utf-8 -*-
"""
Created on Sat May 10 18:55:26 2014

@author: rachel

Function: convolution option of two pictures with same size (width,height)
input: 3 feature maps (3 channels <RGB> of a picture)
convolution: two 9*9 convolutional filters
"""

from theano.tensor.nnet import conv
import theano.tensor as T
import numpy, theano


rng = numpy.random.RandomState(23455)

# symbol variable
input = T.tensor4(name = ‘input‘)

# initial weights
w_shape = (2,3,9,9) #2 convolutional filters, 3 channels, filter shape: 9*9
w_bound = numpy.sqrt(3*9*9)
W = theano.shared(numpy.asarray(rng.uniform(low = -1.0/w_bound, high = 1.0/w_bound,size = w_shape),
                                dtype = input.dtype),name = ‘W‘)

b_shape = (2,)
b = theano.shared(numpy.asarray(rng.uniform(low = -.5, high = .5, size = b_shape),
                                dtype = input.dtype),name = ‘b‘)
                                
conv_out = conv.conv2d(input,W)

#T.TensorVariable.dimshuffle() can reshape or broadcast (add dimension)
#dimshuffle(self,*pattern)
# >>>b1 = b.dimshuffle(‘x‘,0,‘x‘,‘x‘)
# >>>b1.shape.eval()
# array([1,2,1,1])
output = T.nnet.sigmoid(conv_out + b.dimshuffle(‘x‘,0,‘x‘,‘x‘))
f = theano.function([input],output)





# demo
import pylab
from PIL import Image
#minibatch_img = T.tensor4(name = ‘minibatch_img‘)

#-------------img1---------------
img1 = Image.open(open(‘//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel.jpg‘))
width1,height1 = img1.size
img1 = numpy.asarray(img1, dtype = ‘float32‘)/256. # (height, width, 3)

# put image in 4D tensor of shape (1,3,height,width)
img1_rgb = img1.swapaxes(0,2).swapaxes(1,2).reshape(1,3,height1,width1) #(3,height,width)


#-------------img2---------------
img2 = Image.open(open(‘//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel1.jpg‘))
width2,height2 = img2.size
img2 = numpy.asarray(img2,dtype = ‘float32‘)/256.
img2_rgb = img2.swapaxes(0,2).swapaxes(1,2).reshape(1,3,height2,width2) #(3,height,width)



#minibatch_img = T.join(0,img1_rgb,img2_rgb)
minibatch_img = numpy.concatenate((img1_rgb,img2_rgb),axis = 0)
filtered_img = f(minibatch_img)


# plot original image and two convoluted results
pylab.subplot(2,3,1);pylab.axis(‘off‘);
pylab.imshow(img1)

pylab.subplot(2,3,4);pylab.axis(‘off‘);
pylab.imshow(img2)

pylab.gray()
pylab.subplot(2,3,2); pylab.axis("off")
pylab.imshow(filtered_img[0,0,:,:]) #0:minibatch_index; 0:1-st filter

pylab.subplot(2,3,3); pylab.axis("off")
pylab.imshow(filtered_img[0,1,:,:]) #0:minibatch_index; 1:1-st filter

pylab.subplot(2,3,5); pylab.axis("off")
pylab.imshow(filtered_img[1,0,:,:]) #0:minibatch_index; 0:1-st filter

pylab.subplot(2,3,6); pylab.axis("off")
pylab.imshow(filtered_img[1,1,:,:]) #0:minibatch_index; 1:1-st filter
pylab.show()







2. Pooling(降采样过程)


最常用的Maxpooling. 解决了两个问题:

1. 减少计算量

2. 旋转不变性 (原因自己悟)

     PS:对于旋转不变性,回忆下SIFT,LBP:采用主方向;HOG:选择不同方向的模版

Maxpooling的降采样过程会将feature map的长宽各减半。(下面结果图中没有体现出来,python自动给拉到一样大了,但实际上像素数是减半的)


Code: (详见注释)


# -*- coding: utf-8 -*-
"""
Created on Sat May 10 18:55:26 2014

@author: rachel

Function: convolution option 
input: 3 feature maps (3 channels <RGB> of a picture)
convolution: two 9*9 convolutional filters
"""

from theano.tensor.nnet import conv
import theano.tensor as T
import numpy, theano


rng = numpy.random.RandomState(23455)

# symbol variable
input = T.tensor4(name = ‘input‘)

# initial weights
w_shape = (2,3,9,9) #2 convolutional filters, 3 channels, filter shape: 9*9
w_bound = numpy.sqrt(3*9*9)
W = theano.shared(numpy.asarray(rng.uniform(low = -1.0/w_bound, high = 1.0/w_bound,size = w_shape),
                                dtype = input.dtype),name = ‘W‘)

b_shape = (2,)
b = theano.shared(numpy.asarray(rng.uniform(low = -.5, high = .5, size = b_shape),
                                dtype = input.dtype),name = ‘b‘)
                                
conv_out = conv.conv2d(input,W)

#T.TensorVariable.dimshuffle() can reshape or broadcast (add dimension)
#dimshuffle(self,*pattern)
# >>>b1 = b.dimshuffle(‘x‘,0,‘x‘,‘x‘)
# >>>b1.shape.eval()
# array([1,2,1,1])
output = T.nnet.sigmoid(conv_out + b.dimshuffle(‘x‘,0,‘x‘,‘x‘))
f = theano.function([input],output)





# demo
import pylab
from PIL import Image
from matplotlib.pyplot import *

#open random image
img = Image.open(open(‘//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel.jpg‘))
width,height = img.size
img = numpy.asarray(img, dtype = ‘float32‘)/256. # (height, width, 3)


# put image in 4D tensor of shape (1,3,height,width)
img_rgb = img.swapaxes(0,2).swapaxes(1,2) #(3,height,width)
minibatch_img = img_rgb.reshape(1,3,height,width)
filtered_img = f(minibatch_img)


# plot original image and two convoluted results
pylab.figure(1)
pylab.subplot(1,3,1);pylab.axis(‘off‘);
pylab.imshow(img)
title(‘origin image‘)

pylab.gray()
pylab.subplot(2,3,2); pylab.axis("off")
pylab.imshow(filtered_img[0,0,:,:]) #0:minibatch_index; 0:1-st filter
title(‘convolution 1‘)

pylab.subplot(2,3,3); pylab.axis("off")
pylab.imshow(filtered_img[0,1,:,:]) #0:minibatch_index; 1:1-st filter
title(‘convolution 2‘)

#pylab.show()




# maxpooling
from theano.tensor.signal import downsample

input = T.tensor4(‘input‘)
maxpool_shape = (2,2)
pooled_img = downsample.max_pool_2d(input,maxpool_shape,ignore_border = False)

maxpool = theano.function(inputs = [input],
                          outputs = [pooled_img])

pooled_res = numpy.squeeze(maxpool(filtered_img))              
#pylab.figure(2)
pylab.subplot(235);pylab.axis(‘off‘);
pylab.imshow(pooled_res[0,:,:])
title(‘down sampled 1‘)

pylab.subplot(236);pylab.axis(‘off‘);
pylab.imshow(pooled_res[1,:,:])
title(‘down sampled 2‘)

pylab.show()






3. CNN结构

想必大家随便google下CNN的图都滥大街了,这里拖出来那时候学CNN的时候一张图,自认为陪上讲解的话画得还易懂(<!--囧-->)

废话不多说了,直接上Lenet结构图:(从下往上顺着箭头看,最下面为底层original input)





4. CNN代码


去资源里下载吧,我放上去了喔~(in python)








欢迎参与讨论并关注本博客和微博Rachel____Zhang, 后续内容继续更新哦~