1. A Review of Matrix Factorization
In the post Recommendation Algorithms — Recommendation Based on Matrix Factorization, we discussed factorizing the user-item matrix so that missing ratings can be predicted.
Matrix factorization means decomposing a matrix into the product of two or more matrices. Write the user-item (rating) matrix above as $V_{m\times n}$. It can be decomposed into a product of matrices; if we decompose it into two matrices $W_{m\times k}$ and $H_{k\times n}$, we want their product to approximately reconstruct the original matrix $V_{m\times n}$:
$$V_{m\times n}\approx W_{m\times k}\times H_{k\times n}=\hat{V}_{m\times n}$$
Here, the matrix $W_{m\times k}$ describes the relationship between the $m$ users and $k$ latent topics, while $H_{k\times n}$ describes the relationship between the $k$ topics and the $n$ items.
Since user ratings are typically non-negative, we additionally require:
$$W_{m\times k}\geqslant 0$$
$$H_{k\times n}\geqslant 0$$
This is the origin of Non-negative Matrix Factorization (NMF).
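To make the shapes and the non-negativity constraint concrete, here is a minimal NumPy sketch; the rating values are made up for illustration:

import numpy as np

# A toy 4x4 user-item rating matrix (rows: users, columns: items); values are hypothetical
V = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

m, n, k = 4, 4, 2
rng = np.random.default_rng(0)
W = rng.random((m, k))   # m users x k topics, non-negative by construction
H = rng.random((k, n))   # k topics x n items, non-negative by construction

V_hat = W @ H            # the reconstruction has the same shape as V
print(V_hat.shape)                        # (4, 4)
print((W >= 0).all() and (H >= 0).all())  # True: both factors are non-negative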
2. Non-negative Matrix Factorization
2.1 Formal Definition
The basic idea of non-negative matrix factorization was introduced above. In short, NMF is matrix factorization with a non-negativity constraint added to the resulting factors. That is, for the user-item matrix $V_{m\times n}$, find two matrices $W_{m\times k}$ and $H_{k\times n}$ such that:
$$V_{m\times n}\approx W_{m\times k}\times H_{k\times n}=\hat{V}_{m\times n}$$
subject to:
$$W_{m\times k}\geqslant 0$$
$$H_{k\times n}\geqslant 0$$
2.2 Loss Functions
To compare the matrices $V_{m\times n}$ and $\hat{V}_{m\times n}$ quantitatively, the authors of reference [1] propose two loss functions:
$$\left \| A-B \right \|^2=\sum_{i,j}\left ( A_{i,j}-B_{i,j} \right )^2$$
$$D\left ( A\parallel B \right )=\sum_{i,j}\left ( A_{i,j}\log\frac{A_{i,j}}{B_{i,j}}-A_{i,j}+B_{i,j} \right )$$
By the definition of the KL divergence, $D\left ( A\parallel B \right )\geqslant 0$, with equality if and only if $A=B$.
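Both losses transcribe directly into NumPy. In the sketch below, the small eps guard is my addition, to avoid log(0) and division by zero on zero entries:

import numpy as np

def squared_loss(A, B):
    """Squared Frobenius distance ||A - B||^2."""
    return np.sum((np.asarray(A) - np.asarray(B)) ** 2)

def kl_divergence(A, B, eps=1e-12):
    """Generalized KL divergence D(A || B) for non-negative matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.sum(A * np.log((A + eps) / (B + eps)) - A + B)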
Once a loss function is chosen, the task becomes solving the following constrained minimization problems, one for each loss:
$$\begin{matrix}
\text{minimize}\; \left \| V-WH \right \|^2\\
\text{s.t.}\; W\geqslant 0,\; H\geqslant 0
\end{matrix}$$
$$\begin{matrix}
\text{minimize}\; D\left ( V\parallel WH \right )\\
\text{s.t.}\; W\geqslant 0,\; H\geqslant 0
\end{matrix}$$
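For reference, these two constrained problems are exactly what scikit-learn's NMF solves; a minimal sketch, assuming scikit-learn is installed (its default objective is the squared Frobenius loss, and the KL objective requires the multiplicative-update solver):

import numpy as np
from sklearn.decomposition import NMF

V = np.abs(np.random.rand(5, 4))  # any non-negative matrix works

# Squared (Frobenius) loss, the default objective
model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(V)
H = model.components_

# KL-divergence loss via the multiplicative-update ('mu') solver
model_kl = NMF(n_components=2, init='random', random_state=0,
               solver='mu', beta_loss='kullback-leibler', max_iter=500)
W_kl = model_kl.fit_transform(V)
H_kl = model_kl.components_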
2.3 Solving the Optimization Problem
In reference [1], the authors propose multiplicative update rules; the concrete updates are as follows.
For the squared-distance loss:
$$W_{i,k}=W_{i,k}\frac{\left ( VH^T \right )_{i,k}}{\left ( WHH^T \right )_{i,k}}$$
$$H_{k,j}=H_{k,j}\frac{\left ( W^TV \right )_{k,j}}{\left ( W^TWH \right )_{k,j}}$$
For the KL-divergence loss:
$$W_{i,k}=W_{i,k}\frac{\sum_{u}H_{k,u}V_{i,u}/\left ( WH \right )_{i,u}}{\sum_{v}H_{k,v}}$$
$$H_{k,j}=H_{k,j}\frac{\sum_{u}W_{u,k}V_{u,j}/\left ( WH \right )_{u,j}}{\sum_{v}W_{v,k}}$$
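Both pairs of updates vectorize naturally in NumPy. The sketch below is one possible transcription; the eps terms are my addition to guard against division by zero, and each call performs a single alternating update of H and W:

import numpy as np

def mu_step_frobenius(V, W, H, eps=1e-12):
    """One multiplicative update for the squared-distance loss."""
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def mu_step_kl(V, W, H, eps=1e-12):
    """One multiplicative update for the KL-divergence loss."""
    WH = W @ H + eps
    H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    WH = W @ H + eps
    W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H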
The multiplicative form is used mainly to keep every entry non-negative during the computation; an additive gradient-descent step, with its additions and subtractions, cannot guarantee non-negativity. In fact, the multiplicative update rules are equivalent to gradient descent with a particular choice of step size. We demonstrate this equivalence below using the squared-distance loss.
The squared loss can be written as:
$$l=\sum_{i=1}^{m}\sum_{j=1}^{n}\left [ V_{i,j}-\left ( \sum_{k=1}^{r}W_{i,k}\cdot H_{k,j} \right ) \right ]^2$$
Taking the partial derivative of the loss with respect to $H_{k,j}$ (only the terms with this fixed $j$ contribute):
$$\begin{align*}
\frac{\partial l}{\partial H_{k,j}}&= \sum_{i=1}^{m}\left [ 2\left ( V_{i,j}-\sum_{k=1}^{r}W_{i,k}\cdot H_{k,j} \right )\cdot \left ( -W_{i,k} \right ) \right ]\\
&= -2\left [ \left ( W^TV \right )_{k,j}-\left ( W^TWH \right )_{k,j} \right ]
\end{align*}$$
Following the gradient-descent recipe:
$$H_{k,j}=H_{k,j}-\eta _{k,j}\frac{\partial l}{\partial H_{k,j}}$$
that is (absorbing the constant factor $2$ into the step size $\eta_{k,j}$):
$$H_{k,j}=H_{k,j}+\eta _{k,j}\left [ \left ( W^TV \right )_{k,j}-\left ( W^TWH \right )_{k,j} \right ]$$
Choosing $\eta _{k,j}=\frac{H_{k,j}}{\left ( W^TWH \right )_{k,j}}$ yields exactly the multiplicative update rule given above.
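The equivalence is easy to verify numerically; a small sketch with random matrices (any non-negative V, W, H will do):

import numpy as np

rng = np.random.default_rng(1)
V = rng.random((6, 5))
W = rng.random((6, 2))
H = rng.random((2, 5))

WtV = W.T @ V
WtWH = W.T @ W @ H

# Additive gradient step with the element-wise step size eta = H / (W^T W H)
eta = H / WtWH
H_grad = H + eta * (WtV - WtWH)

# Multiplicative update rule
H_mult = H * WtV / WtWH

print(np.allclose(H_grad, H_mult))  # True: the two updates coincide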
2.4 Implementing Non-negative Matrix Factorization
Given a small user-item rating matrix $V$, non-negative matrix factorization produces two non-negative matrices $W$ and $H$, and the product $W\times H$ serves as the reconstruction of the original matrix.
The implementation:
# -*- coding: utf-8 -*-
from numpy import mat, shape, random

def load_data(file_path):
    """Load a tab-separated rating matrix from file_path."""
    V = []
    with open(file_path) as f:
        for line in f.readlines():
            lines = line.strip().split("\t")
            data = [float(x) for x in lines]
            V.append(data)
    return mat(V)

def train(V, r, k, e):
    """Factorize V into W (m x r) and H (r x n) using at most k
    multiplicative updates; stop early once the squared error falls below e."""
    m, n = shape(V)
    # random non-negative initialization keeps both factors non-negative throughout
    W = mat(random.random((m, r)))
    H = mat(random.random((r, n)))
    for step in range(k):
        # squared reconstruction error
        E = V - W * H
        err = 0.0
        for i in range(m):
            for j in range(n):
                err += E[i, j] * E[i, j]
        print(err)
        if err < e:
            break
        # multiplicative update for H: H <- H * (W^T V) / (W^T W H)
        a = W.T * V
        b = W.T * W * H
        for i_1 in range(r):
            for j_1 in range(n):
                if b[i_1, j_1] != 0:
                    H[i_1, j_1] = H[i_1, j_1] * a[i_1, j_1] / b[i_1, j_1]
        # multiplicative update for W: W <- W * (V H^T) / (W H H^T)
        c = V * H.T
        d = W * H * H.T
        for i_2 in range(m):
            for j_2 in range(r):
                if d[i_2, j_2] != 0:
                    W[i_2, j_2] = W[i_2, j_2] * c[i_2, j_2] / d[i_2, j_2]
    return W, H

if __name__ == "__main__":
    file_path = "./data1"
    V = load_data(file_path)
    W, H = train(V, 2, 100, 1e-5)
    print(V)
    print(W)
    print(H)
    print(W * H)
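Since the data file ./data1 is not included here, one quick way to try the code is to feed train a matrix built in place; the values below are hypothetical:

from numpy import mat

# Hypothetical 4x4 rating matrix standing in for ./data1
V = mat([[5.0, 3.0, 0.0, 1.0],
         [4.0, 0.0, 0.0, 1.0],
         [1.0, 1.0, 0.0, 5.0],
         [0.0, 1.0, 5.0, 4.0]])
W, H = train(V, 2, 100, 1e-5)
print(W * H)  # should be close to V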
The convergence curve is plotted by the script below, which reads the per-iteration loss values (as printed by train) from the file result_nmf:
'''
Date:20160411
@author: zhaozhiyong
'''
import matplotlib.pyplot as plt

# "result_nmf" holds the loss printed at each iteration, one value per line
data = []
with open("result_nmf") as f:
    for line in f.readlines():
        data.append(float(line.strip()))

x = range(len(data))
plt.plot(x, data, color='r', linewidth=3)
plt.title('Convergence curve')
plt.xlabel('generation')
plt.ylabel('loss')
plt.show()
References
[1] Lee D. D., Seung H. S. Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems (NIPS), 2001.