Logistic Regression
When a linear model is trained for regression, we sometimes want to use it for classification. This requires a monotone, differentiable function that links the linear model's real-valued prediction to a class label, which is exactly what logistic regression provides. In Dr. Wu Jun's *The Beauty of Mathematics*, he mentions that both Tencent and Google use logistic regression in their ad systems.
As the figure below shows, the relationship between linear regression and logistic regression is clear: a linear model passed through the logistic (normalizing) function becomes logistic regression.
The Logistic Model
For binary classification the output is $y \in \{0,1\}$. Given a linear regression model $z = \theta^T x$, we need to map $z$ to $y$, i.e. $y = g(z)$. The most direct choice is the unit step function:
$$
y = \begin{cases}
0, & z < 0;\\
0.5, & z = 0;\\
1, & z > 0.
\end{cases}
$$
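The piecewise definition above can be written out directly as a sanity check (a minimal sketch; the name `unit_step` is just for illustration):

```python
def unit_step(z):
    """Unit step: maps z to 0, 0.5, or 1 per the piecewise definition."""
    if z < 0:
        return 0.0
    if z == 0:
        return 0.5
    return 1.0
```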
(Figure: the unit step function.)
But the step function is not continuous, so the sigmoid function is used in its place:
$$
y = \frac{1}{1+e^{-z}}
$$
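The sigmoid is smooth and monotone, and agrees with the step function at $z=0$. A minimal sketch:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: monotone, differentiable, maps R onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))
```

Note that `sigmoid(0)` is exactly 0.5, matching the step function's midpoint, and `sigmoid(-z) == 1 - sigmoid(z)` by symmetry.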
(Figure: the sigmoid function.)
Substituting $z = \theta^T x$ gives
$$
y = \frac{1}{1+e^{-(\theta^Tx)}}
$$
This is the logistic function, and it can be rewritten as
$$
\ln \frac{y}{1 - y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_m x_m
$$
This is the log-odds (logit) regression model, where $y$ is interpreted as the probability that sample $x$ is positive and $1-y$ as the probability that it is negative, so that
$$
\ln \frac{p(y=1|x)}{1-p(y=1|x)}=\theta^T x
$$
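The relation above says the log-odds (logit) is exactly the inverse of the sigmoid, which a quick round trip confirms (function names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))
```

For example, `sigmoid(logit(0.8))` recovers 0.8, and `logit(0.5)` is 0 (even odds).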
The remaining problem is how to estimate $\theta$. For a given sample set $\{(x_i,y_i)\}_{i=1}^{m}$, the probability of each sample is
$$
p(y_i,x_i)=p(y_i=1|x_i)^{y_i} \,(1-p(y_i=1|x_i))^{1-y_i}
$$
where $y_i$ is 1 or 0. The likelihood of the sample set is then
$$
L(\theta) = \prod_{i=1}^{m}p(y_i,x_i)=\prod_{i=1}^{m}p(y_i=1|x_i)^{y_i} \,(1-p(y_i=1|x_i))^{1-y_i}
$$
The log-likelihood is:
$$
\begin{aligned}
l(\theta) &= \sum_{i=1}^{m} \ln p(y_i,x_i) \\
&= \sum_{i=1}^{m} \left[ y_i \ln p(y_i=1|x_i) + (1-y_i) \ln (1-p(y_i=1|x_i)) \right] \\
&= \sum_{i=1}^{m} y_i \ln \frac {p(y_i=1|x_i)}{1-p(y_i=1|x_i)} + \sum_{i=1}^{m} \ln (1-p(y_i=1|x_i)) \\
&= \sum_{i=1}^{m} y_i \theta^T x_i - \sum_{i=1}^{m} \ln (1+e^{\theta^T x_i})
\end{aligned}
$$
We seek the $\theta$ that maximizes this log-likelihood. Setting the derivative to zero yields no closed-form solution, so the optimum must be approached iteratively, for example with gradient descent or Newton's method.
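As a concrete sketch of the gradient approach: differentiating $l(\theta)$ gives the gradient $X^T(y - \sigma(X\theta))$, so gradient ascent repeatedly steps in that direction. The NumPy version below is a minimal illustration, assuming an arbitrary learning rate and iteration count; `fit_logistic` is an illustrative name, not a library function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=5000):
    """Maximize the log-likelihood l(theta) by gradient ascent.

    The gradient of l(theta) is X^T (y - sigmoid(X theta)); each step
    moves theta a small distance in that direction.
    """
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (y - sigmoid(X @ theta))
        theta += lr * grad / X.shape[0]  # averaged gradient step
    return theta
```

On a linearly separable toy set, the fitted $\theta$ classifies every training point correctly after a few thousand steps.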
Implementation
import tensorflow as tf

x_train = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 5.0], [1.0, 3.0],
           [4.0, 2.0], [7.0, 3.0], [4.0, 5.0], [11.0, 3.0], [8.0, 7.0]]
y_train = [[1.0], [1.0], [0.0], [1.0], [0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]

x = tf.constant(x_train)
y_ = tf.constant(y_train)
theta = tf.Variable(tf.zeros([2, 1]))
theta0 = tf.Variable(tf.zeros([1, 1]))
# sigmoid of the linear model; note the bias belongs inside the exponent
y = 1 / (1 + tf.exp(-(tf.matmul(x, theta) + theta0)))
# negative log-likelihood (cross-entropy), averaged over the samples
loss = tf.reduce_mean(-y_ * tf.log(y) - (1 - y_) * tf.log(1 - y))
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for step in range(1000):
    sess.run(train)
print(sess.run(theta).flatten(), sess.run(theta0).flatten())