
Proximal Gradient Descent

Proximal gradient descent is used to handle the problem that the $L_1$ regularizer is not differentiable at 0. First, introduce the proximal operator:

\[\mbox {prox} _{h}(x) = \arg \min_u (h(u)  + \frac{1}{2}||u - x||_2^2)\]

\[\mbox {prox} _{\lambda h}(x) = \arg \min_u (h(u)  + \frac{1}{2 \lambda}||u - x||_2^2)\]
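As a quick illustration, the sketch below (assuming NumPy and SciPy are available; `prox_numeric` is an illustrative helper, not a library function) evaluates the scalar prox of $h(u) = |u|$ by direct numerical minimization and checks it against the known closed form, soft-thresholding:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_numeric(h, x, lam):
    """prox_{lam*h}(x) for scalar x, by direct numerical minimization."""
    return minimize_scalar(lambda u: h(u) + (u - x) ** 2 / (2 * lam)).x

# For h(u) = |u| the prox has a known closed form: soft-thresholding.
lam, x = 0.5, 1.3
closed_form = np.sign(x) * max(abs(x) - lam, 0.0)
print(prox_numeric(abs, x, lam), closed_form)  # both approximately 0.8
```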

Consider an optimization problem of the form:

\[\min_x h(x) + g(x)\]

It can be solved with the following iteration:

\[x^{k+1} = \mbox{prox}_{\gamma h}(x^k-\gamma \nabla g(x^k)) \]
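A minimal sketch of this iteration in Python (assuming NumPy; `proximal_gradient` and `prox_h` are illustrative names, not a library API). When $h \equiv 0$ the prox is the identity and the method reduces to ordinary gradient descent:

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, gamma, iters=200):
    """Iterate x^{k+1} = prox_{gamma h}(x^k - gamma * grad_g(x^k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox_h(x - gamma * grad_g(x), gamma)
    return x

# With h = 0 the prox is the identity, so the loop is plain gradient
# descent on g(x) = 0.5 * ||x - b||_2^2 and converges to b:
b = np.array([3.0, -1.0])
identity_prox = lambda v, gamma: v
print(proximal_gradient(lambda x: x - b, identity_prox, np.zeros(2), gamma=0.5))
# -> approximately [ 3. -1.]
```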

This iteration makes progress toward a minimizer of the original objective because, expanding the definition of the prox:

\begin{aligned}x^{k+1} &= \mbox{prox}_{\gamma h}(x^k-\gamma \nabla g(x^k)) \\
&= \arg\min_x \left(h(x)+\frac{1}{2\gamma}||x-x^k+ \gamma \nabla g(x^k)||_2^2\right) \\
&= \arg\min_x \left(h(x)+ \frac{\gamma}{2}||\nabla g(x^k)||_2^2 + \nabla g(x^k)^T(x-x^k)+ \frac{1}{2\gamma}||x-x^k||_2^2\right) \\
&= \arg\min_x \left(h(x)+ g(x^k) + \nabla g(x^k)^T(x-x^k) +\frac{1}{2\gamma}||x-x^k||_2^2\right) \\
& \approx \arg\min_x \left(h(x)+g(x)\right)
\end{aligned}

The step from the third line to the fourth holds because the terms that change, $\frac{\gamma}{2}||\nabla g(x^k)||_2^2$ and $g(x^k)$, are both constants with respect to $x$ and so do not affect the argmin; the last line then follows by reading the fourth line as a second-order Taylor expansion of $g$ around $x^k$, with the Hessian approximated by $\frac{1}{\gamma}I$.
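As a numerical sanity check on this argument, the sketch below (a made-up scalar example, assuming SciPy) compares the closed-form prox-gradient step against direct minimization of the surrogate from the fourth line; the two agree:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Scalar example: g(x) = 0.5*(x-1)^2 (smooth), h(x) = lam*|x| (non-smooth).
gamma, lam, xk = 0.4, 0.7, 2.0
g, grad_g = lambda x: 0.5 * (x - 1.0) ** 2, lambda x: x - 1.0
h = lambda x: lam * abs(x)

# Prox-gradient step: soft-threshold the gradient step (closed-form prox).
z = xk - gamma * grad_g(xk)
prox_step = np.sign(z) * max(abs(z) - gamma * lam, 0.0)

# Direct minimization of the surrogate objective from the derivation.
surrogate = lambda x: (h(x) + g(xk) + grad_g(xk) * (x - xk)
                       + (x - xk) ** 2 / (2 * gamma))
print(prox_step, minimize_scalar(surrogate).x)  # both approximately 1.32
```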

When the non-smooth term $h(x)$ is the $L_1$ regularizer $\lambda ||x||_1$:

\[ x^{k+1}  = \mbox{prox} _{\gamma \lambda ||\cdot||_1} (x^k  - \gamma  \nabla g(x^k))\]

and the prox has the elementwise closed form $\mbox{prox}_{\gamma\lambda||\cdot||_1}(z)_i = \mbox{sign}(z_i)\max(|z_i| - \gamma\lambda,\ 0)$, i.e. soft-thresholding.
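This $L_1$ instance of proximal gradient descent is the ISTA algorithm. Below is a minimal self-contained sketch on a small synthetic LASSO problem, $\min_x \frac{1}{2}||Ax-b||_2^2 + \lambda||x||_1$ (the helper names `ista` and `soft_threshold`, and the problem sizes, are illustrative choices, not from the referenced posts):

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise prox of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, iters=500):
    """Minimize 0.5*||Ax - b||_2^2 + lam*||x||_1 by proximal gradient."""
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2  # step = 1/L, L = sigma_max(A)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)             # gradient of the smooth part
        x = soft_threshold(x - gamma * grad, gamma * lam)
    return x

# Small synthetic problem: a sparse x_true should be roughly recovered.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[[1, 5]] = [2.0, -3.0]
print(np.round(ista(A, A @ x_true, lam=0.1), 2))  # sparse, close to x_true
```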


References:

http://webcache.googleusercontent.com/search?q=cache:6KVgr87ZVZ8J:roachsinai.github.io/2016/08/03/1L1_L2_norm/+&cd=3&hl=zh-CN&ct=clnk&gl=id

http://breezedeus.github.io/2013/11/16/breezedeus-proximal-gd.html

http://blog.csdn.net/lanyanchenxi/article/details/50448640

https://www.zhihu.com/question/38426074
