
BP, Gradient descent and Generalisation

For each training pattern presented to a multilayer neural network, we can compute the error:

e(p) = yd(p) - y(p)

Squaring and summing this error across all n training patterns gives the sum-squared error (SSE), a good measure of the overall performance of the network.

The SSE depends on the weights and thresholds of the network.
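As a minimal sketch, the SSE over all patterns can be computed as follows (the function name `sse` and the sample values are hypothetical, chosen only for illustration):

```python
def sse(desired, actual):
    """Sum-squared error: sum over all patterns p of (yd(p) - y(p))**2."""
    return sum((yd - y) ** 2 for yd, y in zip(desired, actual))

# Hypothetical desired vs. actual outputs for n = 3 patterns
desired = [1.0, 0.0, 1.0]
actual = [0.9, 0.2, 0.8]
print(sse(desired, actual))  # small when the network fits the patterns well
```

Because every term is squared, the SSE is always non-negative and is zero only when the network reproduces every desired output exactly.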



Back-propagation

Back-propagation is a "gradient descent" training algorithm.


Steps:

1. Calculate the error for a single pattern.

2. Compute the weight changes that produce the greatest reduction in error, following the error gradient (the steepest slope).

This is only possible with differentiable activation functions (e.g. the sigmoid).
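The two steps above can be sketched for a single sigmoid neuron with one input (all values here are hypothetical, and `eta` stands for the learning rate, which is assumed rather than taken from the text):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(y):
    # The sigmoid's derivative can be written from its own output, y * (1 - y),
    # which is what makes the gradient cheap to compute.
    return y * (1.0 - y)

w, b, eta = 0.5, 0.0, 0.1   # initial weight, threshold, learning rate
x, yd = 1.0, 1.0            # one training pattern: input and desired output

y = sigmoid(w * x + b)      # step 1: forward pass for this single pattern
e = yd - y                  # ... and its error yd(p) - y(p)
delta = e * sigmoid_deriv(y)  # step 2: error gradient at the output
w += eta * delta * x          # weight change along the steepest descent
b += eta * delta
```

Since the update moves each weight a small step against the error gradient, repeating it over many patterns gradually reduces the SSE.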


Gradient descent is only approximate, because training proceeds pattern by pattern.

Gradient descent may not always reach the true global error minimum; instead it may get stuck in a "local" minimum.


Solution: add a momentum term to the weight update.
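A momentum term adds a fraction of the previous weight change to the current one. A minimal sketch, assuming the common update rule Δw(t) = η·gradient + α·Δw(t−1) with hypothetical values for the learning rate `eta` and momentum coefficient `alpha`:

```python
eta, alpha = 0.1, 0.9  # learning rate and momentum coefficient (assumed values)
prev_dw = 0.0

# Three consecutive steps whose gradients all point the same way
for grad in [1.0, 1.0, 1.0]:
    dw = eta * grad + alpha * prev_dw  # current change carries past momentum
    prev_dw = dw

# Successive steps in a consistent direction grow in size, which helps
# the search roll through shallow local minima instead of stopping there.
print(dw)
```

When the gradient keeps its sign, the effective step size builds up; when the gradient flips sign, momentum damps the oscillation.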


 
