Learning Notes: Morvan - Reinforcement Learning, Part 4: Deep Q Network
Deep Q Network
- 4.1 DQN Algorithm Update
- 4.2 DQN Neural Network
- 4.3 DQN Decision Making
- 4.4 OpenAI gym Environment Library
Notes
Deep Q-learning Algorithm
This gives us the final deep Q-learning algorithm with experience replay:
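Below is a minimal, runnable sketch of that loop in Python. It is not DeepMind's implementation: a linear Q-function stands in for the deep network, the choice of CartPole-v0 and all hyperparameter values are illustrative assumptions, and the classic gym step API (four-tuple return) is assumed.

```python
import random
from collections import deque

import gym
import numpy as np

env = gym.make('CartPole-v0')            # illustrative choice of environment
n_states = env.observation_space.shape[0]
n_actions = env.action_space.n

# Linear Q-function as a stand-in for the deep network: Q(s, a) = (s @ W)[a].
# Random initialisation -- exactly the "complete garbage" the notes refer to.
W = np.random.randn(n_states, n_actions) * 0.01

memory = deque(maxlen=2000)              # experience replay buffer
gamma, lr, epsilon, batch_size = 0.9, 0.01, 0.1, 32

for episode in range(200):
    s = env.reset()                      # classic gym API: reset() returns obs
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(s @ W))
        s2, r, done, _ = env.step(a)     # classic gym API: 4-tuple return
        memory.append((s, a, r, s2, done))   # store the transition
        s = s2

        if len(memory) >= batch_size:
            # sample a random minibatch and take one TD step per transition
            for bs, ba, br, bs2, bdone in random.sample(memory, batch_size):
                target = br if bdone else br + gamma * np.max(bs2 @ W)
                td_error = target - (bs @ W)[ba]
                W[:, ba] += lr * td_error * bs   # gradient step on squared TD error
```

In a real DQN the linear map is replaced by a multi-layer network trained by gradient descent on the same squared TD error; the replay-buffer structure of the loop is unchanged.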
There are many more tricks that DeepMind used to actually make it work, such as a target network, error clipping, and reward clipping, but these are out of scope for this introduction.
The most amazing part of this algorithm is that it learns anything at all. Just think about it: because our Q-function is initialized randomly, it initially outputs complete garbage. And we are using this garbage (the maximum Q-value of the next state) as targets for the network, only occasionally folding in a tiny reward. That sounds insane; how could it learn anything meaningful at all? The fact is that it does.
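Concretely, the "garbage" being used as the target is the standard Q-learning bootstrap

$$y = r + \gamma \max_{a'} Q(s', a'),$$

which starts out meaningless but is anchored by the real rewards $r$, so useful value estimates gradually propagate backwards from rewarding states.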
Extension
- Using Keras and Deep Q-Network to Play FlappyBird | Ben Lau
- Demystifying Deep Reinforcement Learning
- The above post is a must-read for anyone interested in deep reinforcement learning.