Catalogue
DQN Framework
Application
    1.1 CartPole Introduction
    1.2 Code
    1.3 Result
Reference
DQN Framework
The agent interacts with the environment to generate the next state, reward, and termination signal, and these transitions are stored in a replay buffer.
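A minimal sketch of this storage step, assuming a simple deque-based buffer; the class and the `env`/`agent` names in the usage comment are illustrative and not taken from the linked repository:

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and supports uniform random sampling."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Typical usage during interaction (env and agent are placeholders):
#   action = agent.take_action(state)
#   next_state, reward, done, _ = env.step(action)
#   buffer.push(state, action, reward, next_state, done)
```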
Transitions are then sampled from the buffer to compute the loss and optimize the model.
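A sketch of this update step under the standard DQN loss, assuming a PyTorch Q-network `q_net`, a target network `target_net`, and minibatches of the `Transition` tuples above; all names are illustrative:

```python
import numpy as np
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a minibatch sampled from the replay buffer."""
    states = torch.tensor(np.array([t.state for t in batch]), dtype=torch.float32)
    actions = torch.tensor([t.action for t in batch], dtype=torch.int64).unsqueeze(1)
    rewards = torch.tensor([t.reward for t in batch], dtype=torch.float32).unsqueeze(1)
    next_states = torch.tensor(np.array([t.next_state for t in batch]), dtype=torch.float32)
    dones = torch.tensor([t.done for t in batch], dtype=torch.float32).unsqueeze(1)

    # Q(s, a) for the actions actually taken
    q_values = q_net(states).gather(1, actions)

    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
        td_target = rewards + gamma * max_next_q * (1.0 - dones)

    loss = F.mse_loss(q_values, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The target network is a periodically synchronized copy of `q_net`; using it to compute the TD target keeps the regression target stable between updates.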
Application
1.1 CartPole Introduction
Action space: push the cart to the left or to the right.
State space:
- position of the cart on the track
- angle of the pole with the vertical
- cart velocity
- rate of change of the angle
Tips: the maximum episode reward of CartPole-v0 is 200, and that of CartPole-v1 is 500.
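A quick sketch for inspecting these spaces and running one episode, assuming the classic Gym step API (four return values); the random policy here is only to illustrate the loop:

```python
import gym

env = gym.make("CartPole-v0")
print(env.action_space)       # Discrete(2): push the cart left (0) or right (1)
print(env.observation_space)  # 4-dimensional box: cart position, cart velocity, pole angle, pole angular velocity

state = env.reset()
done, episode_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # random action, no learning here
    state, reward, done, info = env.step(action)  # +1 reward per step until the pole falls
    episode_reward += reward                      # capped at 200 for v0, 500 for v1
print("episode reward:", episode_reward)
```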
1.2 Code
The code is available on GitHub.
1.3 Result
Plots: episode reward and mean reward over training episodes.
Reference
150行代码实现DQN算法玩CartPole
Introduction to Reinforcement Learning
[动手学强化学习] 2.DQN解决CartPole-v0问题
OpenAI Gym 经典控制环境介绍——CartPole(倒立摆)