RL | DQN


Contents

DQN Framework
Application
  1.1 Cartpole Introduction
  1.2 Code
  1.3 Result
Reference

DQN Framework

The agent interacts with the environment to produce the next state, the reward and a termination flag; these transitions are stored in a replay buffer.

Transitions are then sampled from the buffer to compute the loss and optimize the model.
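A minimal PyTorch sketch of these two steps, assuming a Q-network `q_net`, a target network `target_net` and an optimizer are defined elsewhere. The names, buffer size and hyperparameters here are illustrative assumptions, not values from the linked code.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.as_tensor(np.array(states), dtype=torch.float32),
                torch.as_tensor(actions, dtype=torch.int64),
                torch.as_tensor(rewards, dtype=torch.float32),
                torch.as_tensor(np.array(next_states), dtype=torch.float32),
                torch.as_tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)


def dqn_update(q_net, target_net, optimizer, buffer, batch_size=64, gamma=0.99):
    """Sample a batch, compute the TD loss and take one optimization step."""
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    # Q(s, a) of the actions actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal states
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```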

Application

1.1 Cartpole Introduction

Action space: two discrete actions, push the cart to the left or to the right.

State space (4 dimensions):
  position of the cart on the track
  angle of the pole with the vertical
  cart velocity
  rate of change of the angle

Tip: the reward boundary (maximum episode return) is 200 for CartPole-v0 and 500 for CartPole-v1.
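For reference, these spaces and the episode limit can be inspected directly from the environment; a small sketch using the classic gym API (the attribute names are the same in Gymnasium):

```python
import gym

env = gym.make("CartPole-v1")
print(env.action_space)            # Discrete(2): push the cart left or right
print(env.observation_space)       # Box(4,): cart position, cart velocity, pole angle, pole angular velocity
print(env.spec.max_episode_steps)  # 500 for v1 (200 for v0), hence the reward boundaries above
```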

1.2 Code

Github
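The linked repository is not reproduced here. As a rough stand-in for what such a script looks like, the hedged sketch below wires the framework together with epsilon-greedy exploration; it reuses the hypothetical `ReplayBuffer` and `dqn_update` from the DQN Framework sketch above, assumes the classic Gym reset/step API, and all hyperparameters are arbitrary illustrative choices rather than values from the linked code.

```python
import random

import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

def make_q_net():
    # Small MLP Q-network; the hidden width (128) is an arbitrary illustrative choice.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = ReplayBuffer(capacity=10000)  # from the DQN Framework sketch above

episode_rewards, epsilon, steps = [], 1.0, 0
for episode in range(300):
    state, episode_reward, done = env.reset(), 0.0, False    # classic Gym: reset() returns obs only
    while not done:
        # Epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
        next_state, reward, done, _ = env.step(action)        # classic Gym: 4 return values
        buffer.push(state, action, reward, next_state, float(done))
        state, episode_reward, steps = next_state, episode_reward + reward, steps + 1
        if len(buffer) >= 64:
            dqn_update(q_net, target_net, optimizer, buffer, batch_size=64)
        if steps % 200 == 0:
            target_net.load_state_dict(q_net.state_dict())    # periodic target-network sync
    epsilon = max(0.01, epsilon * 0.99)                       # decay exploration over episodes
    episode_rewards.append(episode_reward)
    print(f"episode {episode}: reward {episode_reward}")
```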

1.3 Result

(Training curves: per-episode reward and mean reward over training episodes.)
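The original figure is not included here. A hedged sketch of how the two curves could be produced from the per-episode returns collected during training (for example the `episode_rewards` list from the loop in 1.2):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_rewards(episode_rewards, window=20):
    """Plot raw per-episode reward and its moving mean (window size is an arbitrary choice)."""
    rewards = np.asarray(episode_rewards, dtype=float)
    mean_rewards = np.convolve(rewards, np.ones(window) / window, mode="valid")
    plt.plot(rewards, label="episode reward")
    plt.plot(np.arange(window - 1, len(rewards)), mean_rewards, label="mean reward")
    plt.xlabel("episode")
    plt.ylabel("reward")
    plt.legend()
    plt.show()
```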

Reference

150行代码实现DQN算法玩CartPole
Introduction to Reinforcement Learning
[动手学强化学习] 2. DQN解决CartPole-v0问题
OpenAI Gym 经典控制环境介绍——CartPole(倒立摆)