Catalogue
DQN Framework
Application
    1.1 CartPole Introduction
    1.2 Code
    1.3 Result
Reference
DQN Framework
The agent interacts with the environment to generate the next state, reward, and termination signal, and these transitions are stored in a replay buffer.
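A minimal sketch of this storage step, assuming a simple deque-based buffer; the class and the `env`/`agent` names in the usage comment are illustrative and not taken from the linked repository:

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and supports uniform random sampling."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Typical usage during interaction (env and agent are placeholders):
#   action = agent.take_action(state)
#   next_state, reward, done, _ = env.step(action)
#   buffer.push(state, action, reward, next_state, done)
```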
Transitions are then sampled from the buffer to compute the loss and optimize the model.
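A sketch of this update step under the standard DQN loss, assuming a PyTorch Q-network `q_net`, a target network `target_net`, and minibatches of the `Transition` tuples above; all names are illustrative:

```python
import numpy as np
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a minibatch sampled from the replay buffer."""
    states = torch.tensor(np.array([t.state for t in batch]), dtype=torch.float32)
    actions = torch.tensor([t.action for t in batch], dtype=torch.int64).unsqueeze(1)
    rewards = torch.tensor([t.reward for t in batch], dtype=torch.float32).unsqueeze(1)
    next_states = torch.tensor(np.array([t.next_state for t in batch]), dtype=torch.float32)
    dones = torch.tensor([t.done for t in batch], dtype=torch.float32).unsqueeze(1)

    # Q(s, a) for the actions actually taken
    q_values = q_net(states).gather(1, actions)

    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
        td_target = rewards + gamma * max_next_q * (1.0 - dones)

    loss = F.mse_loss(q_values, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The target network is a periodically synchronized copy of `q_net`; using it to compute the TD target keeps the regression target stable between updates.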
Application
1.1 CartPole Introduction
Action space: push the cart to the left or to the right.
State space:
- position of the cart on the track
- angle of the pole with the vertical
- cart velocity
- rate of change of the angle
Tips: the maximum episode reward of CartPole-v0 is 200, and that of CartPole-v1 is 500.
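A quick sketch for inspecting these spaces and running one episode, assuming the classic Gym step API (four return values); the random policy here is only to illustrate the loop:

```python
import gym

env = gym.make("CartPole-v0")
print(env.action_space)       # Discrete(2): push the cart left (0) or right (1)
print(env.observation_space)  # 4-dimensional box: cart position, cart velocity, pole angle, pole angular velocity

state = env.reset()
done, episode_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # random action, no learning here
    state, reward, done, info = env.step(action)  # +1 reward per step until the pole falls
    episode_reward += reward                      # capped at 200 for v0, 500 for v1
print("episode reward:", episode_reward)
```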
1.2 Code
The code is available on GitHub.
1.3 Result
Plots: episode reward and mean reward over training episodes.
Reference
150行代码实现DQN算法玩CartPole
Introduction to Reinforcement Learning
[动手学强化学习] 2.DQN解决CartPole-v0问题
OpenAI Gym 经典控制环境介绍——CartPole(倒立摆)