Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-1/1-5/ #2
Replies: 1 comment 1 reply
The sentence "if the agent takes action a2 in state s9, the next state is also s9, but the reward is r_boundary = +1" is wrong. It should be r_boundary = -1.
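To make the corrected value concrete, here is a minimal sketch of the boundary case. It assumes a 3x3 grid with states s1..s9 numbered row by row and the action encoding a1=up, a2=right, a3=down, a4=left, a5=stay. The names `step`, `MOVES`, and `R_BOUNDARY` are hypothetical and are not taken from the notes. Rewards for forbidden and target cells are deliberately left out, so every in-grid move returns 0 here.

```python
# Assumption: 3x3 grid world, states s1..s9 numbered row by row.
N = 3
R_BOUNDARY = -1  # reward for trying to leave the grid (the corrected value)

# Assumption: a1=up, a2=right, a3=down, a4=left, a5=stay.
MOVES = {1: (-1, 0), 2: (0, 1), 3: (1, 0), 4: (0, -1), 5: (0, 0)}


def step(state, action):
    """Return (next_state, reward), modelling only the boundary rule."""
    row, col = divmod(state - 1, N)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N and 0 <= new_col < N):
        # The agent bounces back: it stays where it is and is penalized.
        return state, R_BOUNDARY
    # Simplification: forbidden/target rewards are not modelled here.
    return new_row * N + new_col + 1, 0


print(step(9, 2))  # (9, -1): s9 is in the right-most column, so a2 hits the wall
```

Under these assumptions, taking a2 in s9 leaves the agent in s9 and returns -1, which matches the correction.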
Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-1/1-5/
Course notes for "Mathematical Foundations of Reinforcement Learning" (《强化学习的数学原理》)
https://wgyhhh.top/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-1/1-5/