Skip to content

Commit e6034ea

Browse files
authored
Update README.md
1 parent 2d3b968 commit e6034ea

1 file changed

Lines changed: 16 additions & 0 deletions

File tree

README.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,19 @@
11
《强化学习中的数学原理》-个人笔记与思考总结
22

33
正在持续更新: https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/
4+
5+
## 📖 内容导航
6+
7+
| 章节 | 关键内容 | 状态 |
8+
| --- | --- | --- |
9+
| [前言](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Preface1/) | 本笔记的缘起、背景及阅读建议 ||
10+
| [第一章 基本概念](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-1/intro/) | 强化学习的基本概念 ||
11+
| [第二章 状态值与贝尔曼方程](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-2/intro/) | 回报、状态值、Bellman方程 ||
12+
| [第三章 最优状态值与贝尔曼最优方程](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-3/intro/) | 最优状态值、最优策略、Bellman最优方程 ||
13+
| [第四章 值迭代与策略迭代](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-4/intro/) | 值迭代算法、策略迭代算法、阶段策略迭代算法 ||
14+
| [第五章 蒙特卡罗方法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-5/intro/) | MC Basic、MC Exploring Starts、MC-Greedy ||
15+
| [第六章 随机近似算法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-6/intro/) | Robbins-Monro算法、Dvoretzky定理、随机梯度下降 ||
16+
| [第七章 时序差分算法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-7/intro/) | Sarsa、n步Sarsa、Q-learning、 Off-policy、On-policy||
17+
| [第八章 值函数方法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-8/intro/) |基于值函数的Sarsa、基于值函数的Q-learning ||
18+
| [第九章 策略梯度方法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-9/intro/) | 策略梯度、REINFORCE ||
19+
| [第十章 演员-评论家算法](https://wgyhhhh.github.io/Mathematical-Foundations-of-Reinforcement-Learning-Notes/Chapter-10/intro/) | 优势演员-评论家、异策略演员-评论家、确定性演员-评论家 ||

0 commit comments

Comments
 (0)