Reinforcement Learning Theory: Study Materials

2022-08-02

Recommended books and papers

Recommended books

machine learning and learning theory books

1. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
2. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

reinforcement learning books

1. Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
2. Dimitri P Bertsekas and John N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996. (very important)

approximate dynamic programming

1. Rémi Munos. Introduction to Reinforcement Learning and multi-armed bandits. NETADIS Summer School, 2013.

Papers

Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
Dimitri P Bertsekas and John N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
Ronald A Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos. “Finite-sample analysis of least-squares policy iteration”. In: The Journal of Machine Learning Research 13 (2012), pp. 3041–3074.
Odalric-Ambrym Maillard et al. “Finite-sample analysis of Bellman residual minimization”. In: Asian Conference on Machine Learning (ACML). 2010, pp. 299–314.
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
Rémi Munos and Csaba Szepesvári. “Finite-time bounds for fitted value iteration”. In: Journal of Machine Learning Research 9 (2008), pp. 815–857.
Rémi Munos. Introduction to Reinforcement Learning and multi-armed bandits. NETADIS Summer School, 2013.
Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Richard S Sutton et al. “Policy gradient methods for reinforcement learning with function approximation”. In: Advances in Neural Information Processing Systems (NeurIPS). 1999, pp. 1057–1063.
Leslie G Valiant. “A theory of the learnable”. In: Communications of the ACM 27.11 (1984), pp. 1134–1142.
Christopher John Cornish Hellaby Watkins. “Learning From Delayed Rewards”. PhD Thesis. University of Cambridge, 1989.
Ronald J. Williams and Leemon C. Baird III. Tight performance bounds on greedy policies based on imperfect value functions. Tech. rep. NU-CCS-93-14, College of Computer Science, Northeastern University. 1993.
Ronald J Williams. “Simple statistical gradient-following algorithms for connectionist reinforcement learning”. In: Machine Learning 8.3-4 (1992).
Shuang Wu and Jun Wang. Decision making and AI: a white paper. 2020.
Pan Xu and Quanquan Gu. “A finite-time analysis of Q-learning with neural network function approximation”. In: arXiv preprint arXiv:1912.04511 (2019).
Zhuoran Yang, Yuchen Xie, and Zhaoran Wang. “A theoretical analysis of deep Q-learning”. In: arXiv preprint arXiv:1901.00137 (2019).