Reinforcement learning

Updated: Wikipedia source

In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Whereas supervised and unsupervised learning algorithms attempt to discover patterns in labeled and unlabeled data, respectively, reinforcement learning trains an agent through interactions with its environment. To learn to maximize rewards from these interactions, the agent must repeatedly choose between trying new actions to learn more about the environment (exploration) and using its current knowledge of the environment to take the best-known action (exploitation). The search for the optimal balance between these two strategies is known as the exploration–exploitation dilemma.
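
One simple and widely used way to trade exploration against exploitation is an epsilon-greedy rule: with a small probability the agent picks a random action, otherwise it picks the action with the highest current reward estimate. The sketch below applies it to a multi-armed bandit. The function names, the Gaussian reward noise, and the parameter values are illustrative assumptions, not part of the article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: try a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploitation: take the first action with the highest estimate.
    return q_values.index(max(q_values))


def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Estimate each arm's mean reward with incremental sample averages.

    true_means is a hypothetical list of per-arm expected rewards;
    rewards are drawn with unit-variance Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)  # current reward estimates
    counts = [0] * len(true_means)      # number of pulls per arm
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental mean: new = old + (sample - old) / n
        q_values[action] += (reward - q_values[action]) / counts[action]
    return q_values, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 1.0])
    print("estimates:", estimates)
    print("pulls per arm:", pulls)
```

Setting epsilon to 0 gives a purely exploiting agent that can lock onto a poor arm, while epsilon equal to 1 gives a purely exploring agent that never uses what it has learned.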

The environment is typically formulated as a Markov decision process (MDP), since many reinforcement learning algorithms build on dynamic programming techniques. The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP, and they target large MDPs where exact methods become infeasible.
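
To make the contrast concrete, here is a minimal sketch of the classical dynamic programming side: value iteration on a tiny Markov decision process whose transition probabilities and rewards are fully known. The two-state model, the discount factor, and all names are hypothetical and chosen only so the result can be checked by hand. A reinforcement learning algorithm would have to estimate the same quantities from sampled interactions instead of reading them from the table `P`.

```python
# Hypothetical 2-state, 2-action model; all numbers are illustrative.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def action_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under model P."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-10):
    """Apply the Bellman optimality backup until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in each state, the action with the highest expected return."""
    return {s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma)) for s in P}


if __name__ == "__main__":
    V = value_iteration(P, GAMMA)
    print(V, greedy_policy(P, V, GAMMA))
```

For this toy model the values converge to approximately 19 for state 0 and 20 for state 1, with action 1 preferred in both states. The sweep over every state and action is what becomes infeasible when the state space is very large.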

Tables

· Comparison of key algorithms
Algorithm   | Description                                     | Policy     | Action space           | State space | Operator
Monte Carlo | Every-visit Monte Carlo                         | Either     | Discrete               | Discrete    | Sample-means of state-values or action-values
TD learning | State–action–reward–state                       | Off-policy | Discrete               | Discrete    | State-value
Q-learning  | State–action–reward–state                       | Off-policy | Discrete               | Discrete    | Action-value
SARSA       | State–action–reward–state–action                | On-policy  | Discrete               | Discrete    | Action-value
DQN         | Deep Q Network                                  | Off-policy | Discrete               | Continuous  | Action-value
DDPG        | Deep Deterministic Policy Gradient              | Off-policy | Continuous             | Continuous  | Action-value
A3C         | Asynchronous Advantage Actor-Critic Algorithm   | On-policy  | Discrete               | Continuous  | Advantage (= action-value - state-value)
TRPO        | Trust Region Policy Optimization                | On-policy  | Continuous or Discrete | Continuous  | Advantage
PPO         | Proximal Policy Optimization                    | On-policy  | Continuous or Discrete | Continuous  | Advantage
TD3         | Twin Delayed Deep Deterministic Policy Gradient | Off-policy | Continuous             | Continuous  | Action-value
SAC         | Soft Actor-Critic                               | Off-policy | Continuous             | Continuous  | Advantage
DSAC        | Distributional Soft Actor Critic                | Off-policy | Continuous             | Continuous  | Action-value distribution
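
The off-policy and on-policy labels in the comparison can be illustrated with the tabular update rules of Q-learning and SARSA, and the advantage entry with its definition as action-value minus state-value. The following is a minimal sketch under stated assumptions: the nested-dictionary Q-table layout, the state and action names, and the step-size and discount values are all hypothetical.

```python
import copy


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the best action in s_next, whatever is taken next."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action a_next the agent actually takes in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def advantage(Q, s, a, policy_probs):
    """A(s, a) = Q(s, a) - V(s), where V(s) is the policy-weighted mean of Q(s, .)."""
    v = sum(policy_probs[b] * Q[s][b] for b in Q[s])
    return Q[s][a] - v


if __name__ == "__main__":
    # Hypothetical Q-table: two states, two actions each.
    Q0 = {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 3.0},
    }
    Q_ql, Q_sarsa = copy.deepcopy(Q0), copy.deepcopy(Q0)

    # Same transition (s0, right, reward 1.0, s1); the next action taken is "left".
    q_learning_update(Q_ql, "s0", "right", 1.0, "s1", alpha=0.5, gamma=0.9)
    sarsa_update(Q_sarsa, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.9)

    print("Q-learning:", Q_ql["s0"]["right"])   # uses max over s1: 1 + 0.9 * 3
    print("SARSA:     ", Q_sarsa["s0"]["right"])  # uses Q(s1, left): 1 + 0.9 * 1
    print("advantage: ", advantage(Q0, "s1", "right", {"left": 0.5, "right": 0.5}))
```

On the same transition the two rules produce different estimates because Q-learning evaluates the greedy policy while SARSA evaluates the policy being followed, including its exploratory choices.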
