000 | nam a22 7a 4500 | ||
---|---|---|---|
999 | _c29072 _d29072 | ||
008 | 180820b xxu||||| |||| 00| 0 eng d | ||
020 | _a9781608454921 | ||
082 | _a006.31 _bSZE | ||
100 | _aSzepesvari, Csaba | ||
245 | _aAlgorithms for reinforcement learning | ||
260 | _aUK : _bMorgan & Claypool, _c2010 | ||
300 | _axii, 89 p. : _bill. ; _c23.5 cm. | ||
365 | _aUS$ _b35.00 | ||
440 | _aSynthesis lectures on artificial intelligence and machine learning #9 | ||
504 | _aIncludes bibliographical references. | ||
520 | _aReinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. | ||
650 | _aMachine learning | ||
650 | _aNatural gradient | ||
650 | _aPolicy gradient | ||
650 | _aActor-critic methods | ||
650 | _aQ-learning | ||
650 | _aPAC-learning | ||
650 | _aPlanning | ||
650 | _aSimulation | ||
650 | _aOnline learning | ||
650 | _aActive learning | ||
650 | _aBias-variance tradeoff | ||
650 | _aOverfitting | ||
650 | _aLeast-squares methods | ||
650 | _aStochastic gradient methods | ||
650 | _aFunction approximation | ||
650 | _aSimulation optimization | ||
650 | _aTwo-timescale stochastic approximation | ||
650 | _aMonte-Carlo methods | ||
650 | _aStochastic approximation | ||
650 | _aMathematical models | ||
650 | _aTemporal difference learning | ||
650 | _aEngineering & Applied Sciences | ||
650 | _aMarkov decision processes | ||
942 | _2ddc _cBK | ||
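
The abstract (field 520) describes the reinforcement learning setting: an agent receives only partial, delayed feedback and must learn a controller that maximizes long-term reward. As a minimal illustration of one algorithm the book covers (see the Q-learning subject heading), here is a tabular Q-learning sketch on a toy chain MDP. The environment, constants, and function names are invented for this example; they are not taken from the book.

```python
import random

# Toy 5-state chain MDP (illustration only, not from the catalogued book).
# States 0..4; actions: 0 = left, 1 = right; reward 1.0 on reaching state 4.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # step size, discount, exploration rate

def step(state, action):
    """Deterministic transition; the episode ends at the rightmost state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the best next-state value.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy extracted from the learned Q-values.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

After training, the greedy policy chooses "right" in every non-terminal state, which is optimal for this chain: the only reward lies at the rightmost end, and the discount factor makes the shortest path to it the highest-value one.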