Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode III
We continue our deep dive into Sutton’s great book on RL [1] and here focus on Monte Carlo (MC) methods. These are able to learn from experience alone, i.e. they do not require any kind of model of the environment, as is needed e.g. by the Dynamic Programming (DP) methods we introduced in the previous post.
This is extremely tempting, since often the model is not known, or it is hard to model the transition probabilities. Consider the game of Blackjack: although we fully understand the game and its rules, solving it via DP methods would be very tedious. We would have to compute all kinds of probabilities, e.g. given the cards played so far, how likely is a “blackjack”, how likely is it that another seven is dealt … With MC methods, we do not have to deal with any of this; we simply play and learn from experience.
Because they do not use a model, MC estimates are unbiased. The methods are conceptually simple and easy to understand, but they exhibit high variance and do not bootstrap, i.e. they cannot build their estimates iteratively on top of other estimates.
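To make this concrete, here is a minimal sketch of first-visit MC prediction, the basic idea behind the methods covered in this post: play full episodes and estimate each state's value as the average of the returns observed after its first visit. The helper `generate_episode(policy)` is a hypothetical placeholder (not from the book) standing in for "playing one game", returning a list of (state, reward) pairs.

```python
from collections import defaultdict


def mc_prediction(generate_episode, policy, num_episodes, gamma=1.0):
    """Estimate state values by averaging sampled returns (first-visit MC)."""
    returns_sum = defaultdict(float)  # sum of first-visit returns per state
    returns_count = defaultdict(int)  # number of first visits per state
    V = defaultdict(float)            # current value estimate per state

    for _ in range(num_episodes):
        # Hypothetical helper: plays one full episode with `policy`
        # and returns [(state, reward), ...].
        episode = generate_episode(policy)
        states = [s for s, _ in episode]

        G = 0.0
        # Walk the episode backwards, accumulating the return G.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: only update on the first occurrence of state.
            if state not in states[:t]:
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

Note how no transition probabilities appear anywhere: the environment is only touched through sampled episodes, which is exactly what makes MC methods model-free.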
As mentioned, here we will introduce these methods following Chapter 5 of Sutton’s book…