Showing 1-1 of 1
Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem
Year: 2009
Abstract:
One common way to illustrate the trade-off between
exploration and exploitation in reinforcement learning
problems is the multi-armed bandit problem (MABP). In this paper
we consider the MABP in a nonstationary ...