Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem
Year: 2009
Abstract:
A common way to illustrate the trade-off
between exploration and exploitation in reinforcement learning
problems is the multi-armed bandit problem (MABP). In this paper
we consider the MABP in a nonstationary ...