Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem
Author(s): Majid Mazouchi, Farzaneh Tatari, Mohammad Bagher Naghibi Sistani
Year: 2009
Abstract: One of the common ways of illustrating the trade-off
between exploration and exploitation in reinforcement learning
problems is the multi-armed bandit problem (MABP). In this paper
we consider the MABP in a nonstationary environment whose
features change during the learning period. The presented
learning algorithms are intuition-based solutions to the
exploration-exploitation trade-off, called ad hoc methods.
These methods include action-value methods with ε-greedy and
softmax action selection rules, the probability matching method,
and finally the adaptive pursuit method. To produce near-optimal
results, we modify the ad hoc methods into sequential optimistic
ad hoc methods, which yield considerably better results.
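As a rough illustration of the kind of ad hoc method the abstract names (not the paper's exact sequential optimistic procedure), the sketch below combines an ε-greedy action-value learner that uses a constant step size, so its estimates can track nonstationary rewards, with optimistic initial estimates; the function names, parameter values, and drifting-arm simulator are hypothetical.

import random

def epsilon_greedy_bandit(pull_arm, n_arms, steps=1000,
                          epsilon=0.1, step_size=0.1, optimistic_init=5.0):
    """Run epsilon-greedy on a (possibly nonstationary) bandit.

    pull_arm(a) must return the reward of arm `a` at the current step.
    """
    # Optimistic initial values push the learner to try every arm early on.
    q = [optimistic_init] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(n_arms)                 # explore
        else:
            a = max(range(n_arms), key=lambda i: q[i])   # exploit
        r = pull_arm(a)
        # A constant step size weights recent rewards more heavily,
        # letting the estimate follow a drifting reward distribution.
        q[a] += step_size * (r - q[a])
        total_reward += r
    return q, total_reward

# Example (hypothetical): arms whose mean rewards drift slowly,
# i.e. a nonstationary bandit environment.
means = [0.0, 0.5, 1.0]

def drifting_arm(a):
    means[a] += random.gauss(0.0, 0.01)   # slow random-walk drift
    return random.gauss(means[a], 1.0)

estimates, reward = epsilon_greedy_bandit(drifting_arm, n_arms=3)

Because the update uses a constant step size rather than a sample average, recent rewards dominate the value estimates, which is the usual reason this style of update is preferred when arm reward distributions change over time.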
Keywords: Sequential optimistic ad hoc methods,
Exploration-exploitation, Multi-armed bandit,
Reinforcement learning, Action selection
contributor author | مجید مازوچی | fa
contributor author | فرزانه تاتاری | fa
contributor author | محمدباقر نقیبی سیستانی | fa
contributor author | Majid Mazouchi | en
contributor author | Farzaneh Tatari | en
contributor author | Mohammad Bagher Naghibi Sistani | en
date accessioned | 2020-06-06T14:02:18Z | |
date available | 2020-06-06T14:02:18Z | |
date copyright | 5/12/2009 | |
date issued | 2009 | |
identifier uri | http://libsearch.um.ac.ir:80/fum/handle/fum/3379822 | |
description abstract | One of the common ways of illustrating the trade-off between exploration and exploitation in reinforcement learning problems is the multi-armed bandit problem (MABP). In this paper we consider the MABP in a nonstationary environment whose features change during the learning period. The presented learning algorithms are intuition-based solutions to the exploration-exploitation trade-off, called ad hoc methods. These methods include action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and finally the adaptive pursuit method. To produce near-optimal results, we modify the ad hoc methods into sequential optimistic ad hoc methods, which yield considerably better results. | en
language | English | |
title | Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem | en
type | Conference Paper | |
contenttype | External Fulltext | |
subject keywords | Sequential optimistic ad hoc methods | en |
subject keywords | Exploration-exploitation | en
subject keywords | Multi-armed bandit | en
subject keywords | Reinforcement learning | en |
subject keywords | Action selection | en |
identifier link | https://profdoc.um.ac.ir/paper-abstract-1022487.html | |
conference title | 17th Iranian Conference on Electrical Engineering (ICEE 2009) | en
conference location | Tehran | en
identifier articleid | 1022487 |