Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem

Author: Majid Mazouchi, Farzaneh Tatari, Mohammad Bagher Naghibi Sistani

Year: 2009

Abstract: The multi-armed bandit problem (MABP) is a common way of illustrating the exploration-exploitation trade-off in reinforcement learning. In this paper we consider the MABP in a nonstationary environment whose characteristics change during learning. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. They include action-value methods with ε-greedy and softmax action-selection rules, the probability matching method, and the adaptive pursuit method. To obtain near-optimal results, we convert the ad hoc methods into sequential optimistic ad hoc methods, which yield considerably better results.
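As a rough illustration of two of the action-selection rules named in the abstract, the following Python sketch runs ε-greedy and softmax selection over action-value estimates on a bandit whose arm means drift over time. It is not the authors' sequential optimistic method, which this record does not describe. The constant step size, the random-walk drift model, and all function names and parameter values are assumptions made for the example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random arm with probability epsilon, else a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties among equally good arms at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def softmax_probs(q_values, temperature):
    """Boltzmann probabilities over arms; higher temperature explores more."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature, rng):
    """Sample one arm according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def run_bandit(select, steps=2000, n_arms=5, step_size=0.1, drift=0.02, seed=0):
    """Run one selection rule on a bandit whose arm means follow a random walk.

    The drift model and default values are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    q_values = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        arm = select(q_values, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        # A constant step size weights recent rewards more heavily,
        # which lets the estimate track a changing arm mean.
        q_values[arm] += step_size * (reward - q_values[arm])
        total_reward += reward
        # Nonstationarity: every arm's mean drifts a little each step.
        true_means = [m + rng.gauss(0.0, drift) for m in true_means]
    return total_reward / steps, q_values


if __name__ == "__main__":
    avg_eps, _ = run_bandit(lambda q, rng: epsilon_greedy(q, 0.1, rng))
    avg_soft, _ = run_bandit(lambda q, rng: softmax_select(q, 0.5, rng))
    print(f"epsilon-greedy average reward: {avg_eps:.3f}")
    print(f"softmax average reward: {avg_soft:.3f}")
```

Both rules share the same value-update loop, so other selection rules can be compared by passing a different `select` function to `run_bandit`.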

URI: https://libsearch.um.ac.ir:443/fum/handle/fum/3379822

Keyword(s): Sequential optimistic ad hoc methods, Exploration-exploitation, Multi-armed bandit, Reinforcement learning, Action selection

Collections: ProfDoc


Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem

contributor author	مجید مازوچی	fa
contributor author	فرزانه تاتاری	fa
contributor author	محمدباقر نقیبی سیستانی	fa
contributor author	Majid Mazouchi	en
contributor author	Farzaneh Tatari	en
contributor author	Mohammad Bagher Naghibi Sistani	en
date accessioned	2020-06-06T14:02:18Z
date available	2020-06-06T14:02:18Z
date copyright	5/12/2009
date issued	2009
identifier uri	https://libsearch.um.ac.ir:443/fum/handle/fum/3379822?locale-attribute=en
description abstract	The multi-armed bandit problem (MABP) is a common way of illustrating the exploration-exploitation trade-off in reinforcement learning. In this paper we consider the MABP in a nonstationary environment whose characteristics change during learning. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. They include action-value methods with ε-greedy and softmax action-selection rules, the probability matching method, and the adaptive pursuit method. To obtain near-optimal results, we convert the ad hoc methods into sequential optimistic ad hoc methods, which yield considerably better results.	en
language	English
title	Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem	en
type	Conference Paper
contenttype	External Fulltext
subject keywords	Sequential optimistic ad hoc methods	en
subject keywords	Exploration-exploitation	en
subject keywords	Multi-armed bandit	en
subject keywords	Reinforcement learning	en
subject keywords	Action selection	en
identifier link	https://profdoc.um.ac.ir/paper-abstract-1022487.html
conference title	17th Iranian Conference on Electrical Engineering, ICEE2009	en
conference location	Tehran	en
identifier articleid	1022487