
contributor author: Majid Mazouchi
contributor author: Farzaneh Tatari
contributor author: Mohammad Bagher Naghibi Sistani
date accessioned: 2020-06-06T14:02:18Z
date available: 2020-06-06T14:02:18Z
date copyright: 5/12/2009
date issued: 2009
identifier uri: http://libsearch.um.ac.ir:80/fum/handle/fum/3379822?locale-attribute=en&show=full
description abstract: The multi-armed bandit problem (MABP) is a common way to illustrate the exploration-exploitation trade-off in reinforcement learning. In this paper we consider the MABP in a nonstationary environment whose features change during the learning period. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. They include action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. To produce near-optimal results, we convert the ad hoc methods into sequential optimistic ad hoc methods, which give considerably better results.
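The action-value methods named in the abstract can be illustrated with a minimal sketch. It is not the paper's sequential optimistic method, which this record does not describe. It only shows ε-greedy and softmax action selection with a constant step-size update on a toy nonstationary bandit. The two-arm setup, the reward means, the noise level, and the names `epsilon_greedy`, `softmax`, and `run_bandit` are all assumptions made for illustration.

```python
import math
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def softmax(q, temperature, rng):
    """Sample an arm with probability proportional to exp(q[a] / temperature)."""
    top = max(q)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def run_bandit(select, steps=2000, step_size=0.1, seed=0):
    """Run one learner on a two-arm bandit whose reward means swap halfway through."""
    rng = random.Random(seed)
    means = [1.0, 0.0]   # hypothetical reward means (assumption)
    q = [0.0, 0.0]       # action-value estimates
    picks_late = [0, 0]  # arm counts over the final quarter of the run
    for t in range(steps):
        if t == steps // 2:
            means.reverse()  # nonstationarity: the better arm changes
        arm = select(q, rng)
        reward = rng.gauss(means[arm], 0.5)
        # A constant step size weights recent rewards more heavily,
        # so the estimate can track a changing mean.
        q[arm] += step_size * (reward - q[arm])
        if t >= steps * 3 // 4:
            picks_late[arm] += 1
    return q, picks_late


if __name__ == "__main__":
    for name, rule in [
        ("epsilon-greedy", lambda q, rng: epsilon_greedy(q, 0.1, rng)),
        ("softmax", lambda q, rng: softmax(q, 0.2, rng)),
    ]:
        estimates, late = run_bandit(rule)
        print(name, estimates, late)
```

In this sketch both rules should end up favoring the second arm after the swap, because the constant step size lets old estimates decay. A sample-average update would adapt much more slowly in the same setting.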
language: English
title: Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem
type: Conference Paper
contenttype: External Fulltext
subject keywords: Sequential optimistic ad hoc methods
subject keywords: Exploration-exploitation
subject keywords: Multi-armed bandit
subject keywords: Reinforcement learning
subject keywords: Action selection
identifier link: https://profdoc.um.ac.ir/paper-abstract-1022487.html
conference title: 17th Iranian Conference on Electrical Engineering, ICEE2009
conference location: Tehran
identifier articleid: 1022487

