Ferdowsi University of Mashhad | Information Center and Central Library

Sequential optimistic ad-hoc methods for nonstationary multi_armed bandit problem

Author: Majid Mazouchi, Farzaneh Tatari, Mohammad Bagher Naghibi Sistani
Year: 2009
Abstract: One common way to illustrate the exploration-exploitation trade-off in reinforcement learning is the multi-armed bandit problem (MABP). In this paper we consider the MABP in a nonstationary environment whose features change during the learning period. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. They include action-value methods with ε-greedy and softmax action-selection rules, the probability matching method, and the adaptive pursuit method. To produce near-optimal results, we convert the ad hoc methods into sequential optimistic ad hoc methods, which yield substantially better results.
URI: https://libsearch.um.ac.ir:443/fum/handle/fum/3379822
Keyword(s): Sequential optimistic ad hoc methods, Exploration-exploitation, Multi-armed bandit, Reinforcement learning, Action selection
Collections: ProfDoc
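
The abstract names ε-greedy action-value methods and an "optimistic" variant for a nonstationary bandit, but this record does not give the algorithms themselves. The following is only a minimal sketch of the general textbook idea, not the authors' sequential optimistic method: ε-greedy selection with a constant step size on a bandit whose arm means drift, where a high initial value estimate encourages early exploration. The function name `run_bandit`, the Gaussian random-walk drift, and all parameter values are assumptions made for illustration.

```python
import random


def run_bandit(n_arms=5, steps=2000, epsilon=0.1, alpha=0.1,
               initial_value=0.0, drift=0.01, seed=0):
    """Epsilon-greedy on a drifting Gaussian bandit; returns the mean reward.

    Illustrative only: the drift model and every default are assumptions.
    """
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    # Optimistic initialization: start estimates above plausible rewards
    # (initial_value > 0) so that every arm gets tried early on.
    estimates = [initial_value] * n_arms
    total = 0.0
    for _ in range(steps):
        # Nonstationarity: each arm's true mean takes a small random-walk step.
        true_means = [m + rng.gauss(0.0, drift) for m in true_means]
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        # A constant step size weights recent rewards more heavily,
        # which lets the estimate track a changing mean.
        estimates[arm] += alpha * (reward - estimates[arm])
        total += reward
    return total / steps


if __name__ == "__main__":
    print("zero initial values:      ", run_bandit(initial_value=0.0))
    print("optimistic initial values:", run_bandit(initial_value=5.0))
```

Which setting does better depends on the seed, the drift rate, and the step size, so the two printed averages should be read as one sample run rather than as evidence for the paper's results.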
Date accessioned: 2020-06-06T14:02:18Z
Date available: 2020-06-06T14:02:18Z
Date copyright: 5/12/2009
Date issued: 2009
Identifier URI: https://libsearch.um.ac.ir:443/fum/handle/fum/3379822?locale-attribute=en
Language: English
Type: Conference Paper
Content type: External Fulltext
Identifier link: https://profdoc.um.ac.ir/paper-abstract-1022487.html
Conference title: 17th Iranian Conference on Electrical Engineering (ICEE2009)
Conference location: Tehran
Identifier article ID: 1022487