RUTCOR Colloquia - November 2, 2006
Speaker: Uri Rothblum
Affiliation: Technion, Israel Institute of Technology
Title: Risk-sensitive and risk-neutral multi-armed bandits.
Time: 2:00 - 3:00 PM
Location: RUTCOR Building - Room 139, Rutgers University, Busch Campus, Piscataway, NJ
Abstract:
The classic result for the multi-armed bandit is that each state of each
bandit (Markov chain with rewards) has an index that depends only on the
data for its bandit, and expected discounted income is maximized by
playing at each epoch a bandit whose current state has the largest index.
We extend this result to the cases of risk-averse and risk-seeking
exponential utility functions, while generalizing it somewhat for the
risk-neutral case. Our approach is constructive. It uses domination in
place of indices. A simple recursion assigns to each state of each bandit
a reward and an amplification that depend solely on the data for its
bandit. These rewards and amplifications determine a ranking of the
states, and we show that it is optimal to play at each epoch any bandit
whose state has the highest ranking. The algorithm we develop is simpler
than previous algorithms that determine indices.
(Joint work with Eric V. Denardo, Haechurl Park)
Back to Seminars Page.
Back to RUTCOR homepage.
|