Uriel G. Rothblum
(joint work with Jay Sethuraman)
Abstract:
Union Bandit Problems (UBP) are sequential decision problems with states as subsets of a
given set whose elements are called bandits. Selecting a bandit of an observed
state, yields a random reward and generates a random set of bandits that become
available along with the bandits that were available at the beginning of the
period and were not selected. The model generalizes, among others, the classic
multi-armed bandit problem. The optimality of priority policies for UBP's is established (under a transience
condition) and two algorithms for finding optimal priority policies are
described. Several variants of the problem are discussed.