Union Branching Bandits

Union Branching Bandits

Uriel G. Rothblum

(joint work with Jay Sethuraman)

Abstract:

Union Bandit Problems (UBP) are sequential decision problems with states as subsets of a given set whose elements are called bandits. Selecting a bandit of an observed state, yields a random reward and generates a random set of bandits that become available along with the bandits that were available at the beginning of the period and were not selected. The model generalizes, among others, the classic multi-armed bandit problem. The optimality of priority policies for UBP's is established (under a transience condition) and two algorithms for finding optimal priority policies are described. Several variants of the problem are discussed.