Category identification in the multi armed bandit setting

25.01.2022, 10:00  –  hybrid - Golm 2.9.02.22 and zoom Forschungsseminar Statistik

James Cheshire

Abstract: We consider a stochastic multi armed bandit problem with potentially infinite arms, with the assumption that the arm set can be partitioned into an unknown number of $$K$$ distinct categories, where arms in the same category are identically distributed. Let $$\theta_k$$ denote the proportion of the $$k$$th category in the arm set and let $$\Delta_k$$ denote the minimum gap from the $$k$$ category to another, in terms of expected reward. We aim to identify optimal learning rates in the setting where the goal of the learner is to identify a member of \textit{each individual category}. That is, at the end of $$T$$ rounds they aim to output a list of $$K$$ arms containing a distinct member of each category. This is in contrast to the classical problem of best arm identification where the goal of the learner is return a single arm of the optimal category. The regret of the leaner will be given by their probability of error. We describe an algorithm which we conjecture attains the optimal rate, $$\exp\left(-T\min(\theta_k \Delta_k)\right)$$, up to log terms. The proposed algorithm does not require knowledge of the $$\theta_k$$s. Our algorithm does, however, need to know $$\min(\Delta_k)$$, or a lower bound on it, which will then appear in the regret bound.