bandit-notes of combinatorial bandits 2011

I want to write my thoughts of the paper [Combinatorial Bandits] by Nicolo Cesa-Bianchi and Gabor Lugosi in 2011. The first author is a great professor in this area. His paper of [Finite-time analysis of the multi-armed bandit problem] in 2002 as second author has been cited over 1,500 times. The survey which I am reading about bandit problem, [Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems], is written by him as second author.

Before I read this paper, I thought linear bandits is just a small variant of normal bandit problem and combinatorial bandit problem is a higher-level generalization of bandit problem. But combinatorial bandit problem is indeed a special case of linear bandits. To see this, represent one super-arm by a 0-1 sequence of length K, using 1 to denote base arm belong to super-arm and 0 otherwise.

In linear bandit problem setting, the set of bandits should be a compact set of euclidean space R^d. At each time t, the adversarial gives a loss vector l_t from R^d and the loss revealed to the forecaster is the inner product of played arm, which is a vector of R^d, and loss vector l_t. For combinatorial bandit problem, super arm is a combination of some base arms and the loss of one super arm is the sum of all losses of base arms belonging to it. So if we use 0-1 sequence to denote one super arm and represent loss vector by all losses of base arms, then the loss of super arm, the sum of base arms belonging to it, is the inner product of 0-1 sequence and loss vectors. That's why combinatorial is one special case of linear bandits.

In this paper, it introduces an algorithm of ComBand and provides a lot of good applications using it.

bandit-notes of combinatorial bandits 2011_第1张图片

In the step 4, it use probability p_t to construct a matrix P where probability is a combination of q, depending on the loss, and a random distribution \mu on S, set of all super arms. Probability q is used for exploitation and \mu is for exploration. So probability is responsible for trade-off between exploration and exploitation.

Then the author use it to solve multitask bandit problem, hypercube where S ={0,1}^d is all possible combinations of base arms, graph problems with "symmetric" S which includes perfect matchings/spanning trees/cut sets/hamiltonian cycles/stars, m-sized subsets where S is all combinations of m base arms and path planning where S denotes all possible paths.

I think this paper is great not only because it provides a great applications which has wide influence in reality, especially those in graph problems. He has a solid background in computer science which reflects in translating those problems in terms of combinatorial bandit language and also the detailed solving methods. 

你可能感兴趣的:(bandit-notes of combinatorial bandits 2011)