- Speaker: Akshay Krishnamurthy, Carnegie Mellon University, (Speaker’s Website)
- Time: March 10, 2015, 2:00pm-3:00pm
- Location: GDC 5.516
- Host: Inderjit S. Dhillon
In the contextual bandit problem, a learner, for a series of rounds, observes a context, makes an action, and receives reward for this action, but does not see the potential reward for other possible actions. Consequently, the learner must negotiate an exploration-exploitation tradeoff, whereby it explores to understand good actions for each context, but also exploits so that it achieves high reward over the course of the interaction. In contextual bandit problems with large action spaces, existing algorithms fail to achieve competitive statistical performance because exploring spaces is prohibitively expensive.
To address this deficiency, we study a variant of the contextual bandit problem, which we call contextual semi-bandits, where on each round, the learner plays a sequence of actions, observes a feature for each of these actions, and reward that is linearly related to these features. We study both the case where the linear transformation is known and unknown and present algorithms that are statistically and computationally efficient in both cases. These algorithms show how the additional feedback available in this problem enables efficient learning.
Akshay Krishnamurthy is a Ph.D. student in the Computer Science Department at Carnegie Mellon University, advised by Aarti Singh. His research interests are broadly in statistical machine learning and the theoretical underpinnings of machine learning, with a specific focus on interactive learning. He has also worked on problems in unsupervised learning, nonparametric statistics, compressed sensing, and contextual bandit learning. Akshay is an NSF Graduate Research Fellow and also the winner of the Best Student Paper Award at the Asilomar Conference on Signals, Systems and Computers (2013). Prior to joining CMU, he received his undergraduate degree from UC Berkeley in Electrical Engineering and Computer Science.