The contextual bandit is a sequential decision-making problem in which a learner repeatedly selects an action (e.g., a news article to display) in response to a context (e.g., a user’s profile) and receives a reward, but only for the action it selected. Beyond the classic explore-exploit tradeoff, a fundamental challenge in contextual bandits is to develop algorithms that can leverage flexible function approximation to model similarity between contexts, yet have computational requirements comparable to those of classical supervised learning tasks such as classification and regression. To this end, we provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any algorithm for online regression with a given function class into an algorithm for contextual bandits, with no overhead in runtime or memory. Conceptually, our results show that estimation and decision making can be provably decoupled into separate algorithmic building blocks, and that this separation is effective both in theory and in practice. Time permitting, I will discuss extensions of these techniques to more challenging reinforcement learning problems.
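
To make the shape of such a reduction concrete, here is a minimal, self-contained sketch in Python. It is an illustration under assumptions, not the construction from the talk: it pairs a simple per-action online ridge-regression oracle with an inverse-gap-weighting exploration rule, and the names OnlineRidgeOracle, inverse_gap_weighting, reward_fn, gamma, and run_contextual_bandit are hypothetical placeholders. The intent is only to show the interface: the regression algorithm supplies reward predictions, a fixed rule turns predictions into an action distribution, and the only per-round work beyond regression is sampling an action.

```python
import numpy as np

class OnlineRidgeOracle:
    """Illustrative online regression oracle: per-action ridge regression on contexts.

    Stands in for any online regression algorithm over a given function class;
    this particular choice is an assumption made for the sketch.
    """
    def __init__(self, num_actions, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(num_actions)]  # Gram matrices
        self.b = [np.zeros(dim) for _ in range(num_actions)]      # reward-weighted context sums

    def predict(self, x):
        # Predicted reward for each action given context x.
        return np.array([np.linalg.solve(self.A[a], self.b[a]) @ x
                         for a in range(len(self.A))])

    def update(self, x, action, reward):
        # Online least-squares update for the chosen action only (bandit feedback).
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

def inverse_gap_weighting(predictions, gamma):
    """Map reward predictions to an action distribution.

    Suboptimal actions get probability inversely proportional to their predicted
    gap from the best action; the greedy action receives the remaining mass.
    """
    K = len(predictions)
    best = int(np.argmax(predictions))
    probs = np.zeros(K)
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (predictions[best] - predictions[a]))
    probs[best] = 1.0 - probs.sum()
    return probs

def run_contextual_bandit(contexts, reward_fn, num_actions, gamma=100.0, seed=0):
    """Reduction loop: the oracle estimates rewards, the weighting rule decides."""
    rng = np.random.default_rng(seed)
    oracle = OnlineRidgeOracle(num_actions, dim=contexts.shape[1])
    total_reward = 0.0
    for x in contexts:
        probs = inverse_gap_weighting(oracle.predict(x), gamma)
        action = rng.choice(num_actions, p=probs)
        reward = reward_fn(x, action)       # bandit feedback: only the chosen action's reward
        oracle.update(x, action, reward)    # regression update; no extra state is kept
        total_reward += reward
    return total_reward
```

In this sketch, each round performs exactly one oracle prediction and one oracle update on the observed (context, action, reward) triple, which is the sense in which a reduction of this kind can add no runtime or memory overhead beyond the underlying regression method.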