Uncertainty sampling is perhaps the most common active learning
algorithm used in practice. In this talk, we describe two phenomena
concerning uncertainty sampling and the theoretical explanations we
developed for them. First,
given infinite samples from a fixed dataset, uncertainty sampling on a
convex loss can converge to a wide range of final error rates, sometimes
even lower than the error of random sampling. We explain this phenomenon
by showing that uncertainty sampling implicitly optimizes the nonconvex
zero-one loss. Second, uncertainty
sampling with logistic regression on different datasets yields a wide
range of data efficiencies (the factor by which fewer labeled samples
are needed to reach the same error as random sampling). We find both
empirically and
theoretically that this variation in data efficiency is inversely
related to the error rate of the final classifier.
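For concreteness, the uncertainty sampling procedure discussed above can be sketched as a simple pool-based loop: repeatedly fit logistic regression on the labeled points so far, then query the unlabeled point whose predicted probability is closest to 0.5. This is a minimal illustrative sketch, not the talk's implementation; the synthetic data, function names, and hyperparameters are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=500):
    """Fit logistic regression by plain gradient descent on the logistic loss.
    (Illustrative optimizer choice, not from the talk.)"""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def uncertainty_sampling(X_pool, y_pool, n_init=5, n_queries=20, seed=0):
    """Pool-based uncertainty sampling: start from a small random labeled set,
    then repeatedly label the most uncertain remaining point and refit."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    for _ in range(n_queries):
        w = fit_logistic(X_pool[labeled], y_pool[labeled])
        p = sigmoid(X_pool @ w)
        # Most uncertain = predicted probability closest to 0.5,
        # restricted to points not yet labeled.
        uncertainty = -np.abs(p - 0.5)
        uncertainty[labeled] = -np.inf
        labeled.append(int(np.argmax(uncertainty)))
    return fit_logistic(X_pool[labeled], y_pool[labeled]), labeled

# Toy, roughly linearly separable pool (hypothetical data for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w, labeled = uncertainty_sampling(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == y)
```

Because the queried points concentrate near the current decision boundary, the labeled set the loop produces is far from an i.i.d. sample, which is precisely why the asymptotic behavior and data efficiency described above can differ so much from random sampling.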