Uncertainty sampling is perhaps the most common active learning algorithm used in practice. In this talk, we describe two phenomena concerning uncertainty sampling, along with the theoretical explanations we developed for them. First, given infinite samples from a fixed dataset, uncertainty sampling on a convex loss can converge to a wide range of final error rates, sometimes even lower than the error of random sampling. We explain this phenomenon by showing that uncertainty sampling is implicitly optimizing the nonconvex zero-one loss. Second, uncertainty sampling with logistic regression yields a wide range of data efficiencies across datasets (the factor reduction in the number of samples needed to reach the same error as random sampling). We find, both empirically and theoretically, that this variation in data efficiency is inversely related to the error rate of the final classifier.
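
For readers unfamiliar with the procedure being analyzed, the following is a minimal sketch of margin-based uncertainty sampling with logistic regression: repeatedly fit the model on the labeled set and query the label of the unlabeled point the model is least certain about. The scikit-learn usage, the seed-set construction, and the function name are illustrative assumptions, not the experimental setup from the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_pool, y_pool, n_seed=10, n_queries=100, seed=0):
    """Illustrative margin-based uncertainty sampling for binary classification.

    y_pool acts as an oracle: a label is "revealed" only when its point is queried.
    Assumes the random seed set happens to contain both classes.
    """
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    unlabeled = [i for i in range(len(X_pool)) if i not in set(labeled)]

    clf = LogisticRegression()
    for _ in range(n_queries):
        clf.fit(X_pool[labeled], y_pool[labeled])
        # Query the unlabeled point whose predicted probability is closest to 0.5,
        # i.e. the point closest to the current decision boundary.
        probs = clf.predict_proba(X_pool[unlabeled])[:, 1]
        idx = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
        labeled.append(idx)
        unlabeled.remove(idx)
    return clf, labeled
```

Data efficiency in this sketch would be measured by comparing how many queried labels this loop needs to reach a target test error versus how many randomly drawn labels a passively trained classifier needs to reach the same error.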