Billions of people around the world use Facebook in over one hundred languages. This linguistic diversity is wonderful, but it presents challenges for Natural Language Processing (NLP) systems. It is simply impossible to annotate data and train a new system for each language. Instead, we rely on Cross-Lingual Understanding (XLU) to learn NLP systems in one language and apply them to languages that were not part of the original training data. In the last two years, self-training methods have enabled significant progress in XLU. I will give a brief overview of self-training methods such as BERT, XLNet, and RoBERTa. I will then talk about how we have been able to use self-training to advance the state-of-the-art in XLU, including the recent cross-lingual language model (XLM) and XLM-R. I will cover recent work showing that cross-lingual structure emerges naturally in self-trained models such as BERT. I will finish by discussing exciting ongoing work in my group.