Machine learning applications are rapidly expanding into every corner of our lives, affecting almost every electronic device along the way. The future will be filled with a large number of intelligent devices that are far less powerful than traditional computers. Deep neural networks have become the state-of-the-art technique for machine learning tasks. However, these models contain a large number of parameters and are computationally intensive, which makes them difficult to deploy on edge devices with limited hardware and a tight power budget. To address this problem, many researchers have proposed techniques to improve the efficiency of existing algorithms. In this talk, I will introduce and review several major techniques developed in the past few years, including model compression, inference speedup, and training speedup.