One challenge in large-scale data science is that even linear-time algorithms can incur high data-processing costs and long latency, which limit the interactivity of the system and the productivity of data scientists. A long-standing goal is to enable data science with sublinear complexity, so that the cost grows slowly with, or independently of, the data size. In this talk, I will present approximation methods for several time-consuming tasks in visual analytics and machine learning. Our methods are based on data sampling and are very easy to implement. The results have probabilistic error bounds in theory, and these bounds hold in practice. Compared to the exact solutions, the approximate solutions achieve 10-1000x speedups within a small error bound on typical workloads, and the performance gain grows as the data size increases.
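As an illustrative sketch (not the talk's actual method), the sampling idea behind such probabilistic error bounds can be shown with a simple mean estimator: draw a uniform random sample, compute the statistic on the sample, and report a Hoeffding-style confidence interval. The function name and parameters here are hypothetical, and the bound assumes values normalized to [0, 1].

```python
import math
import random

def approx_mean(data, sample_size, delta=0.05):
    """Estimate the mean of `data` (values in [0, 1]) from a uniform sample.

    Returns (estimate, epsilon): with probability >= 1 - delta, the true
    mean lies within +/- epsilon of the estimate (Hoeffding-style bound;
    sampling without replacement only tightens it).
    """
    sample = random.sample(data, sample_size)
    estimate = sum(sample) / sample_size
    # Hoeffding: P(|estimate - mean| >= eps) <= 2 * exp(-2 * n * eps^2),
    # solved for eps at failure probability delta.
    epsilon = math.sqrt(math.log(2 / delta) / (2 * sample_size))
    return estimate, epsilon

# The cost depends on sample_size, not len(data) -- this is what makes
# the approach sublinear: the sample size needed for a fixed error bound
# does not grow with the data size.
data = [random.random() for _ in range(1_000_000)]
est, eps = approx_mean(data, sample_size=10_000)
```

Note that `epsilon` shrinks with the sample size but is independent of the full data size, which is why the speedup over an exact scan amplifies as the data grows.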