In healthcare, a tsunami of medical data has emerged. These data are often heterogeneous and noisy, which makes clinical decision-making time-consuming, error-prone, and suboptimal. In this work, we develop machine learning (ML) models and systems that distill high-value patterns from unstructured clinical data and make informed, real-time medical predictions and recommendations. In developing these models, we face several challenges: (1) How can the models better capture infrequent clinical patterns? (2) How can they generalize well to unseen patients? (3) How can the interpretability of their decisions be improved? (4) How can decisions be made more promptly without sacrificing their quality? (5) How can massive numbers of clinical patterns be discovered efficiently from large-scale data? To address challenges (1)-(4), we systematically study diversity-promoting learning (DPL), which encourages the components of an ML model (1) to spread out diversely so that infrequent patterns receive broader coverage, (2) to be subject to structured constraints that improve generalization performance, (3) to be less redundant so that they are easier to interpret, and (4) to be mutually complementary so that information is represented more compactly, which supports faster decision-making. To address challenge (5), we study large-scale learning: specifically, we design efficient distributed ML systems following a system-algorithm co-design approach. We apply the proposed DPL techniques and distributed ML systems to several critical problems in healthcare. Evaluations on various clinical datasets demonstrate the effectiveness of the DPL methods and the efficiency of the systems.
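The abstract does not say which regularizer DPL uses, so the following is only a minimal sketch of the general idea under stated assumptions. It treats each model component (for example, a row of a weight matrix) as a vector and penalizes the mean squared pairwise cosine similarity between components. The penalty is 0 when the components are mutually orthogonal, meaning they are maximally spread out, and 1 when they all lie along the same line, meaning they are fully redundant. The names `diversity_penalty` and `regularized_loss` and the weight `lam` are hypothetical illustrations. They are not the thesis's own formulation.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity_penalty(components: List[Sequence[float]]) -> float:
    """Hypothetical diversity regularizer: mean squared pairwise cosine similarity.

    Returns 0.0 when all components are mutually orthogonal and 1.0 when every
    pair is collinear (squaring treats opposite directions as redundant too).
    """
    pairs = list(combinations(components, 2))
    if not pairs:
        return 0.0  # a single component has nothing to be redundant with
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


def regularized_loss(data_loss: float, components: List[Sequence[float]],
                     lam: float = 0.1) -> float:
    """Hypothetical objective: task loss plus a weighted diversity penalty."""
    return data_loss + lam * diversity_penalty(components)


if __name__ == "__main__":
    orthogonal = [[1.0, 0.0], [0.0, 1.0]]
    redundant = [[1.0, 0.0], [2.0, 0.0]]
    print(diversity_penalty(orthogonal))  # 0.0
    print(diversity_penalty(redundant))   # 1.0
```

Minimizing `regularized_loss` trades task accuracy against redundancy among components. A larger `lam` pushes the components to spread out more, which is the qualitative behavior the abstract attributes to DPL.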