This graduate course takes students with a basic background in statistics and equips them to tackle massive data sets in health. The focus will be on advanced statistical tests in machine learning. Students will learn to assemble such tests by accessing and validating publicly available code in the R programming language and by writing their own code as needed. Building upon the course Data Science in Health I, students will also learn additional techniques pertaining to web scraping, working with unstructured data, data cleaning and data governance. The course will emphasize creative approaches to analyzing data and how to be critical of misleading analyses. Each class will involve both a lecture and a weekly tutorial assignment. The major project for the course will involve a large health data set that teams will compete to analyze.