The driving force of the fourth industrial revolution is the processing and analysis of big data to extract knowledge, patterns, and information. Chemical, biologics/pharma, oil/gas, financial, and manufacturing organizations are well positioned to benefit from this data revolution, as they collect and store massive amounts of heterogeneous data. Big data is characterized by the 5 Vs (volume, velocity, variety, veracity, and value), and distributed computing architectures are used to process it. The first part of this course covers Apache Spark, a distributed engine for big data processing. The second part discusses special topics in analytics, such as visualization, data quality, interpretable/fair ML, and MLOps. Prerequisites: an introductory course in data science or machine learning (e.g., CHE1147H or a similar course) and familiarity with Python.