This course offers an advanced treatment of parallel, distributed, and cloud computing and their applications. It covers sources of parallelism and locality in scientific applications, common parallel algorithms used in large-scale simulations, and fundamental performance bottlenecks in scientific codes. Students will learn to work across the stack of parallel algorithm design, mathematical reformulation, and architecture-specific performance tuning to write fast, scalable code. The course is intended for students actively working with applications that benefit from parallel computing.