This course introduces core statistical learning ideas through a computational framework based on (stochastic) gradient descent. Students will build on core probability and convergence concepts to learn how common modeling choices translate into practical estimation and prediction procedures. A range of supervised models (linear, generalized linear, and their flexible extensions) will be fitted by (stochastic) gradient descent under different loss functions (squared, logistic, quantile) and regularization.
The course also introduces basic decision theory to connect predicted probabilities to actions, such as choosing optimal classification thresholds. Model evaluation and model selection methods will be emphasized, as will careful comparison of models using appropriate performance metrics and diagnostic tools.
0.50
Introduction to mathematical probability theory (STA257 or equivalent), including: probability spaces, common probability distributions, discrete and continuous random variables, distribution and density functions, joint distributions, expected values, and generating functions. Advanced calculus (MAT237 or equivalent) and linear algebra (MAT223, MAT224, or equivalent).