Lecture note “Brief Introduction to Machine Learning without Deep Learning”

This past Spring (2017), I taught the undergrad <Intro to Machine Learning> course. This was not only my first time teaching <Intro to Machine Learning> but also my first time teaching an undergrad course (!). The course was taught the year before by David Sontag, who has now moved to MIT. Obviously, I thought about re-using David’s materials as they were, which you can find at http://cs.nyu.edu/~dsontag/courses/ml16/. These materials are really great, and the coverage of various topics in ML is simply amazing. I highly recommend all the materials on this web page. All the things you need to know in order to become a certified ML scientist can be found there.

I, however, felt that this great coverage might not be appropriate for an undergrad intro course, and also that I wasn’t qualified to talk about many of those topics without first spending a substantial amount of time studying them myself. Then, what can/should I do? Yes, I decided to re-create the whole course with two questions in mind. First, what’s the minimal set of ML knowledge necessary for an undergrad to (1) grasp at least the high-level view of machine learning and (2) use ML in practice after they graduate? Second, what are the topics in ML that I could teach well without having to pretend to know them in depth? With these two questions in mind, as in the previous year for the NLP course, I started to write a lecture note as the semester went on. At the end of the day (or semester), I feel like I’ve taken a step in the right direction, though with much to be improved in the future.

I started with classification. The perceptron and logistic regression were introduced as examples showing the difference between traditional computer science (design an algorithm that solves a problem) and machine learning (design an algorithm that finds an algorithm for solving any given problem). I then moved on to defining the (linear) support vector machine as a way to introduce various loss functions and regularization. I gave up on teaching kernel SVMs due to time constraints, though. Logistic regression was then generalized to multi-class logistic regression with softmax.
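To make the "various loss functions and regularization" part a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the lecture note: the function names and the way the L2 penalty is scaled (`0.5 * lam * ||w||^2`) are my own assumptions for illustration. Each loss is written as a function of the margin `y * (w . x)` with labels in {-1, +1}.

```python
import math

def perceptron_loss(margin):
    # Zero whenever the example is classified correctly (margin >= 0).
    return max(0.0, -margin)

def hinge_loss(margin):
    # The (linear) SVM loss: also penalizes correct predictions with margin < 1.
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # log(1 + exp(-margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def regularized_objective(w, data, loss, lam):
    # Average loss over (x, y) pairs plus an L2 penalty on the weights.
    # `lam` and the 0.5 factor are illustrative choices, not the course's notation.
    total = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += loss(y * score)
    penalty = 0.5 * lam * sum(wi * wi for wi in w)
    return total / len(data) + penalty
```

Swapping `loss` between the three functions while keeping everything else fixed is one way to see that the perceptron, the linear SVM, and logistic regression differ mainly in how they score the margin.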

For teaching how to deal with problems which are not linearly separable, I decided to take an unorthodox approach. I started with a nearest-neighbour classifier, extended it into a radial basis function network with fixed basis vectors, and then into an adaptive basis function network, which I dubbed deep learning (which is true, by the way). At this point, I think I lost about half of the class, but the other half, judging by their performance in the final exam, was able to follow the logic. I should’ve talked about kernel methods here, but well, it’s not like I can spend the whole semester solely on classification.
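The step from a nearest-neighbour classifier to a radial basis function network can be sketched roughly as follows. This is only an illustration under my own assumptions: the basis vectors are simply the training points, the output weights are set to the labels rather than learned, and the Gaussian width `gamma` is an arbitrary choice. With a narrow enough basis function, the closest training point dominates the weighted sum, so the network behaves like the nearest-neighbour rule.

```python
import math

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_neighbour_predict(x, train_x, train_y):
    # Return the label of the closest training point.
    idx = min(range(len(train_x)), key=lambda i: sq_dist(x, train_x[i]))
    return train_y[idx]

def rbf_features(x, basis, gamma):
    # One Gaussian bump per (fixed) basis vector.
    return [math.exp(-gamma * sq_dist(x, b)) for b in basis]

def rbf_network_predict(x, basis, weights, gamma):
    # A linear classifier on top of the RBF features.
    phi = rbf_features(x, basis, gamma)
    score = sum(w * p for w, p in zip(weights, phi))
    return 1 if score >= 0 else -1

# XOR-like data, which no linear classifier on the raw inputs can separate.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
train_y = [-1, 1, 1, -1]

for query in [(0.1, 0.9), (0.9, 0.9)]:
    nn = nearest_neighbour_predict(query, train_x, train_y)
    rbf = rbf_network_predict(query, train_x, train_y, gamma=10.0)
    print(query, nn, rbf)
```

Making the basis vectors themselves trainable, instead of fixing them to the training points, is what turns this into the adaptive basis function network mentioned above.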

Then, I moved on to regression. Here I focused on introducing probabilistic ML. To do so, I had to spend 2 hours recapping probability itself. I introduced Bayesian linear regression and discussed how it corresponds to linear regression with a Gaussian prior on the weight vector. This naturally led to a discussion on how to do Bayesian supervised learning. I wanted to show them Gaussian process regression, but again, there wasn’t enough time.
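As a small worked example of the Gaussian-prior view, here is a sketch for the simplest possible case: a single scalar weight `w` with model `y = w * x + noise`. The setup is my own assumption for illustration (zero-mean Gaussian prior with precision `prior_precision`, Gaussian noise with precision `noise_precision`); the lecture note may parameterize things differently. The posterior over `w` is again Gaussian, and its mean coincides with the L2-regularized least-squares solution when the regularization strength is `prior_precision / noise_precision`.

```python
def posterior_over_weight(xs, ys, prior_precision, noise_precision):
    # Gaussian posterior over the scalar weight w: returns (mean, variance).
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    post_precision = prior_precision + noise_precision * sxx
    post_mean = noise_precision * sxy / post_precision
    return post_mean, 1.0 / post_precision

def ridge_weight(xs, ys, lam):
    # Minimizer of sum_i (y_i - w * x_i)^2 + lam * w^2.
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
mean, variance = posterior_over_weight(xs, ys, prior_precision=2.0, noise_precision=4.0)
print(mean, variance, ridge_weight(xs, ys, lam=2.0 / 4.0))
```

What the Bayesian treatment adds on top of the point estimate is the variance, which shrinks as more data comes in.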

For unsupervised learning, I again took an unorthodox route by putting (almost) everything under matrix factorization (X=WZ) with a reconstruction cost and varying constraints. PCA and NMF were discussed in depth under this framework, and sparse coding and ICA were briefly introduced. k-means clustering was also introduced as a variant of matrix factorization, and the hard EM algorithm was (informally) derived from minimizing a reconstruction error with the constraint that the code vectors (Z) are one-hot. This whole matrix factorization view was then extended to deep autoencoders and to (metric) multi-dimensional scaling. Surprisingly, students were much more engaged with unsupervised learning than with supervised learning, and at this point, I had regained the half of the class I had lost when I was teaching them nonlinear classifiers.
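The k-means case can be sketched as alternating minimization of the reconstruction error of X=WZ under the one-hot constraint on Z. The code below is a minimal plain-Python illustration with names of my own choosing: each one-hot code is stored as the index of its single non-zero entry, and the columns of W are stored as a list of centroid vectors.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_codes(X, W):
    # Z-step: with a one-hot code, the best reconstruction of x is its
    # nearest column of W, so the code is just that column's index.
    return [min(range(len(W)), key=lambda k: sq_dist(x, W[k])) for x in X]

def update_dictionary(X, codes, W):
    # W-step: with codes fixed, the error-minimizing column k is the mean
    # of the points assigned to it (empty columns are left unchanged).
    new_W = []
    for k, old in enumerate(W):
        members = [x for x, c in zip(X, codes) if c == k]
        if not members:
            new_W.append(list(old))
            continue
        dim = len(old)
        new_W.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return new_W

def reconstruction_error(X, W, codes):
    # Squared error between each x and its reconstruction W[code].
    return sum(sq_dist(x, W[c]) for x, c in zip(X, codes))

def kmeans(X, W, n_iters=10):
    for _ in range(n_iters):
        codes = assign_codes(X, W)
        W = update_dictionary(X, codes, W)
    return W, assign_codes(X, W)

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
W, codes = kmeans(X, [[0.0, 0.0], [10.0, 10.0]])
print(W, codes, reconstruction_error(X, W, codes))
```

Relaxing or changing the constraint on the codes (orthogonality, non-negativity, sparsity) while keeping the same reconstruction cost is what leads to the other methods listed above.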

The course ended with the final lecture in which I briefly introduced policy gradient. This was again done in a rather unorthodox way by viewing RL as a sequence of classifiers. I’m quite sure RL researchers would cry over my atrocity here, but well, I thought this was a more intuitive way of introducing RL to a bunch of undergrad students who have highly varying backgrounds. Though, now that I think about it, it may have been better simply to play them the RL intro lecture by Joelle Pineau: http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/.
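One common way to read "RL as a sequence of classifiers" is that a policy gradient update looks like a softmax classification update in which the "label" is the action that was actually taken and the example is weighted by the return that followed. The sketch below illustrates that reading only; it is my own assumption about what such a presentation might look like, not the lecture's derivation, and it drops the state entirely (a bandit-style policy with one logit per action) to stay short.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_classification_gradient(logits, action, weight):
    # Gradient of weight * log softmax(logits)[action] w.r.t. the logits.
    # With weight = 1 this is the usual multi-class logistic regression update.
    probs = softmax(logits)
    return [weight * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(probs)]

def policy_gradient_step(logits, episode, learning_rate):
    # `episode` is a list of (action_taken, return) pairs (hypothetical format).
    # Each pair contributes a classification-style update scaled by its return.
    new_logits = list(logits)
    for action, ret in episode:
        grad = weighted_classification_gradient(logits, action, ret)
        new_logits = [l + learning_rate * g for l, g in zip(new_logits, grad)]
    return new_logits
```

Actions followed by a high return get pushed up like correct labels, and actions followed by a negative return get pushed down.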

Anyways, you can find a draft of my lecture note (which will forever be a draft until I retire from the university) at

https://github.com/nyu-dl/Intro_to_ML_Lecture_Note/raw/master/lecture_note.pdf

Any suggestion or PR is welcome at

https://github.com/nyu-dl/Intro_to_ML_Lecture_Note

However, do not expect them to be incorporated quickly, as I’m only planning to revise it next Spring (2018).

During the course, I showed the students the following talks here and there to motivate them (and to give myself some time to breathe):

Hans Rosling. The Best Stats You’ve Ever Seen.
- https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
- This was for motivating the importance of visualization.
Fei-Fei Li. How we’re teaching computers to understand pictures.
- https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures
- This was (incidentally and intentionally) shown on International Women’s Day.
Interview with Geoff Hinton. The Code That Runs Our Lives.
- https://www.youtube.com/watch?v=XG-dwZMc7Ng
- Because I felt guilty not having talked about deep learning enough.
Larry Jackel. Machine Learning and Neural Nets at Bell Labs, Holmdel.
- http://techtalks.tv/talks/machine-learning-and-neural-nets-at-bell-labs-holmdel/63005/
- Great advances in ML happened near NYU!
