This page is for the new (2018) course on Machine Learning.
Class Notices:
The class outline is here as a PDF file, and here in HTML.
A brief list of the topics we will treat is as follows.
Method of Evaluation:
We decided that the most useful way of evaluating our abilities with machine learning would be to get hold of a suitable data set and train an algorithm for some appropriate task. It was also decided that, although individual students are very welcome to carry out a project alone, it is also quite in order for students to work in pairs, and submit one single project for the two participants.
Resources:
We will follow the textbook Deep Learning, by Goodfellow, Bengio, and Courville. I do not expect to have time to cover any more than the first two main sections of the book. The book is available online at this site. It is also available as a hardback physical book.
Here are other resources. I spoke of one or two of them during the first class. See the course outline for more information about them.
More resources
The resources and links given here are things I found or was told about recently, and are not in the course outline.
Data and Python files
Although it is not hard to download the MNIST data directly from within Python, it may be more convenient to get the data from the MNIST_data directory. The names of the files are mostly self-explanatory, but those names beginning t10k contain the test data.
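For anyone curious about what is inside these files: the original MNIST distribution uses the simple IDX binary format. The following is a minimal sketch of a reader, assuming unsigned-byte data as in MNIST. The function name parse_idx is made up for illustration, and the example bytes are synthetic, not real MNIST data.

```python
import struct
import numpy as np

def parse_idx(raw: bytes) -> np.ndarray:
    """Decode an IDX-format byte string holding unsigned bytes."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte), number of dimensions.
    zeros, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is stored as a 4-byte big-endian integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim).reshape(dims)

# Synthetic stand-in for an image file: two "images" of 2 x 3 pixels.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3)
images = parse_idx(header + bytes(range(12)))
print(images.shape)
```

With the real data, the argument would be the bytes read from one of the files in the MNIST_data directory (decompressed first, for instance with the gzip module, if the files are stored gzipped).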
This Python file contains code for loading the files in the MNIST_data directory, and then performing some of the tasks we spoke about during the second class, taken from Chapter 3 of Géron's "Hands-on" book. In order to run it successfully, you will need to install a number of Python libraries. If you are still using Python2 (not recommended), the tool for installing libraries is pip; if you have moved on to Python3, it is called pip3.
Then, from a terminal, issue the command
pip3 install --upgrade jupyter matplotlib numpy pandas scipy scikit-learn
If you like, you can add tensorflow to the list of modules to install. We will want it later.
It is advisable, although it is not necessary, to do all of the above in a virtual environment, in order not to mess up anything else you may be doing with Python on your computer. First install another package:
pip3 install --user --upgrade virtualenv
Then, create a directory (or folder, as some call it) where you wish to work, and, from within this directory, issue the command
virtualenv -p python3 venv
You can alter venv to any other name you like. Then, from the same place where you created the virtual environment, do
source venv/bin/activate
You are now in your virtual environment, and can proceed to install the various packages mentioned above. Once you are finished, you can just type deactivate, and you have left the virtual environment.
Log of material covered
On September 5, we covered Section 5.1 of the main textbook, plus the beginning of Section 5.2. We also talked a good deal about computing, especially Python, with a brief mention of R.
On September 12, we went over Section 5.2, but not subsections 5.2.1 and 5.2.2. Then we did Section 5.3. In Section 5.4, we looked at the algorithm for k-fold cross validation. Unless people ask for it, I'll leave out the rest of Section 5.4, because I think most of you know that material quite well. We also did Section 3.1 on "Why Probability?", then looked at the logistic sigmoid and the softplus function in Section 3.10, and made a start on Section 3.13 on Information Theory, entropy, and the Kullback-Leibler divergence.
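The k-fold cross-validation algorithm can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function names are hypothetical, fit is any function that returns a model from training data, and loss is any per-example loss.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def cross_val_error(xs, ys, k, fit, loss):
    """Average the per-example loss over all held-out examples."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        errors.extend(loss(model, xs[j], ys[j]) for j in test)
    return sum(errors) / len(errors)

# Toy use: the "model" is just the mean of the training targets.
xs = list(range(10))
ys = [2.0 * x for x in xs]
mean_model = lambda tx, ty: sum(ty) / len(ty)
squared_error = lambda m, x, y: (m - y) ** 2
print(cross_val_error(xs, ys, 5, mean_model, squared_error))
```

Every example is used for testing exactly once, which is what makes the estimate usable when the data set is small.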
In the computing part, we looked at some of Chapter 3 in the "Hands-on" book, and this led to some discussion of the MNIST data set, and an SGD (Stochastic Gradient Descent) classifier.
On September 19, we finished looking at Section 3.13, and saw the connection between entropy, particularly cross-entropy, and maximum likelihood estimation, treated in Section 5.5. Then came Section 5.6, on Bayesian statistics. In Section 5.7, on supervised learning algorithms, we looked at the logit model, and support vector machines, with mention of the "kernel trick".
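The link between cross-entropy and maximum likelihood can be checked numerically. The sketch below, for discrete distributions given as lists of probabilities, verifies that cross-entropy equals entropy plus Kullback-Leibler divergence, and that the Bernoulli parameter minimising the cross-entropy with the empirical distribution is the sample frequency, that is, the maximum likelihood estimate. The data (7 ones out of 10) are invented for the example.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Empirical distribution of 10 coin tosses with 7 ones: P(0) = 0.3, P(1) = 0.7.
p_hat = [0.3, 0.7]
q = [0.5, 0.5]
print(cross_entropy(p_hat, q), entropy(p_hat) + kl_divergence(p_hat, q))

# The average negative log-likelihood of a Bernoulli(theta) model is the
# cross-entropy between p_hat and [1 - theta, theta]; minimise it on a grid.
grid = [i / 100 for i in range(1, 100)]
best_theta = min(grid, key=lambda t: cross_entropy(p_hat, [1 - t, t]))
print(best_theta)
```

Since the entropy of the empirical distribution does not depend on the model, minimising cross-entropy and minimising the KL divergence pick out the same parameter.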
We had trouble connecting my computer to the display on the screen, and so made only slight progress with how to use the Python libraries. However, we did complete the material of interest to us in Chapter 3 of the "Hands-on" book.
On September 26, we began with two techniques of unsupervised learning, namely Principal Components Analysis (PCA) and k-means clustering, both in section 5.8 of the textbook. There was a brief mention of Newton's method (section 4.3.1), an alternative to gradient descent that is much used in econometrics but little used in machine learning. We then rapidly finished what remained of Chapter 5, noting that two essential ingredients of a learning algorithm are a cost function and a model.
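As a rough illustration of PCA, here is a minimal sketch based on the singular value decomposition of the centred data matrix. The function name pca and the toy data are assumptions made for the example; the points lie exactly on the line y = 2x, so a single component should capture all the variance.

```python
import numpy as np

def pca(X, n_components):
    """Return projected data, principal directions, and explained variances."""
    Xc = X - X.mean(axis=0)                       # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # rows are principal directions
    explained_var = s[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_var

t = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([t, 2.0 * t])
Z, components, explained_var = pca(X, 2)
print(components[0])      # proportional to (1, 2), up to sign
print(explained_var)      # second entry is numerically zero
```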
Then we started on Part II, on deep learning. After defining an artificial neural network (ANN), also called a multi-layer perceptron, we saw how a single-layer, linear, network cannot learn the XOR relation. Then, following section 6.1, we demonstrated that a network with one hidden layer and a nonlinear activation function (a rectified linear unit (ReLU)) can reproduce XOR. This network was not learned; just presented so as to show feasibility.
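The feasibility argument is easy to check by hand or by machine. The sketch below evaluates a network with one hidden layer of two ReLU units on the four XOR inputs. The weights are written down, not learned, and are intended to match the solution given in section 6.1 of the textbook; please check them against the book.

```python
def relu(z):
    return max(0, z)

W = [[1, 1],
     [1, 1]]    # input-to-hidden weights
c = [0, -1]     # hidden biases
w = [1, -2]     # hidden-to-output weights
b = 0           # output bias

def f(x1, x2):
    # Hidden layer: h_j = relu(W[0][j] * x1 + W[1][j] * x2 + c[j])
    h = [relu(W[0][j] * x1 + W[1][j] * x2 + c[j]) for j in range(2)]
    return w[0] * h[0] + w[1] * h[1] + b

outputs = [f(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)   # prints [0, 1, 1, 0]
```

Without the ReLU, the hidden layer would be a linear map and the whole network would collapse to a linear function of the inputs, which cannot represent XOR.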
In the "Hands-on" book, we started on Chapter 9, which deals with tensorflow. It operates in two phases, the construction phase, in which the model is set up, and the execution phase, in which the actual training of the model takes place.
On October 3, we covered a good deal of material in Chapter 6 of the textbook, on feedforward networks and back-propagation. Starting at section 6.2, we covered most of what follows through the end of section 6.5. The final section of this chapter is entitled "Historical Notes", and is left for people to read on their own. An important topic is computational graphs.
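To make the idea of back-propagation on a computational graph concrete, here is a toy reverse-mode differentiator for scalars. It is a sketch for illustration only, with hypothetical names (Node, add, mul, relu, backward), and is not how tensorflow is implemented.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent node, local derivative)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def relu(a):
    return Node(max(0.0, a.value), ((a, 1.0 if a.value > 0 else 0.0),))

def backward(out):
    """Accumulate d(out)/d(node) in node.grad for every node in the graph."""
    order, seen = [], set()
    def visit(n):                     # depth-first topological sort
        if id(n) not in seen:
            seen.add(id(n))
            for parent, _ in n.parents:
                visit(parent)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for n in reversed(order):         # from the output back to the inputs
        for parent, local in n.parents:
            parent.grad += local * n.grad

x, w, b = Node(2.0), Node(3.0), Node(-1.0)
y = relu(add(mul(w, x), b))           # 3 * 2 - 1 = 5
diff = add(y, Node(-1.0))             # y minus a target of 1
loss = mul(diff, diff)                # squared error, 16
backward(loss)
print(w.grad, b.grad, x.grad)         # prints 16.0 8.0 24.0
```

The forward pass builds the graph and stores the local derivatives; the backward pass applies the chain rule once per edge, visiting each node only after all of its consumers.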
Then we continued our investigation of tensorflow, looking at the many ways it can perform a linear regression. These illustrate a lot of what we were looking at in the main textbook.
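Two of those ways can be reproduced without tensorflow at all. The sketch below uses numpy only, on invented noise-free data, and compares the closed-form solution of the normal equations with batch gradient descent on the mean squared error. The learning rate and iteration count are arbitrary choices that happen to work for this data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Xb = np.column_stack([np.ones(len(X)), X])     # add a column of ones for the bias
true_theta = np.array([1.0, 2.0, -3.0])
y = Xb @ true_theta                            # noise-free targets

# 1. Normal equations: solve (X'X) theta = X'y.
theta_ne = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# 2. Batch gradient descent on the mean squared error.
theta_gd = np.zeros(3)
learning_rate = 0.1
for _ in range(1000):
    gradient = 2 / len(y) * Xb.T @ (Xb @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print(theta_ne)
print(theta_gd)
```

In a framework with automatic differentiation, the line that computes the gradient by hand is the part the library takes over.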
On October 10, we started on Chapter 7 of the textbook, on regularisation. We managed to cover only section 7.1, on parameter norm penalties.
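The effect of an L2 parameter norm penalty is easiest to see in linear regression, where the penalised problem still has a closed-form solution. The sketch below is an illustration on invented data: it minimises the sum of squared errors plus alpha times the squared norm of the weights, with no intercept so that nothing unpenalised needs special handling.

```python
import numpy as np

def ridge(X, y, alpha):
    """Minimise ||X w - y||^2 + alpha * ||w||^2 in closed form."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

alphas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge(X, y, a))) for a in alphas]
for a, n in zip(alphas, norms):
    print(a, n)
```

The norm of the solution shrinks steadily as alpha grows. In a gradient-based learner the same penalty shows up as weight decay: each step multiplies the weights by a factor slightly less than one before applying the usual gradient update.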
In the "Hands-on" book, we went through Chapter 10, on how to use tensorflow with multi-layer perceptrons, that is, for deep learning. This chapter also provides good revision material for what we studied in Chapter 6 of the main textbook.
On October 17, we completed all of Chapter 7, except for section 7.12 on dropout.
We embarked on Chapter 11 of the "Hands-on" book, and saw how to use various forms of regularisation in tensorflow with a deep network. In particular, we studied batch normalisation as a good way to avoid the vanishing-gradients and the exploding-gradients problems.
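The forward computation of batch normalisation at training time is short enough to write out. This numpy sketch is an illustration only: the function name is hypothetical, and it leaves out the running averages of mean and variance that a real implementation keeps for use at test time.

```python
import numpy as np

def batch_norm_train(X, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then rescale and shift."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mu) / np.sqrt(var + eps)
    return gamma * X_hat + beta

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=10.0, size=(64, 3))   # badly scaled activations
gamma = np.array([1.0, 2.0, 3.0])                   # learned scale
beta = np.array([0.0, -1.0, 1.0])                   # learned shift
out = batch_norm_train(X, gamma, beta)
print(out.mean(axis=0))
print(out.std(axis=0))
```

Whatever the scale of the incoming activations, the output of each feature has mean beta and standard deviation very close to gamma, which is what keeps the signal in a reasonable range from layer to layer.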
On October 24, after studying dropout as a sort of data-augmentation method, akin in some ways to bagging, we moved on to Chapter 8, on optimisation, and went all the way through section 8.5.
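Here is a minimal numpy sketch of dropout applied to one layer's activations. It uses the "inverted" variant, which rescales the surviving units during training so that nothing needs to change at test time; the textbook instead describes rescaling the weights at inference. The function name and arguments are assumptions for the example.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero each unit with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped[:10])
print(dropped.mean())
```

Each call draws a fresh mask, so every mini-batch is in effect trained on a different thinned sub-network. That is the sense in which dropout resembles bagging over a very large ensemble with shared parameters.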
It turns out that the "Hands-on" book covers a lot of the material we had looked at in the main textbook, sometimes a lot more clearly in the definition of the various algorithms, momentum, RMSprop, etc. We were able to finish Chapter 11, and then moved back to Chapter 6, on Decision Trees, where we saw a very simple example, used for classifying the Iris dataset.
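The update rules for momentum and RMSprop are compact enough to write in plain Python. The sketch below applies both to a badly conditioned quadratic bowl, with the exact gradient instead of a stochastic one. The hyperparameter values are arbitrary illustrative choices, and the test function is invented for the example.

```python
import math

def f(theta):
    return 0.5 * (theta[0] ** 2 + 25.0 * theta[1] ** 2)

def grad(theta):
    return [theta[0], 25.0 * theta[1]]

def momentum(theta, lr=0.02, alpha=0.9, steps=2000):
    v = [0.0, 0.0]
    for _ in range(steps):
        g = grad(theta)
        # The velocity is a decaying sum of past gradients.
        v = [alpha * vi - lr * gi for vi, gi in zip(v, g)]
        theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta

def rmsprop(theta, lr=0.01, rho=0.9, delta=1e-6, steps=2000):
    r = [0.0, 0.0]
    for _ in range(steps):
        g = grad(theta)
        # Running average of squared gradients, one per parameter.
        r = [rho * ri + (1 - rho) * gi * gi for ri, gi in zip(r, g)]
        theta = [ti - lr * gi / math.sqrt(delta + ri)
                 for ti, gi, ri in zip(theta, g, r)]
    return theta

start = [1.0, 1.0]
print(f(start), f(momentum(start)), f(rmsprop(start)))
```

Momentum smooths the direction of travel, while RMSprop divides each coordinate's step by a running estimate of the gradient's size, so the steep and shallow directions are treated more evenly. With a fixed learning rate, RMSprop ends up hovering near the minimum instead of converging exactly.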
On October 31 (Halloween!) we finished Chapter 8 on optimisation, and covered the material in Chapter 9 (convolutional networks) through section 9.5. In the "Hands-on" book, we finished Chapter 6 on Decision Trees, and then went on to Chapter 7, on ensemble methods, of which the most important is the Random Forest. We looked briefly again at bagging, another important ensemble method.
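A small decision tree for the Iris data can be fitted with scikit-learn in a few lines. The sketch below is in the spirit of the simple example mentioned above but is not copied from the book: the choice of the two petal measurements as features, the depth limit of 2, and the test flower are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]     # petal length and petal width, in cm
y = iris.target

tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X, y)

flower = [[5.0, 1.5]]    # a hypothetical flower: 5 cm long, 1.5 cm wide petals
print(iris.target_names[tree.predict(flower)[0]])
print(tree.predict_proba(flower))
```

A Random Forest replaces the single tree by many trees, each grown on a bootstrap sample of the data with a random subset of features considered at each split, and averages their predictions.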
On November 7, we looked only briefly at the end of Chapter 9 on convolutional networks. We then moved on to Chapter 10, on recurrent networks. We saw many analogies with time-series analysis. We spent some time demystifying the explicit example in section 10.2, in particular working out the gradient of a cross-entropy loss function with respect to the parameters of the model. We ended up in the middle of subsection 10.2.4.
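The key step in that gradient calculation is the derivative of the cross-entropy loss with respect to the inputs of the softmax, which is the vector of predicted probabilities minus the one-hot vector for the observed class. The sketch below checks this against central finite differences for a single time step; the numbers are invented for the example.

```python
import math

def softmax(o):
    m = max(o)                                  # subtract the max for stability
    e = [math.exp(v - m) for v in o]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(o, y):
    """Negative log-probability of class y under softmax(o)."""
    return -math.log(softmax(o)[y])

def analytic_grad(o, y):
    p = softmax(o)
    return [p[i] - (1.0 if i == y else 0.0) for i in range(len(o))]

def numeric_grad(o, y, h=1e-6):
    g = []
    for i in range(len(o)):
        up, down = o[:], o[:]
        up[i] += h
        down[i] -= h
        g.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * h))
    return g

o, y = [2.0, -1.0, 0.5], 2
print(analytic_grad(o, y))
print(numeric_grad(o, y))
```

From there, the gradients with respect to the weight matrices and biases of the recurrent network follow by the chain rule, summed over time steps.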
In the "Hands-on" book, we jumped to Chapter 13, on convolutional networks, and covered most of the material in that chapter. What we left out was the actual detailed descriptions of some of the networks that won prizes in various competitions over the past few years.
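The basic operation of a convolutional layer can be written as a pair of loops. This numpy sketch computes a "valid" two-dimensional cross-correlation, which is what deep learning libraries usually call convolution, for a single channel with stride 1 and no padding. The function name and the toy image are assumptions for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4 x 4 image that is dark on the left and bright on the right.
image = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
edge_kernel = np.array([[-1.0, 1.0]])      # responds to a left-to-right increase
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)
```

The same two kernel weights are reused at every position, which is the parameter sharing that makes convolutional layers so much cheaper than fully connected ones.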
On November 14, we finished Chapter 10, looking at gated recurrent networks, especially the Long Short-Term Memory (LSTM) cell, and the Gated Recurrent Unit (GRU).
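One time step of an LSTM cell can be sketched in numpy as follows. This is the common formulation with a tanh candidate, which may differ in detail from the equations in the textbook. The parameter layout (a dictionary of weight triples per gate) and all names are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One step; params maps each gate name to a triple (W, U, b)."""
    def gate(name, squash):
        W, U, b = params[name]
        return squash(W @ x + U @ h + b)
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much of the candidate to write
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", np.tanh)    # proposed new content
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
params = {name: (rng.normal(size=(n_hid, n_in)),
                 rng.normal(size=(n_hid, n_hid)),
                 np.zeros(n_hid))
          for name in ("forget", "input", "output", "candidate")}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # run over a sequence of 5 inputs
    h, c = lstm_step(x, h, c, params)
print(h)
```

When the forget gate is close to one and the input gate close to zero, the cell state is carried forward almost unchanged, which is how the LSTM preserves information over long spans.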
In the "Hands-on" book, we looked briefly at some of the convolutional networks that had won prizes, described at the end of Chapter 13. Then we embarked on Chapter 14, on recurrent neural nets (RNNs). This chapter contains a rather better graphical depiction of the gated networks than does the main textbook.
On November 21, we finished Chapter 11 on Practical Methodology and then, in Chapter 12, Applications, we looked at sections 12.1-12.3, on hardware considerations, computer vision, and speech recognition.
The final section in Chapter 14 of Géron's book deals with natural language processing, with particular emphasis on machine translation. We studied this, as a preliminary to section 12.4 in the main textbook, also on natural language processing.
November 28 was the last day for this course. We finished Chapter 12, looking at Natural Language Processing (NLP) and machine translation, recommender systems, bandits, and knowledge representation. Then, in the "Hands-on" book, we studied the chapter on autoencoders and as much of the following chapter, on Reinforcement Learning (RL), as there was time for.
It has been lots of fun for me to teach this class, and I have certainly learned a lot about machine learning. I hope that the same is true of you.
To send me email, click here or write directly to
Russell.Davidson@mcgill.ca.
URL: https://russell-davidson.arts.mcgill.ca/e706