Topic outline

  • Announcements

    • Full lecture notes plus the lab material for the module are given here. The material is organised by week. We may make *minor* amendments to the material after each week.

    • This file contains all the handwritten notes of the lectures and it will be updated every week. The current version contains all the material up to Week 12, so it is complete.

    • This R-file contains all the coding demonstrations that are performed during the lectures and it will be updated every week.  The current version contains all the material seen in the Module lectures, i.e. it is complete.

  • Module Description

    Can machines do what we can do? Machine Learning is a rapidly growing subfield of Artificial Intelligence, at the boundary between Statistics and Computer Science, focusing on teaching machines how to learn by developing their own algorithms without the need for human supervision. The module has theoretical and practical aspects. By the end of it you will have a good understanding of the theoretical basis for machine learning and have gained some hands-on experience in the lab. You will have worked with algorithms such as decision tree learning and classification methods, and statistical methods for analysing big datasets.

  • Syllabus

    0 Preliminary topics for Machine Learning

    1 Unsupervised learning: Principal Components Analysis

    2 Unsupervised learning: Cluster analysis

    3 Supervised learning: Classification

    4 Supervised learning: Regression and regularization

  • Module aims and learning outcomes

      • ACADEMIC CONTENT

        This module covers:

        •  Principal component analysis
        •  Clustering data
        •  Classification methods
        •  Penalised likelihood

        DISCIPLINARY SKILLS

        At the end of this module, students should be able to:

        • Describe a general outline of machine learning, distinguish between supervised and unsupervised learning, and identify examples of each type.
        • Understand and explain the Karhunen-Loeve and singular value decompositions of matrices, describe their link and be able to explain small examples step by step.
        • For small data sets, derive by hand the results of agglomerative clustering using different distances and linkages.
        • Understand and explain different methodologies available for classification and regularization.
        • For each of the four topics covered in the Module, carry out analysis of datasets using R and interpret numerical output.

  • Teaching team

  • Assessment information

    • This Module will be evaluated with a final online exam worth 70%. This exam is to take place during the exam period. The exam will take the form of a series of activities and questionnaires, commonly known as a "QMplus quiz".

      In addition, there will be two courseworks, each worth 15%, making up the remaining 30%. These courseworks will be in Weeks 7 and 12, and they will also take the form of an online "QMplus quiz".

      Note that if at the end of term you have Extenuating Circumstances (EC) for both courseworks, you will have to submit an extra coursework.

  • Final Assessment - 15th May 2024

    • This handwritten assessment is available for a period of 3 hours and 30 minutes, within which you must submit your solutions. You may log out and in again during that time, but the countdown timer will not stop. If your attempt is still in progress at the end of your 3 and a half hours, any file you have uploaded will be automatically submitted.

      The assessment is intended to be completed within 3 hours. Please note that the additional 30 minutes are for scanning and submitting your answers. Please ensure that you complete the assessment within 3 hours to avoid any technical issues that may occur if you submit close to the deadline.

      In completing this assessment:

      • You may use books and notes.

      • You may use calculators and computers, but you must show your working for any calculations you do.

      • You may use the Internet as a resource, but not to ask for the solution to an exam question or to copy any solution you find.

      • You must not seek or obtain help from anyone else.

  • Week 1 : Overview and Revision

    • Welcome to the first week! 

      This week we will provide an overview of the content of the module (timetable, syllabus, assessment, IT classes). We will also do a short revision on basic elements of linear algebra, mainly matrix eigenvalues and eigenvectors, as well as some essential elements from calculus. 
      Prior to this week's lectures, make sure you read the booklet material of the first week: eigenvalues and eigenvectors of a matrix, Karhunen-Loeve decomposition and singular value decomposition of a matrix.

      Please also look ahead to the instructions outlined in Week 2. The labs (only for that week!) will take place on Monday 29.01.24, so you should familiarise yourself with the R programming language (and RStudio) beforehand.

  • Week 2 : First IT labs, wrapping up revision and starting with PCA

    • Welcome to Week 2!

      Remember, you can find Kostas on Tuesday 12:00-13:00 at the Learning Cafe (School Social Hub, Room MB-B11). Come along to ask anything regarding the module (and for tea, coffee and pastries).

      In other news,  IT labs will start this week! 

      IT labs and analyses are an essential part of the Module as you cannot really talk about Machine Learning if you do not talk about data, code and analyses. 

      The labs are run in R. You can download R from https://cran.r-project.org/ 

      A popular development environment for R with interactive menus is RStudio, which can be downloaded from https://rstudio.com/products/rstudio/download/ 

      If you want to run R online, you can use the following link https://rdrr.io/snippets/

      The intention during the lab is to run R code at a gentle pace and comment on it. We have uploaded the problems (see main booklet), where we also include a more extensive, commented version of the code.

      In each week, it is your responsibility to run the code, understand what is going on, and be able to replicate and explain the analyses.


  • Week 3 : PCA and its interpretation

    • Welcome to week 3!

      This week, we will be analysing in more depth the PCA method and we will explain how to interpret its output. Don't miss the Friday labs, as there we will see how PCA is used to study a real-world dataset.

  • Week 4 : PCA and its interpretation

    • Welcome to week 4! 

      This week we will be wrapping up PCA and briefly start with the next topic of the module: Clustering. Again make sure that you don't miss the Friday labs where we will see a more extensive application of PCA to a real-world dataset.

  • Week 5 : Agglomerative clustering

    Welcome to week 5!

    This week we will be focusing on Agglomerative Clustering, one of the most basic and useful methods to cluster data based on their degree of similarity.

    In addition, our school is running an NSS promotion in our Thursday lectures. They will show up with tasty donuts! Do not miss your chance to have your say about your education.

  • Week 6 : K-means clustering

    Welcome to week 6!

    This week we will examine the clustering method of k-means in more detail. We will also give some general guidelines for the first mid-term assessment.

  • Week 7: Mid-term Quiz (15% of final mark)

    • The mid-term online quiz will take place on Friday of Week 7 at the same time as the labs (i.e. no labs in Week 7!):
      Friday March 8, 2024, 16:00-18:00
    • We will soon make an announcement regarding where you will find the quiz on the module's QMplus page.
    • We will have a question session on Wednesday March 6, 15:00-17:00 via this Teams channel (password: akzwxmy).
    • Information about the date, the format, the available help and the essential tasks you are expected to do for the first mid-term assessment.

    • Here are some sample questions that will help you during your revision for the first mid-term quiz. 

      Update March 5: We have now added an additional clustering question to the sample questions.

      Update March 6: The file now contains the solutions.

    • This quiz assessment is available for a period of 2 hours. Upon accessing the assessment, you will have until the end of the quiz (Friday 08/03/24 at 18:00, local London time) in which to complete and submit it. You may log out and in again during that time, but the countdown timer will not stop. If your attempt is still in progress at the end of the 2-hour period, any answers you have filled in will be automatically submitted.

      In completing this assessment:
      • You may use books and notes.
      • You may use calculators and computers, but you must show your working for any calculations you do.
      • You may use the Internet as a resource, but not to ask for the solution to an exam question or to copy any solution you find.
      • You must not seek or obtain help from anyone else.

  • Week 8

    This week we look at classification, putting emphasis on performance measures to compare classifiers. In particular, we look at the confusion matrix, the ROC graph and the ROC curve.
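
    As a minimal sketch of the first of these measures, the R code below builds a confusion matrix and reads off the true and false positive rates, which together give one point on the ROC graph. The labels and predictions are made up for illustration and are not module data; the names truth, pred and cm are likewise hypothetical.

```r
# Made-up true labels and classifier predictions (illustration only).
truth <- factor(c("pos", "pos", "neg", "neg", "pos", "neg"), levels = c("neg", "pos"))
pred  <- factor(c("pos", "neg", "neg", "pos", "pos", "neg"), levels = c("neg", "pos"))

# Confusion matrix: rows are predictions, columns are the true classes.
cm <- table(Predicted = pred, Actual = truth)
print(cm)

# True positive rate and false positive rate: one point (fpr, tpr) on the ROC graph.
tpr <- cm["pos", "pos"] / sum(cm[, "pos"])
fpr <- cm["pos", "neg"] / sum(cm[, "neg"])
cat("TPR:", tpr, " FPR:", fpr, "\n")
```

    Repeating this calculation for a range of classification thresholds would trace out an ROC curve rather than a single point.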

  • Week 9

    In lectures this week we survey classifiers, covering the linear model, logistic regression and the k-nearest neighbor classifier, and we start looking at the classification tree.

    In the lab, we will analyze the glass data set. Make sure you download the data set (link below) and make one specific change to the code given in the booklet. Use the line

    DAT<-data.frame(X,factor(Y));    colnames(DAT)[11]<-"Y"

    instead of the line DAT<-data.frame(X,Y) in the booklet. The reason for this is that the tree classifier requires the response to be defined as a factor.
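
    The effect of this change can be checked with a small sketch. The data below are a hypothetical stand-in (random numbers with 10 predictor columns and an integer class label), not the glass data set, and DAT_num is a name introduced only for the comparison.

```r
# Hypothetical stand-in data: 10 predictor columns and an integer class label.
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
Y <- rep(1:2, each = 10)

# Booklet version: the response stays an integer vector.
DAT_num <- data.frame(X, Y)

# Amended version: the response is stored as a factor, then column 11 is renamed.
DAT <- data.frame(X, factor(Y)); colnames(DAT)[11] <- "Y"

print(is.factor(DAT_num$Y))  # is the booklet response a factor?
print(is.factor(DAT$Y))      # is the amended response a factor?
print(levels(DAT$Y))         # class labels seen by a classifier
```

    With a factor response, a tree-fitting function treats the problem as classification rather than as regression on the numeric labels.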

  • Week 10

    Our survey of classifiers comes to a close with the second part of trees and the linear discriminant classifier; all of these classifiers were already seen in the Week 9 lab. We then start the last topic of this Module: penalized regression.

    Note that there will be no labs on Friday of Week 10 because of the Easter bank holiday.

    • As promised, the shiny app. Just run the app in RStudio and play with it to see how the ridge estimates evolve as a function of lambda. You need not edit the code, but you may do so if you wish. The code is not particularly complex: it is just the functions for the ridge put inside a wrapper as required by shiny.
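
      For readers without shiny at hand, the same idea can be sketched in a few lines of base R. This is not the code of the app: it uses simulated data, the standard closed-form ridge estimate (X'X + lambda I)^{-1} X'y, and hypothetical names (ridge, lambdas, B, norms).

```r
# Simulated data (illustration only): centred, scaled predictors and a centred response.
set.seed(1)
n <- 50; p <- 3
X <- scale(matrix(rnorm(n * p), n, p))
y <- drop(X %*% c(2, -1, 0.5) + rnorm(n))
y <- y - mean(y)

# Closed-form ridge estimate for a given penalty lambda.
ridge <- function(lambda) {
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

# Coefficients for an increasing sequence of penalties (one column per lambda).
lambdas <- c(0, 1, 10, 100)
B <- sapply(lambdas, ridge)
colnames(B) <- lambdas
print(round(B, 3))

# The Euclidean norm of the coefficient vector shrinks as lambda grows.
norms <- sqrt(colSums(B^2))
print(round(norms, 3))
```

      At lambda = 0 the estimate coincides with ordinary least squares; larger values of lambda pull the coefficients towards zero.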

  • Week 11

    We continue ridge regression and then turn our attention to the lasso and, finally, elastic nets.

    Note that the lab for this week contains analyses and comparisons for all three models: ridge, lasso and elastic net. This lab is already available in the booklet.

  • Week 12: Revision and end-term Quiz (15% of final mark)

    • The second mid-term online quiz will take place on Friday of Week 12 at the same time as the labs (i.e. no labs in Week 12!):
      Friday April 12, 2024, 16:00-18:00
    • We will soon make an announcement regarding where you will find the quiz on the module's QMplus page.

    Concerning Module material, this week we continue our study of regularized regression. We finalize lasso and then survey elastic nets for both regression and for likelihood-based models. This last week brings this Module to a close.

    • This quiz assessment is available for a period of 2 hours. Upon accessing the assessment, you will have until the end of the quiz (Friday 12/04/24 at 18:00, local London time) in which to complete and submit it. You may log out and in again during that time, but the countdown timer will not stop. If your attempt is still in progress at the end of the 2-hour period, any answers you have filled in will be automatically submitted.

      In completing this assessment:
      • You may use books and notes.
      • You may use calculators and computers, but you must show your working for any calculations you do.
      • You may use the Internet as a resource, but not to ask for the solution to an exam question or to copy any solution you find.
      • You must not seek or obtain help from anyone else.

    • You asked for it ... here is a sample for the midterm test. We'll make solutions available *no earlier* than Wednesday morning/noon.

      Update (Wednesday 10 afternoon): the file has solutions as well.

    • Small data set used for the elastic net example in the notes.

    • Here is the material used for the revision session in the last lecture. Note, as I said in the revision lecture, that this is a compilation of the weekly summary boxes of the booklet.

    • The results of the test are now available; here is a report of the results.

  • Hints and tips


  • Where to get help


  • General course materials


  • Coursework

    • The first mid-term quiz can be found under the "Module Content"->"Week 7" tab.

  • Exam papers

    • Our recommendation is that past exams should be attempted a) only after exhaustive revision and b) under true exam conditions (no talking with others, attempting all the questions, and within the allocated time).
    • This file has the blank exam and solutions for both the 2020 exam and the 2020 sample exam.

    • Some other exams.

  • Q-Review

  • Online Reading List