Data Mining and Machine Learning

Università di Perugia – Master in Data Science

The course introduces the basic elements of supervised and unsupervised machine learning algorithms. It is divided into a set of short theoretical modules and six practical lab sessions in Python.

The main textbook is The Elements of Statistical Learning, which can be downloaded for free from the authors’ webpage or purchased through Amazon. At the end of each lecture, you can find the corresponding chapters in the book, together with a short selection of additional reading material. For the deep learning part, taught by Prof. Elisa Ricci, students are referred to the recently published Deep Learning book.

Lectures

(Click on the lecture title to download the slides)

Lecture Topics Date
Lecture 1 Introduction to machine learning 10/03/2017
Lecture 2 Elements of supervised learning [Chapter 2 from the book] 10/03/2017
Lecture 3 Unconstrained optimization [Check the last slide for introductory materials on optimization] 11/03/2017
Lecture 4 Linear models for regression and classification [Selection from Chapters 3 and 4 of the book] 05/05/2017
Lecture 5 Regularization and loss functions 06/05/2017
Lecture 6 Data preprocessing and model selection 05/05/2017
Lecture 7 Neural networks [Prof. Elisa Ricci, TBA] 26-27/05/2017
Lecture 8 Kernel methods 31/05/2017
Lecture 9 Unsupervised learning [Selection from Chapter 14 of the book] 01/06/2017
Lecture 10 Ensemble learning 01/06/2017

Useful reading links:

  1. For an informal introduction to Information Theory and the cross-entropy loss, you can check the following blog post: https://colah.github.io/posts/2015-09-Visual-Information/.
  2. A nice blog post on first-order optimization algorithms: http://sebastianruder.com/optimizing-gradient-descent/.
  3. A series of high-quality expository articles, including one on momentum and one on t-SNE: http://distill.pub/.
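As a small companion to the first link, the cross-entropy loss can be sketched in a few lines of plain Python. This is only an illustrative sketch, not material from the lectures: the function name `cross_entropy` and the example distributions are made up here, and the predicted probabilities are assumed to be strictly positive wherever the target is non-zero.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i), measured in nats.

    Terms with p_i == 0 are skipped, since they contribute nothing.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


# One-hot target (second of three classes) against two hypothetical predictions
target = [0.0, 1.0, 0.0]
confident = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]

print(cross_entropy(target, confident))  # equals -log(0.9)
print(cross_entropy(target, unsure))     # equals -log(0.4), a larger loss
```

With a one-hot target the sum reduces to the negative log-probability assigned to the correct class, which is why the less confident prediction is penalized more.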

Lab session requirements

All lab sessions require a working installation of Python 3.5. If you are starting from scratch, the recommended route is to install the Anaconda platform, which provides a working installation with all prerequisites. Alternatively, Canopy offers an academic license if you register with your academic email address. If you prefer a minimal installation, first install Python and the pip module to manage libraries (e.g., following this guide on Ubuntu systems), then install all prerequisites by running:

Further prerequisites are listed for each session. Each session is organized into one or more interactive Jupyter notebooks. To start Jupyter, run the following command from the Anaconda prompt:

Remember to run Jupyter from a folder that gives you access to the uncompressed files of the lab sessions.
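Before opening the notebooks, it can help to check that the required libraries are visible to your Python interpreter. The snippet below is a minimal sketch of such a check. The helper `check_modules` is hypothetical, and the import names (`numpy`, `sklearn`, `autograd`, `notebook`) are assumptions based on the libraries named in the lab sessions, not an official list of prerequisites.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for libraries mentioned in the lab sessions
    lab_modules = ["numpy", "sklearn", "autograd", "notebook"]
    for name, found in check_modules(lab_modules).items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

The check only looks up each module without importing it, so it runs quickly and does not fail on a missing library. It uses `str.format` rather than f-strings to remain compatible with Python 3.5.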

Lab session notebooks

Lab Topics Date
Lab session 1 Download files
Contents: Python, NumPy, and two examples of supervised learning with least squares and k-NN.
11/03/2017
Lab session 2 Download files
Contents: scikit-learn, the most common machine learning library in Python.
05-06/05/2017
Lab session 3 Download files
Contents: automatic differentiation with Autograd.
31/05/2017
Lab session 4 Download files
Contents: self-coded K-means and clustering in scikit-learn.
01/06/2017
Lab session 5 Download files
Contents: Spark and MLlib.
01/06/2017
Lab session 6 Download files
Contents: cognitive services and IBM Bluemix.
01/06/2017
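To give a flavour of the first lab session, here is a minimal sketch of a k-NN classifier in plain Python. It is not taken from the lab notebooks: the function `knn_predict` and the toy data are invented for illustration, and the lab itself works with NumPy rather than plain lists.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query))), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.15, 0.15)))  # prints "a"
print(knn_predict(points, labels, (0.95, 1.0)))   # prints "b"
```

An odd `k` avoids ties with two classes. With more classes or an even `k`, a tie-breaking rule would be needed.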