Data Mining and Machine Learning

Università di Perugia – Master in Data Science

The course introduces the basic elements of supervised and unsupervised machine learning. It is divided into a set of short theoretical modules and six practical lab sessions in Python.

The main textbook is The Elements of Statistical Learning, which can be downloaded for free from the authors’ webpage or purchased through Amazon. The corresponding book chapters are listed with each lecture, together with a short selection of additional reading material. For the deep learning part, taught by Prof. Elisa Ricci, students are referred to the recently published Deep Learning book.


(Click on the lecture title to download the slides)

| Lecture | Topic | Date |
|---|---|---|
| Lecture 1 | Introduction to machine learning | 10/03/2017 |
| Lecture 2 | Elements of supervised learning [Chapter 2 from the book] | |
| Lecture 3 | Unconstrained optimization [Check the last slide for introductory materials on optimization] | |
| Lecture 4 | Linear models for regression and classification [Selection from Chapters 3 and 4 of the book] | |
| Lecture 5 | Regularization and loss functions | 06/05/2017 |
| Lecture 6 | Data preprocessing and model selection | 05/05/2017 |
| Lecture 7 | Neural networks [Prof. Elisa Ricci, TBA] | 26-27/05/2017 |
| Lecture 8 | Kernel methods | 31/05/2017 |
| Lecture 9 | Unsupervised learning [Selection from Chapter 14 of the book] | |
| Lecture 10 | Ensemble learning | 01/06/2017 |

Useful reading links:

  1. For an informal introduction to Information Theory and the cross-entropy loss, you can check the following blog post:
  2. A nice blog post on first-order optimization algorithms:
  3. A series of high-quality expository articles, including one on momentum and one on t-SNE:

Lab session requirements

All lab sessions require a working installation of Python 3.5. If you are starting from scratch, installing the Anaconda platform is the recommended way to get a working installation with all prerequisites. Alternatively, Canopy provides a license if you register with an academic email address. If you prefer a minimal installation, first install Python and pip, the package manager for Python libraries (e.g., following this guide on Ubuntu systems), then install all prerequisites by running:

Further prerequisites are listed for each session. Each session is organized as one or more interactive Jupyter notebooks. To start Jupyter, run the following command from the Anaconda prompt:

Remember to start Jupyter from a folder that gives you access to the uncompressed lab session files.
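Before opening the notebooks, it can help to confirm that the interpreter and the main libraries are in place. The following is a minimal sketch, not part of the course material: the helper name check_environment and the package list are assumptions, chosen from the libraries named on this page (NumPy, scikit-learn, Autograd, Jupyter), and individual sessions may need more.

```python
import importlib.util
import sys

# Assumed mapping from import name to the library named on this page.
# The exact prerequisite list is not given here, so adjust it as needed.
ASSUMED_PACKAGES = {
    "numpy": "NumPy",
    "sklearn": "scikit-learn",
    "autograd": "Autograd",
    "notebook": "Jupyter Notebook",
}


def check_environment(packages=ASSUMED_PACKAGES, min_version=(3, 5)):
    """Return a dict mapping each requirement label to True (found) or False."""
    version_label = "Python >= {}.{}".format(*min_version)
    report = {version_label: sys.version_info >= min_version}
    for module_name, label in packages.items():
        # find_spec returns None when a top-level module cannot be located.
        report[label] = importlib.util.find_spec(module_name) is not None
    return report


if __name__ == "__main__":
    for label, ok in check_environment().items():
        print("{:<20} {}".format(label, "OK" if ok else "MISSING"))
```

Any entry reported as MISSING can be installed with pip or through Anaconda before starting the session.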

Lab session notebooks

| Lab | Contents | Date |
|---|---|---|
| Lab session 1 | Download files. Contents: Python, NumPy, and two examples of supervised learning with least squares and k-NN. | |
| Lab session 2 | Download files. Contents: scikit-learn, the most common machine learning library in Python. | |
| Lab session 3 | Download files. Contents: automatic differentiation with Autograd. | |
| Lab session 4 | Download files. Contents: self-coded K-means and clustering in scikit-learn. | |
| Lab session 5 | Download files. Contents: Spark and MLlib. | |
| Lab session 6 | Download files. Contents: cognitive services and IBM Bluemix. | |
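
As a taste of the first lab session, the two supervised learning methods it names, least squares and k-NN, can be sketched in a few lines of NumPy. This is an illustrative sketch and not the notebook code: the function names and the toy data are assumptions, and the k-NN version assumes class labels are non-negative integers.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return weights w (bias last) minimizing ||[X, 1] w - y||^2."""
    # Append a column of ones so the last weight acts as the bias term.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w


def predict_linear(X, w):
    """Evaluate the fitted linear model on the rows of X."""
    return X @ w[:-1] + w[-1]


def predict_knn(X_train, y_train, X_test, k=3):
    """Label each test point by majority vote among its k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = X_test[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for every test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # bincount requires non-negative integer labels (assumed here).
    return np.array([np.bincount(row).argmax() for row in votes])


# Toy regression data generated from y = 2x + 1 (hypothetical example).
X_reg = np.array([[0.0], [1.0], [2.0], [3.0]])
y_reg = 2.0 * X_reg[:, 0] + 1.0
w = fit_least_squares(X_reg, y_reg)

# Toy classification data with two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.1], [1.0, 0.9]])
labels = predict_knn(X_train, y_train, X_test, k=3)
```

On the toy data, w recovers the slope and intercept of the generating line up to floating-point error, and each test point receives the label of the cluster it sits in.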