# Data Mining and Machine Learning

### Università di Perugia – Master in Data Science

The course introduces the basic elements of supervised and unsupervised machine learning algorithms. It is divided into a set of short theoretical modules and six practical lab sessions in Python.

The main textbook is *The Elements of Statistical Learning*, which can be downloaded for free from the authors’ webpage or purchased through Amazon. At the end of each lecture, you can find the corresponding chapters in the book, together with a short selection of additional reading material. For the deep learning part, taught by Prof. Elisa Ricci, students are referred to the recently published *Deep Learning* book.

## Lectures

**(Click on the lecture title to download the slides)**

Lecture | Topics | Date |
---|---|---|
Lecture 1 | Introduction to machine learning | 10/03/2017 |
Lecture 2 | Elements of supervised learning [Chapter 2 from the book] | 10/03/2017 |
Lecture 3 | Unconstrained optimization [Check the last slide for introductory material on optimization] | 11/03/2017 |
Lecture 4 | Linear models for regression and classification [Selection from Chapters 3 and 4 of the book] | 05/05/2017 |
Lecture 5 | Regularization and loss functions | 06/05/2017 |
Lecture 6 | Data preprocessing and model selection | 05/05/2017 |
Lecture 7 | Neural networks [Prof. Elisa Ricci, TBA] | 26-27/05/2017 |
Lecture 8 | Kernel methods | 31/05/2017 |
Lecture 9 | Unsupervised learning [Selection from Chapter 14 of the book] | 01/06/2017 |
Lecture 10 | Ensemble learning | 01/06/2017 |

**Useful reading links:**

- For an informal introduction to information theory and the cross-entropy loss, see this blog post: https://colah.github.io/posts/2015-09-Visual-Information/.
- A nice blog post on first-order optimization algorithms: http://sebastianruder.com/optimizing-gradient-descent/.
- A series of high-quality expository articles, including one on momentum and one on t-SNE: http://distill.pub/.

## Lab session requirements

All lab sessions require a working installation of Python 3.5. If you are starting from scratch, **installing the Anaconda platform is the recommended method** to obtain a working installation with all prerequisites. Alternatively, Canopy offers a license if you register with your academic email. If you prefer a minimal installation, first install Python and the pip package manager (e.g., following this guide on Ubuntu systems), then install all prerequisites by running:

```
pip install numpy matplotlib scikit-learn jupyter
```
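To confirm that the prerequisites are visible to your Python interpreter, a short check such as the following can help. This is only a sketch: the helper name `check_packages` and the list `REQUIRED` are made up for illustration, and the import names are assumed to be the usual ones (scikit-learn is imported as `sklearn`). It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib

# Import names assumed for the prerequisites listed above.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "matplotlib", "sklearn", "jupyter"]


def check_packages(names):
    """Map each import name to its version string, or to None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in sorted(check_packages(REQUIRED).items()):
        status = version if version is not None else "NOT INSTALLED"
        print("{}: {}".format(name, status))
```

Any package reported as `NOT INSTALLED` can be installed again with pip (or with conda, if you use Anaconda) before starting the lab sessions.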

Further prerequisites are listed for each session. Each session is organized in one or more interactive Jupyter notebooks. To start the Jupyter server, run the following command from the Anaconda prompt (or from any terminal where Python is available):

```
jupyter notebook
```

Remember to start Jupyter from a folder that gives you access to the uncompressed files of the lab sessions.

## Lab session notebooks

Lab | Topics | Date |
---|---|---|
Lab session 1 | Download files. Contents: Python, NumPy, and two examples of supervised learning with least squares and k-NN. | 11/03/2017 |
Lab session 2 | Download files. Contents: scikit-learn, the most common machine learning library in Python. | 05-06/05/2017 |
Lab session 3 | Download files. Contents: automatic differentiation with Autograd. | 31/05/2017 |
Lab session 4 | Download files. Contents: self-coded K-means and clustering in scikit-learn. | 01/06/2017 |
Lab session 5 | Download files. Contents: Spark and MLlib. | 01/06/2017 |
Lab session 6 | Download files. Contents: cognitive services and IBM Bluemix. | 01/06/2017 |