About this course
38,564 recent views

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines to fit your schedule.

Approx. 48 hours to complete

Recommended: 6 weeks of study, 5-8 hours/week

Subtitles: English, Korean, Arabic

Skills you will gain

Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus: What you will learn in this course

1 hour to complete


Clustering and retrieval are two of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, for example to provide a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but it is also a more broadly useful tool for automatically discovering structure in data, such as uncovering groups of similar patients.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.

4 videos (Total 25 min), 4 readings
4 videos
Course overview3 min
Module-by-module topics covered8 min
Assumed background6 min
4 readings
Important Update regarding the Machine Learning Specialization10 min
Slides presented in this module10 min
Software tools you'll need for this course10 min
A big week ahead!10 min
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task: fetching a document similar to one someone is currently reading. We cast this problem as one of nearest neighbor search, a concept we have seen in the Foundations and Regression courses. Here, however, you will take a deep dive into two critical components of the algorithms: the data representation and the metric for measuring similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm, and instead implement scalable alternatives: KD-trees for handling large datasets, and locality sensitive hashing (LSH) for providing approximate nearest neighbors even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced.
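As a rough illustration of the random-hyperplane LSH idea mentioned above, here is a minimal NumPy sketch (our own, not the course's starter code; `lsh_bins` and its parameters are hypothetical names). Each point is hashed to a bin by which side of several random hyperplanes it falls on, so nearby points tend to share a bin:

```python
import numpy as np

def lsh_bins(X, n_planes=8, seed=0):
    """Hash each row of X into a bin given by the sign pattern of
    its inner products with random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_planes))
    # Bit vector: which side of each random hyperplane the point is on.
    bits = (X @ planes) >= 0
    # Pack the bits into a single integer bin index per point.
    powers = 1 << np.arange(n_planes)
    return bits @ powers

X = np.random.default_rng(1).standard_normal((1000, 50))
bins = lsh_bins(X)
# Candidate neighbors of point 0 are the points hashed to the same bin;
# only these candidates need an exact distance computation.
candidates = np.flatnonzero(bins == bins[0])
```

In practice (as the lectures discuss) one also searches neighboring bins and uses multiple hash tables to trade accuracy against search cost.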

22 videos (Total 137 min), 4 readings, 5 quizzes
22 videos
1-NN algorithm2 min
k-NN algorithm6 min
Document representation5 min
Distance metrics: Euclidean and scaled Euclidean6 min
Writing (scaled) Euclidean distance using (weighted) inner products4 min
Distance metrics: Cosine similarity9 min
To normalize or not and other distance considerations6 min
Complexity of brute force search1 min
KD-tree representation9 min
NN search with KD-trees7 min
Complexity of NN search with KD-trees5 min
Visualizing scaling behavior of KD-trees4 min
Approximate k-NN search using KD-trees7 min
Limitations of KD-trees3 min
LSH as an alternative to KD-trees4 min
Using random lines to partition points5 min
Defining more bins3 min
Searching neighboring bins8 min
LSH in higher dimensions4 min
(OPTIONAL) Improving efficiency through multiple tables22 min
A brief recap2 min
4 readings
Slides presented in this module10 min
Choosing features and metrics for nearest neighbor search10 min
(OPTIONAL) A worked-out example for KD-trees10 min
Implementing Locality Sensitive Hashing from scratch10 min
5 practice exercises
Representations and metrics12 min
Choosing features and metrics for nearest neighbor search10 min
KD-trees10 min
Locality Sensitive Hashing10 min
Implementing Locality Sensitive Hashing from scratch10 min
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output cluster labels that can, after the fact, be associated with known topics like "Science", "World News", etc. Even without such labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterations of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned.
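The k-means iterations described above can be sketched in a few lines of NumPy (our own minimal illustration, not the course's assignment code, and without the MapReduce parallelization covered in the lectures). The algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Bare-bones Lloyd's algorithm for k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct datapoints chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs should be recovered as two clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
labels, centers = kmeans(X, k=2)
```

Random initialization can land in poor local optima, which is why the lectures cover smart initialization via k-means++.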

13 videos (Total 79 min), 2 readings, 3 quizzes
13 videos
An unsupervised task6 min
Hope for unsupervised learning, and some challenge cases4 min
The k-means algorithm7 min
k-means as coordinate descent6 min
Smart initialization via k-means++4 min
Assessing the quality and choosing the number of clusters9 min
Motivating MapReduce8 min
The general MapReduce abstraction5 min
MapReduce execution overview and combiners6 min
MapReduce for k-means7 min
Other applications of clustering7 min
A brief recap1 min
2 readings
Slides presented in this module10 min
Clustering text data with k-means10 min
3 practice exercises
k-means18 min
Clustering text data with K-means16 min
MapReduce for k-means10 min
3 hours to complete

Mixture Models

In k-means, observations are each hard-assigned to a single cluster, and these assignments are based just on the cluster centers, rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering that provides (1) a more descriptive notion of a "cluster" and (2) an accounting of the uncertainty in the assignment of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high dimensionality of the tf-idf document representation considered.
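As a small illustration of the "soft assignment" idea (our own sketch, assuming SciPy is available; not the course's notebook code), the E-step of EM computes each Gaussian component's responsibility for each point via Bayes' rule:

```python
import numpy as np
from scipy.stats import multivariate_normal

def soft_assignments(X, weights, means, covs):
    """E-step of EM: responsibility of each Gaussian component for
    each point, via Bayes' rule on the mixture model."""
    # Unnormalized: prior weight times Gaussian likelihood per component.
    lik = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    # Normalize rows so each point's responsibilities sum to 1.
    return lik / lik.sum(axis=1, keepdims=True)

# A point near a component's mean is softly but strongly assigned to it.
X = np.array([[0.1, 0.0], [5.0, 5.0]])
resp = soft_assignments(
    X,
    weights=[0.5, 0.5],
    means=[np.zeros(2), 5 * np.ones(2)],
    covs=[np.eye(2), np.eye(2)],
)
```

The M-step then re-estimates the weights, means, and covariances from these responsibilities, and EM alternates the two steps until convergence.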

15 videos (Total 91 min), 4 readings, 3 quizzes
15 videos
Aggregating over unknown classes in an image dataset6 min
Univariate Gaussian distributions2 min
Bivariate and multivariate Gaussians7 min
Mixture of Gaussians6 min
Interpreting the mixture of Gaussian terms5 min
Scaling mixtures of Gaussians for document clustering5 min
Computing soft assignments from known cluster parameters7 min
(OPTIONAL) Responsibilities as Bayes' rule5 min
Estimating cluster parameters from known cluster assignments6 min
Estimating cluster parameters from soft assignments8 min
EM iterates in equations and pictures6 min
Convergence, initialization, and overfitting of EM9 min
Relationship to k-means3 min
A brief recap1 min
4 readings
Slides presented in this module10 min
(OPTIONAL) A worked-out example for EM10 min
Implementing EM for Gaussian mixtures10 min
Clustering text data with Gaussian mixtures10 min
3 practice exercises
EM for Gaussian mixtures18 min
Implementing EM for Gaussian mixtures12 min
Clustering text data with Gaussian mixtures8 min
290 reviews


started a new career after completing these courses


got a tangible career benefit from this course

Top reviews for Machine Learning: Clustering & Retrieval

by BK, Aug 25, 2016

Excellent material! It would be nice, however, to mention some reading material, such as books or articles, for those interested in the details and the theories behind the concepts presented in the course.

by JM, Jan 17, 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.



Emily Fox

Amazon Professor of Machine Learning

Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About the University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world....

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data....
Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you get access to all course videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing it, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a Certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.