Using random lines to partition points

We start the course by considering a retrieval task: fetching a document similar to the one a person is currently reading. We cast this problem as one of nearest neighbor search, a concept we have seen in the Foundations and Regression courses. Here, however, you will take a deep dive into two critical components of the algorithms: the data representation and the metric used to measure similarity between pairs of data points. You will examine the computational burden of the naive nearest neighbor search algorithm and instead implement scalable alternatives: KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting how the various choices you can make affect the nearest neighbor results produced.
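To connect the section title to the course description, here is a minimal sketch of the "random lines" idea behind LSH: each random direction defines a line (or hyperplane, in higher dimensions) through the origin that splits the space in two, and each point gets one bit per line depending on which side it falls on. Points that share all their bits land in the same bucket and become candidate near neighbors. The function and variable names below are illustrative, not the course's actual notebook code.

import numpy as np

def generate_random_vectors(dim, n_vectors, seed=0):
    # Draw n_vectors random directions; each defines a line/hyperplane
    # through the origin that partitions the space into two halves.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(dim, n_vectors))

def hash_points(data, random_vectors):
    # Assign each point one bit per hyperplane: 1 if it lies on the
    # positive side, 0 otherwise; then pack the bits into a bucket id.
    bits = (data @ random_vectors) >= 0            # shape: (n_points, n_vectors)
    powers = 1 << np.arange(random_vectors.shape[1])
    return bits.astype(int) @ powers               # integer bucket id per point

# Toy usage: bucket a few 2-D points with 3 random lines.
points = np.array([[1.0, 2.0], [1.1, 1.9], [-3.0, 0.5]])
rv = generate_random_vectors(dim=2, n_vectors=3)
print(hash_points(points, rv))  # nearby points tend to share a bucket id

Because only points in the same bucket (or in a few neighboring buckets) are compared directly, the search is approximate but far cheaper than scanning the entire dataset.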
