**The Homepage of Nearest Neighbors and Similarity Search**

Maintained by Yury Lifshits

Update: this page is frozen. Please visit the successor page by Arnoldo Muller

#### To preprocess a database of N objects so that given a query object,

one can effectively determine its nearest neighbors in database

Name of the problem: nearest neighbors, k nearest neighbors (kNN, k-NN), nearset neighbor search, proximity search, similarity search, approximate nearest neighbors (ANN), range queries, maximal intersection queries, post-office problem, partial match, best match file searching, best match retrieval, sequence nearest neighbors (SNN).

Solution concepts: locality-sensitive hashing (LSH), low-distortion embeddings, k-d trees, kd-trees, metric trees, M-trees, R*-trees, vp-trees, vantage point trees, vantage point forest, multi-vantage point tree, bisector trees, Orchard's algorithm, random projections, fixed queries tree, Voronoi tree, BBD-tree, min-wise independent permutations, Burkhard-Keller tree, generalized hyperplane tree, geometric near-neighbor access tree (GNAT), approximating eliminating search algorithm (AESA), inverted index, spatial approximation tree (SAT).

Applications: k-nearest neighbor classification algorithm, image similarity identification, audio similarity identification, fingerprint search, audio/video compression (MPEG), optical character recognition, coding theory, function approximation, recommendation systems, near-duplicate detection, targeting on-line ads, distributional similarity computation, spelling correction, nearest neighbor interpolation.

Related keywords: all-nearest-neighbors problem, indexing methods, spatial index, Voronoi diagram, spatial access methods (SAM), multidimensional access methods, closest pair, indexing algorithm, intrinsic dimension, Johnson-Lindenstrauss lemma, Johnson-Lindenstrauss transform, large-scale algorithms, scalability, dimensionality reduction, high dimensions, curse of dimensionality, high-dimensional spaces, cell probe model, metric spaces, Euclidean space, brunch-and-bound search, divide and conquer, massive data sets, metric embeddings, cell probe complexity, spatial data structures, Euclidean k-median problem, point sampling.

Find a mistake? Send me email to yury@yury.name

Something is missed? Give me a suggestion: yury@yury.name

Have an open problem? Publish it on this page! Copyrights are reserved to you.

Like it? Put a hyperlink on your page!