**The Homepage of Nearest Neighbors and Similarity Search**

Maintained by Yury Lifshits

Update: this page is frozen. Please visit the successor page by Arnoldo Muller

**Bibliography**Researchers Links Open Problems Yury's work on NN

I am now trying to sort all papers by topic. This work is in progress. Please email me all papers I missed so far. Subareas:

- Nearest neighbors in general metric space
- Branch and bound for Euclidean space
- Mapping-based techniques: locality sensitive hashing, random projections, low-distortion embeddings
- Nearest neighbors and intrinsic dimension
- Nearest neighbors for sequences
- Learning similarity function
- Lower bounds
- Experimental work
- Related algorithmic problems
- Applications of similarity search
- Unsorted

#### Nearest neighbors in general metric space

P. Zezula, G. Amato, V. Dohnal, M. Batko, Similarity Search - The Metric Space Approach, Springer, 2006.E. Chavez, G. Navarro, R.A. Baeza-Yates, J.L. Marroquin, Searching in metric spaces, 2001

G.R. Hjaltason, H. Samet, Index-driven similarity search in metric spaces, ACM Transactions on Database Systems, 2003.

K. Fukunaga, P.M. Narendra, A Branch and Bound Algorithm for Computing k-Nearest Neighbors, Transactions on Computers, 1975.

M.T. Orchard, A fast nearest-neighbor search algorithm, ICASSP'91.

N. Roussopoulos, S. Kelley, F. Vincent, Nearest neighbor queries, SIGMOD'95.

L. Mico, J. Oncina, R.C. Carrasco, An algorithm for finding nearest neighbours in constant average time with a linear space complexity, Pattern Recognition Letters, 1996.

D. White, R. Jain, Similarity Indexing: Algorithms and Performance, SPIE'96.

K. Clarkson, Nearest Neighbor Queries in Metric Spaces, STOC'97

P. Ciaccia, M. Patella, P. Zezula, M-tree An Efficient Access Method for Similarity Search in Metric Spaces, VLDB'97.

T. Seidl, H.P. Kriegel, Optimal multi-step k-nearest neighbor search, SIGMOD'98.

R. Weber, H.J. Schek, S. Blott, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, VLDB'98.

K.P. Bennett, U. Fayyad, D. Geiger, Density-Based Indexing for Approximate Nearest-Neighbor Queries, KDD'99.

K.L. Clarkson, Nearest Neighbor Queries in Metric Spaces, Discrete and Computational Geometry, 1999.

T. Bozkaya, M. Ozsoyoglu, Indexing large metric spaces for similarity search queries, ACM Transactions on Database Systems, 1999.

P. Ciaccia, M. Patella, PAC nearest neighbor queries: Approximate and controlled search inhigh-dimensional and metric spaces, ICDE'00.

M. Batko, D. Novak, P. Zezula, MESSIF: Metric Similarity Search Implementation Framework, DELOS'07.

M.E. Houle, J. Sakuma, Fast approximate similarity search in extremely high-dimensional data sets, ICDE'05.

P.N. Yianilos, Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces, SODA'93.

V.G. Pestov and A. Stojmirović, Indexing schemes for similarity search: an illustrated paradigm, Fundamenta Informaticae, 2006.

N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, SIGMOD'97.

S. Brin, Near neighbor search in large metric spaces, VLDB'95.

E. Chavez, G. Navarro, A Compact Space Decomposition for Effective Metric Indexing, Pattern Recognition Letters, 2005.

B. Bustos, G. Navarro, E. Chavez, Pivot Selection Techniques for Proximity Searching in Metric Spaces, Pattern Recognition Letters, 2003.

F. Chierichetti, A. Panconesi, P. Raghavan, M. Sozio, A. Tiberi, E. Upfal, Finding Near Neighbors Through Cluster Pruning, PODS'07.

C. Traina Jr., A.J.M. Traina, M.R. Vieira, A.S. Arantes, and C. Faloutsos, Efficient Processing of Complex Similarity Queries in RDBMS Through Query Rewriting, CIKM'06.

M.R. Vieira, C. Traina Jr., A.J.M. Traina, A.S. Arantes and C. Faloutsos, Boosting k-Nearest Neighbor Queries Estimating Suitable Query Radii, SSDBM'07.

D. Cantone, A. Ferro, A. Pulvirenti, D. Reforgiato, D. Shasha, Antipole Indexing to Support Range Search and K-Nearest Neighbor on Metric Spaces, Transactions on Knowledge and Data Engineering, 2005.

#### Branch and bound for Euclidean space

J. Bentley, M. Shamos, Divide and Conquer in Multidimensional Spaces, STOC'76, pp. 220-230.D. Dobkin, R. Lipton, Multidimensional Searching Problems, SIAM Journal of Computing 5 (1976), pp. 181-186.

Robert F. Sproull Refinements to nearest-neighbor searching in k-dimensional trees (1989)

S. Arya, D. Mount, Approximate nearest neighbor queries in fixed dimensions, SODA'93.

S.A. Nene and S.K. Nayar A Simple Algorithm for Nearest Neighbor Search in High Dimensions Transactions on Pattern Analysis and Machine Intelligence, 1997.

S. Arya, D. N. Mount, S. Netanyahu, R. Silverman, A. Y. Wu, An optimal algorithm for approximate nearest neighbor searching, JACM, 1998.

A. Guttman, R-trees: a dynamic index structure for spatial searching, SIGMOD'84.

V. Gaede, O. Gunther, Multidimensional Access Methods, ACM Computing Surveys, 1998

#### Mapping-based techniques: locality sensitive hashing, random projections, low-distortion embeddings

J. Kleinberg, Two algorithms for nearest-neighbor search in high dimensions , STOC'97.P. Indyk, Nearest neighbors in high-dimensional spaces, chapter of Handbook of Discrete and Computational Geometry, 2004.

P. Indyk, On Approximate Nearest Neighbors in Non-Euclidean Spaces, FOCS'98.

P. Indyk, R. Motwani, Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality, STOC'98.

E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces, STOC'98.

P. Indyk, Dimensionality Reduction Techniques for Proximity Problems, SODA 2000.

A. Gionis, P. Indyk, R. Motwani, Similarity Search in High Dimensions via Hashing, VLDB'99.

R. Fagin, R. Kumar, D. Sivakumar, Efficient Similarity Search and Classification via Rank Aggregation, SIGMOD'03.

A. Andoni, P. Indyk, Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, FOCS'06.

K. Terasawa and Y. Tanaka, Spherical LSH for approximate nearest neighbor search on unit hypersphere, WADS'07.

S. Dasgupta, A. Gupta, An elementary proof of the Johnson-Lindenstrauss lemma, Berkeley, TR-99-006.

Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search, VLDB'07.

N. Ailon, B. Chazelle, Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform, STOC'06.

M. Farach-Colton, P. Indyk, Approximate Nearest Neighbor Algorithms for Hausdorff Metrics via Embeddings, FOCS'99.

J. Goldstein, J.C. Platt, C.J.C. Burges, Redundant Bit Vectors for Quickly Searching High-Dimensional Regions, Deterministic and Statistical Methods in Machine Learning'05.

#### Nearest neighbors and intrinsic dimension

V.G. Pestov, On the geometry of similarity search: dimensionality curse and concentration of measure, Information Processing Letters, 2000.K.L. Clarkson, Nearest Neighbor Queries in Metric Spaces, STOC'97. Full version Discrete & Computational Geometry, 1999.

D.R. Karger, M. Ruhl, Finding nearest neighbors in growth-restricted metrics, STOC'02.

R. Krauthgamer and J.R. Lee, Navigating nets: simple algorithms for proximity search, SODA'04.

R. Krauthgamer and J.R. Lee, The black-box complexity of nearest-neighbor search, Theoretical Computer Science, 2005.

R. Cole and L.-A. Gottlieb, Searching dynamic point sets in spaces with bounded doubling dimension, STOC'06.

K.L. Clarkson, Nearest-neighbor searching and metric space dimensions, 2006.

A. Beygelzimer, S. Kakade, J. Langford, Cover Trees for Nearest Neighbor, ICML 2006.

S. Har-Peled, M. Mendel, Fast construction of nets in low dimensional metrics, and their applications, SIAM Journal of Computing, 2006.

V.G. Pestov Intrinsic dimension of a dataset: what properties does one expect?, IJCNN'07.

T-H. Hubert Chan, A. Gupta and K. Talwar, Ultra-Low-Dimensional Embeddings for Doubling Metrics, SODA'08.

N. Goyal, Y. Lifshits, H. Schütze, Disorder Inequality: A Combinatorial Approach to Nearest Neighbor Search, WSDM'08.

Y. Lifshits, S. Zhang, Similarity Search via Combinatorial Nets, submitted.

#### Nearest neighbors for sequences

R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases FODO'93.M. Muthukrishnan, C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations, STOC'00.

U. Keich, M. Li, B. Ma, J. Tromp , On spaced seeds for similarity search, Discrete Applied Mathematics, 2004.

V.G. Pestov, A. Stojmirović, Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments, accepted to Information Systems, 2007.

#### Learning similarity function

L. Cayton, S. Dasgupta, A learning framework for nearest-neighbor search, NIPS'07.A. Frome, Y. Singer, F. Sha, and J. Malik. Learning globally-consistent local distance functions for shape-based image retrieval and classification, ICCV'07.

K. Kummamuru, R. Krishnapuram, R. Agrawal, Learning Spatially Variant Dissimilarity (SVaD) Measures, KDD'04.

#### Lower bounds

A. Borodin, R. Ostrovsky, Y. Rabani, Lower Bounds for High Dimensional Nearest Neighbor Search and Related Problems, STOC'99.R. Motwani, A. Naor, R. Panigrahi, Lower bounds on locality sensitive hashing, SoCG'06.

C. Sahinalp and A. Utis, Hardness of String Similarity Search and Other Indexing Problems, ICALP'04.

J. Bruck, M. Naor, The hardness of decoding linear codes with preprocessing, IEEE Transactions on Information Theory, 1990.

A. Chakrabarti, B. Chazelle, B. Gum, A. Lvov, A Lower Bound on the Complexity of Approximate Nearest-Neighbor Searching on the Hamming Cube, STOC'99.

M. Pătraşcu, M. Thorup, Higher Lower Bounds for Near-Neighbor and Further Rich Problems, FOCS'06.

A. Andoni, P. Indyk, and M. Pătraşcu, On the Optimality of the Dimensionality Reduction Method, FOCS'06.

#### Experimental work

A.M. Kibriya, E. Frank An Empirical Comparison of Exact Nearest Neighbour Algorithms, PKDD'07.M. Batko, D. Novak, F. Falchi, P. Zezula, Scalability Comparison of Peer-to-Peer Similarity-Search Structures, Future Generation Computer Systems, Elsevier, 2008.

#### Related algorithmic problems

K. Clarkson, A randomized algorithm for closest-point queries , SICOMP'88.P. Agarwal, H. Edelsbrunner, O. Schwartzkopf, E. Wezl, Euclidean minimum spanning trees and bichromatic closest pair, SoCG'90, also Discrete Computational Geometry 1991.

M. Bern, Approximate closest point queries in high dimensions, Information Processing Letters, 1993.

S. Meiser, Point location in arrangements of hyperplanes, Information and Computation, 1993.

K. Clarkson, An algorithm for approximate closest-point queries SoCG'94.

P. Agarwal, J. Erickson, Geometric range searching and its relatives, Advances in Discrete and Computational Geometry, 1999.

D. Eppstein, Fast hierarchical clustering and other applications of dynamic closest pairs, SODA'98.

A. Borodin, R. Ostrovsky, Y. Rabani, Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces, STOC'99.

P. Indyk, Sublinear Time Algorithms for Metric Space Problems , STOC'99.

A.Z. Broder, M. Charikar, A.M. Frieze, M. Mitzenmacher, Min-wise independent permutations, Journal of Computer and System Sciences, 2000.

K. Jain, V.V. Vazirani, Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation, J ACM 2001.

A. Goel, P. Indyk, K. Varadarajan, Reductions among high dimensional proximity problems, SODA 2001.

B. Hoffmann, Y. Lifshits, D. Nowotka, Maximal Intersection Queries in Randomized Graph Models, CSR'07.

R. Benetis, C.S. Jensen, G. Karciauskas, S. Saltenis, Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects VLDB J., 2006.

H. Schutze and C. Silverstein, Projections for Efficient Document Clustering, SIGIR'97.

B. Chazelle, D. Liu, A. Magen, Approximate Range Searching in Higher Dimension, CCCG'2004.

Y. Tao, D. Papadias, Q. Shen, Continuous Nearest Neighbor Search, VLDB'02.

#### Applications of similarity search

T. Cover, P. Hart, Nearest Neighbor pattern classification, IEEE Transactions on Information Theory, 1967.A.J. Broder Strategies for efficient incremental nearest neighbor search, Pattern Recognition, 1990.

R. Agrawal, K.I. Lin, H.S. Sawhney, K. Shim Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, VLDB'95.

S. Berchtold, H.P. Kriegel, S3: similarity search in CAD database systems, SIGMOD'97.

T. Seidl, H.P. Kriegel Efficient User-Adaptable Similarity Search in Large Multimedia Databases, VLDB'97.

D.A. Keim, Efficient Geometry-based Similarity Search of 3D Spatial Databases, SIGMOD'99.

E.J. Keogh and M.J. Pazzani, An Indexing Scheme for Fast Similarity Search in Large Time Series Databases, PAKDD'00.

T. Schlieder, Similarity Search in XML Data using Cost-Based Query Transformations, Workshop on Web and Databases, 2001.

R. Ohbuchi, T. Otagiri, M. Ibato, T. Takei Shape-similarity search of three-dimensional models using parameterized statistics, Pacific Graphics'02.

A. Blessing, H. Schutze, Inverted Indexes For Fast Distributional Similarity Computations, to appear 2007.

E. Keogh, K. Chakrabarti, M. Pazzani, S. Mehrotra, Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases, Knowledge and Information Systems, 2001.

B. Zhang, S.N. Srihari A fast algorithm for finding k-nearest neighbors with non-metric dissimilarity, Frontiers in Handwriting Recognition, 2002.

R.F. Santos Filho, A. Traina, C. Traina Jr, C. Faloutsos, Similarity Search without Tears: the OMNI-Family of All-Purpose Access Methods, ICDE'01.

T.H. Haveliwala, A. Gionis, D. Klein, P. Indyk, Evaluating strategies for similarity search on the web, WWW'02.

S. Chakrabarti, K. Puniyani, S. Das, Optimizing Scoring Functions and Indexes for Proximity Search in Type-annotated Corpora, WWW'06.

J. Dean and M.R. Henzinger, Finding Related Web Pages in the World Wide Web, WWW'99.

Z. Bar-Yossef, I. Keidar, U. Schonfeld, Do Not Crawl in the dust: Different URLs with Similar Text, WWW'06.

S. Ilyinsky, M. Kuzmin, A. Melkov, I. Segalovich, An efficient method to detect duplicates of web documents with the use of inverted index, WWW'02.

D. Fogaras, B. Racz, Scaling link-based similarity search, WWW'05.

Q. Lv, M. Charikar, K. Li, Image similarity search with compact data structures, CIKM'04.

S. Prabhakar, D. Agrawal, A.E. Abbadi, Efficient Disk Allocation for Fast Similarity Searching, SPAA'97.

H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A.E. Abbadi, Approximate Nearest Neighbor Searching in Multimedia Databases, ICDE'01.

A.N. Papadopoulos, Y. Manolopoulos, Nearest Neighbor Search: A Database Perspective, Springer, 2005.

C. Faloutsos, K.-I. Lin, FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, SIGMOD'95.

C. Böhm, S. Berchtold, D.A. Keim, Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases, ACM Computing Surveys, 2001.

L. Cazzanti, M.R. Gupta, Local Similarity Discriminant Analysis, ICML'07.

Daniel Lemire, Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound, Pattern Recognotion, 2008.

F. Falchi, C. Gennaro, F. Rabitti, P. Zezula, Distance Browsing in Distributed Multimedia Databases, Future Generation Computer Systems, Elsevier, 2009.

#### Unsorted

T.M. Chan Approximate Nearest Neighbor Queries Revisited, SoCG'97.S. Berchtold, C. Böhm, D.A. Keim, H.P. Kriegel, A cost model for nearest neighbor search in high-dimensional data space, PODS'97.

S. Berchtold, C. Böhm, B Braunmüller, D.A. Keim, H.P. Kriegel, Fast parallel similarity search in multimedia databases, SIGMOD'97.

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, When Is "Nearest Neighbor" Meaningful? ICDT'99.

A. Hinneburg, C.C. Aggarwal, D.A. Keim, What is the Nearest Neighbor in High Dimensional Spaces?, VLDB'00.

C.C. Aggarwal, On the effects of dimensionality reduction on high dimensional similarity search, SIGMOD'01.

C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clustering for Approximate Similarity Search in High-Dimensional Spaces IEEE Transactions on Knowledge and Data Engineering, 2002.

G. R. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. Transactions on Pattern Analysis and Machine Intelligence, 2003.

G. R. Hjaltason and H. Samet. Ranking in spatial databases. SSD'95.

G. R. Hjaltason and H. Samet. Distance browsing in spatial databases. ACM Transactions on Database Systems, 1999.

A. Henrich. A distance-scan algorithm for spatial access structures. Workshop on Geographic Information Systems, 1994.