The Homepage of Nearest Neighbors and Similarity Search
Maintained by Yury Lifshits

Update: this page is frozen. Please visit the successor page by Arnoldo Muller

Intro     Tutorial     Bibliography     Researchers     Links     Open Problems     Yury's work on NN

Promising research directions around nearest neighbors:
  1. Relational nearest neighbors: using graph structure of underlying domain. Examples: co-occurrence similarity, recommendations via social network. References:
    Large Scale Graph Algorithms lecture by Yury Lifshits.
    G. Jeh, J. Widom, SimRank: a measure of structural-context similarity, KDD 2002.
    I. Antonellis, H. Garcia-Molina, C.-C. Chang, Simrank++: Query rewriting through link analysis of the click graph, VLDB 2008.
    H. Tong, C. Faloutsos, J.-Y. Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006.
  2. Nearest neighbors for sparse vectors in Euclidean space
  3. Low-distortion embeddings for social networks, similarity visualization.
  4. Probabilistic analysis for specific domains: introduce reasonable input distributions and solve nearest neighbors for them.
  5. Disorder method / Intrinsic dimension: fast algorithm for bounded average dimension.
  6. Branch and bound techniques for bichromatic nearest neighbors
  7. Construct algorithms for learning similarity function. Reference: Learning dissimilarity approach.
  8. New dynamic aspects: object descriptions are changing and sometimes even similarity function is modified with time.
  9. Design I/O efficient NN algorithms. I.e. take into account communication between internal and external memory. Reference: External Memory Geometric Data Structures by Lars Arge. Also this paper can be a possible starting point.
  10. Comparative experiments. Introduce unique data sets accepted by the whole community for verifying all NN solutions.
Open problem N 1: Nearest neighbors for sparse vectors

Database: n vectors in R^m each having at most k < m nonzero coordinates

Query: vector in R^m also having at most k < m nonzero coordinates

Similarity: scalar product

Does there exist an algorithm for solving nearest neighbors on sparse vectors within following constraints: poly(n,m) time for preprocessing, poly(k, log n) for query processing?

  1. Write a survey on lower bounds for nearest neighbors. Reference: scribe notes Lower Bounds for Data Structures (2006) by Mihai Pătraşcu and Mikkel Thorup