**The Homepage of Nearest Neighbors and Similarity Search**

Maintained by Yury Lifshits

Update: this page is frozen. Please visit the successor page by Arnoldo Muller

**Open Problems**Yury's work on NN

Promising research directions around nearest neighbors:

**Relational**nearest neighbors: using graph structure of underlying domain. Examples: co-occurrence similarity, recommendations via social network. References:Large Scale Graph Algorithms lecture by Yury Lifshits.

G. Jeh, J. Widom, SimRank: a measure of structural-context similarity, KDD 2002.

I. Antonellis, H. Garcia-Molina, C.-C. Chang, Simrank++: Query rewriting through link analysis of the click graph, VLDB 2008.

H. Tong, C. Faloutsos, J.-Y. Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006.- Nearest neighbors for
**sparse**vectors in Euclidean space - Low-distortion embeddings for social networks,
**similarity visualization**. **Probabilistic analysis**for specific domains: introduce reasonable input distributions and solve nearest neighbors for them.- Disorder method / Intrinsic dimension: fast algorithm for bounded
**average**dimension. - Branch and bound techniques for
**bichromatic**nearest neighbors - Construct algorithms for
**learning**similarity function. Reference: Learning dissimilarity approach. - New
**dynamic aspects**: object descriptions are changing and sometimes even similarity function is modified with time. - Design
**I/O**efficient NN algorithms. I.e. take into account communication between internal and external memory. Reference: External Memory Geometric Data Structures by Lars Arge. Also this paper can be a possible starting point. **Comparative experiments**. Introduce unique data sets accepted by the whole community for verifying all NN solutions.

**Open problem N 1:**Nearest neighbors for sparse vectors

Database: n vectors in R^m each having at most k < m nonzero coordinates

Query: vector in R^m also having at most k < m nonzero coordinates

Similarity: scalar product

Does there exist an algorithm for solving
nearest neighbors on sparse vectors within following constraints:
poly(n,m) time for preprocessing, poly(k, log n) for query processing?

Organizational:

- Write a survey on lower bounds for nearest neighbors. Reference: scribe notes Lower Bounds for Data Structures (2006) by Mihai Pătraşcu and Mikkel Thorup