Work by Yury Lifshits on Nearest Neighbors and Similarity Search

The Homepage of Nearest Neighbors and Similarity Search
Maintained by Yury Lifshits

Update: this page is frozen. Please visit the successor page by Arnoldo Muller

Intro Tutorial Bibliography Researchers Links Open Problems Yury's work on NN

Promising research directions around nearest neighbors:

Relational nearest neighbors: using graph structure of underlying domain. Examples: co-occurrence similarity, recommendations via social network. References:
Large Scale Graph Algorithms lecture by Yury Lifshits.
G. Jeh, J. Widom, SimRank: a measure of structural-context similarity, KDD 2002.
I. Antonellis, H. Garcia-Molina, C.-C. Chang, Simrank++: Query rewriting through link analysis of the click graph, VLDB 2008.
H. Tong, C. Faloutsos, J.-Y. Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006.
Nearest neighbors for sparse vectors in Euclidean space
Low-distortion embeddings for social networks, similarity visualization.
Probabilistic analysis for specific domains: introduce reasonable input distributions and solve nearest neighbors for them.
Disorder method / Intrinsic dimension: fast algorithm for bounded average dimension.
Branch and bound techniques for bichromatic nearest neighbors
Construct algorithms for learning similarity function. Reference: Learning dissimilarity approach.
New dynamic aspects: object descriptions are changing and sometimes even similarity function is modified with time.
Design I/O efficient NN algorithms. I.e. take into account communication between internal and external memory. Reference: External Memory Geometric Data Structures by Lars Arge. Also this paper can be a possible starting point.
Comparative experiments. Introduce unique data sets accepted by the whole community for verifying all NN solutions.

Open problem N 1: Nearest neighbors for sparse vectors

Database: n vectors in R^m each having at most k < m nonzero coordinates

Query: vector in R^m also having at most k < m nonzero coordinates

Similarity: scalar product

Does there exist an algorithm for solving nearest neighbors on sparse vectors within following constraints: poly(n,m) time for preprocessing, poly(k, log n) for query processing?

Organizational:

Write a survey on lower bounds for nearest neighbors. Reference: scribe notes Lower Bounds for Data Structures (2006) by Mihai Pătraşcu and Mikkel Thorup