Maintained by Yury Lifshits
Update: this page is frozen. Please visit the successor page by Arnoldo Muller
Promising research directions around nearest neighbors:
- Relational nearest neighbors: using graph structure of underlying domain. Examples: co-occurrence
similarity, recommendations via social network. References:
Large Scale Graph Algorithms lecture by Yury Lifshits.
G. Jeh, J. Widom, SimRank: a measure of structural-context similarity, KDD 2002.
I. Antonellis, H. Garcia-Molina, C.-C. Chang, Simrank++: Query rewriting through link analysis of the click graph, VLDB 2008.
H. Tong, C. Faloutsos, J.-Y. Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006. - Nearest neighbors for sparse vectors in Euclidean space
- Low-distortion embeddings for social networks, similarity visualization.
- Probabilistic analysis for specific domains: introduce reasonable input distributions and solve nearest neighbors for them.
- Disorder method / Intrinsic dimension: fast algorithm for bounded average dimension.
- Branch and bound techniques for bichromatic nearest neighbors
- Construct algorithms for learning similarity function. Reference: Learning dissimilarity approach.
- New dynamic aspects: object descriptions are changing and sometimes even similarity function is modified with time.
- Design I/O efficient NN algorithms. I.e. take into account communication between internal and external memory. Reference: External Memory Geometric Data Structures by Lars Arge. Also this paper can be a possible starting point.
- Comparative experiments. Introduce unique data sets accepted by the whole community for verifying all NN solutions.
Open problem N 1: Nearest neighbors for sparse vectors
Database: n vectors in R^m each having at most k < m nonzero coordinates
Query: vector in R^m also having at most k < m nonzero coordinates
Similarity: scalar product
Does there exist an algorithm for solving
nearest neighbors on sparse vectors within following constraints:
poly(n,m) time for preprocessing, poly(k, log n) for query processing?
Organizational:
- Write a survey on lower bounds for nearest neighbors. Reference: scribe notes Lower Bounds for Data Structures (2006) by Mihai Pătraşcu and Mikkel Thorup