Jump to: navigation, search

The Similarity Search Problem Family

Definition of Terms

An atomic data element

Anything, really...

A data structure, representing an organisation of elements

Similarity Function:
A function that, given a target and an arrangement yields a value in <math>]-\infty;0]</math>, where 0 represents perfect similarity.

Algorithm Constraints:
A set of constraints for the algorithm. Examples:

  • Worst-case time complexity
  • Worst-case space complexity

Arrangement Constraints:
A set of constraints concerning the resulting arrangement. Examples:

  • An element may be used only once or more than once
  • An element may be left out
  • The arrangement must yield a similarity higher than some specified value

Similarity Search Algorithm:
An algorithm that, given a multiset of elements and a target, returns an arrangement

Creating a Specific Problem Definition

Creating a specific problem definition involves:

  1. select types:
    1. a target data type
    2. an arrangement data structure type
    3. domain of legal elements
  2. define a similarity function
  3. define arrangement and algorithm constraints
  4. create an algorithm that, given a specific target and a multiset of elements, returns an arrangement that abides the arrangement and algorithm constraints.

Example Problems

String Matching

element                    : character
target data type           : character string
arrangement data structure : character string
similarity function        : ?
algorithm constraints      : ?
arrangement constraints    : similarity must be above a certain value
similarity search algorithm:
    given a multiset of characters and a string, return a string representing
    an arrangement of the characters that abides the constraints

100 Piece Puzzle

This problem represents solving a 100-piece picture puzzle, potentially with arbitrary pieces that has nothing to do with the target image As there is 100! = 10^158 solutions, we should use an algorithm with lower time complexity than a brute force algorithm.

element                    : an image of dimensions W/10 x H/10
target data type           : an image of dimensions W x H
arrangement data structure : a 10 x 10 matrix of numbers representing piece placement
similarity function        : ?
algorithm constraints      : ?
arrangement constraints    : each piece must be used exactly once, and
                             similarity must a above a certain value
similarity search algorithm:
    given a set of 100 images and a string, return a 10 x 10 matrix of the
    numbers 1-100 representing the placement of the pieces that
    abides the constraints

GM Sample Matching

Create a GM Midi file that when played on a given sound card, is similar to a given sample.


  • Fourier Analysis of sample, and of each GM sound
  • Sonology: what is similarity of sound?

(Naïve solution: you could subtract the spectrogram images from each other and calculate the intensity of the resulting image?)

Piano Piece Sample Matching

Based on playability constraints (interesting finger-algorithm!), create a (Lilypond) score that, when played on the piano and perhaps time-compressed, is similar to a given sample. Use a function that returns the partial tones of a given piano note for the solution.


Given a 3D object and a multiset of LEGO blocks, make a LEGO figure similar to the object.


The most important and time-consuming aspects of solving a specific similarity search problem are defining the similarity function and creating the algorithm.