## The Similarity Search Problem Family

### Definition of Terms

Element:
An atomic data element

Target:
Anything, really...

Arrangement:
A data structure, representing an organisation of elements

Similarity Function:
A function that, given a target and an arrangement yields a value in $]-\infty;0]$, where 0 represents perfect similarity.

Algorithm Constraints:
A set of constraints for the algorithm. Examples:

• Worst-case time complexity
• Worst-case space complexity

Arrangement Constraints:
A set of constraints concerning the resulting arrangement. Examples:

• An element may be used only once or more than once
• An element may be left out
• The arrangement must yield a similarity higher than some specified value

Similarity Search Algorithm:
An algorithm that, given a multiset of elements and a target, returns an arrangement

### Creating a Specific Problem Definition

Creating a specific problem definition involves:

1. select types:
1. a target data type
2. an arrangement data structure type
3. domain of legal elements
2. define a similarity function
3. define arrangement and algorithm constraints
4. create an algorithm that, given a specific target and a multiset of elements, returns an arrangement that abides the arrangement and algorithm constraints.

## Example Problems

### String Matching

element                    : character
target data type           : character string
arrangement data structure : character string
similarity function        : ?
algorithm constraints      : ?
arrangement constraints    : similarity must be above a certain value
similarity search algorithm:
given a multiset of characters and a string, return a string representing
an arrangement of the characters that abides the constraints


### 100 Piece Puzzle

This problem represents solving a 100-piece picture puzzle, potentially with arbitrary pieces that has nothing to do with the target image As there is 100! = 10^158 solutions, we should use an algorithm with lower time complexity than a brute force algorithm.

element                    : an image of dimensions W/10 x H/10
target data type           : an image of dimensions W x H
arrangement data structure : a 10 x 10 matrix of numbers representing piece placement
similarity function        : ?
algorithm constraints      : ?
arrangement constraints    : each piece must be used exactly once, and
similarity must a above a certain value
similarity search algorithm:
given a set of 100 images and a string, return a 10 x 10 matrix of the
numbers 1-100 representing the placement of the pieces that
abides the constraints


### GM Sample Matching

Create a GM Midi file that when played on a given sound card, is similar to a given sample.

Subproblems:

• Fourier Analysis of sample, and of each GM sound
• Sonology: what is similarity of sound?

(NaÃ¯ve solution: you could subtract the spectrogram images from each other and calculate the intensity of the resulting image?)

### Piano Piece Sample Matching

Based on playability constraints (interesting finger-algorithm!), create a (Lilypond) score that, when played on the piano and perhaps time-compressed, is similar to a given sample. Use a function that returns the partial tones of a given piano note for the solution.

### LEGO

Given a 3D object and a multiset of LEGO blocks, make a LEGO figure similar to the object.

## Solutions

The most important and time-consuming aspects of solving a specific similarity search problem are defining the similarity function and creating the algorithm.