computationally efficient. This is related to the
size of the fingerprints, the complexity of the search algorithm
and the complexity of the fingerprint extraction.
The design principles behind audio fingerprinting are recurrent
in several research areas. Compact signatures that represent
complex multimedia objects are employed in Information
Retrieval for fast indexing and retrieval. In order to index
complex multimedia objects it is necessary to reduce their
dimensionality (to avoid the “curse of dimensionality”) and
perform the indexing and searching in the reduced space [1]–
[3]. In analogy to the cryptographic hash value, content-based
digital signatures can be seen as evolved versions of hash values
that are robust to content-preserving transformations [4],
[5]. Also, from a pattern-matching point of view, the idea
of extracting the essence of a class of objects while retaining
its main characteristics is at the heart of any classification
system [6]–[10].
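The contrast between a cryptographic hash value and a robust content-based signature can be illustrated with a minimal Python sketch. The block-energy signature (`toy_signature`), the synthetic signal, and the uniform gain change standing in for a content-preserving transformation are all assumptions made for illustration; they are not taken from [4], [5].

```python
import hashlib
import math

def crypto_hash(samples):
    """Bit-exact digest: SHA-256 over a fixed-precision text encoding of the samples."""
    data = ",".join(f"{s:.6f}" for s in samples).encode("ascii")
    return hashlib.sha256(data).hexdigest()

def toy_signature(samples, n_blocks=8):
    """Hypothetical robust signature (an assumption, not a published scheme):
    one bit per pair of adjacent blocks, set when the block energy increases."""
    size = len(samples) // n_blocks
    energies = [
        sum(x * x for x in samples[i * size:(i + 1) * size])
        for i in range(n_blocks)
    ]
    return tuple(int(energies[i + 1] > energies[i]) for i in range(n_blocks - 1))

# Synthetic "recording": a sine tone whose amplitude changes from block to block.
amplitudes = [1.0, 0.5, 0.8, 0.3, 0.9, 0.4, 0.7, 0.2]
original = [
    a * math.sin(2 * math.pi * 10 * t / 100)
    for a in amplitudes
    for t in range(100)
]

# Content-preserving transformation (assumed here): a uniform volume reduction.
quieter = [0.9 * x for x in original]

print("same SHA-256 digest:", crypto_hash(original) == crypto_hash(quieter))
print("same toy signature: ", toy_signature(original) == toy_signature(quieter))
```

The digest changes as soon as any sample changes, whereas the toy signature depends only on the ordering of block energies, which a uniform gain does not alter. Practical systems would need invariance to far harsher distortions than this.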
II. GENERAL FRAMEWORK
In spite of the different rationales behind the identification
task, methods share certain aspects. As depicted in Fig. 1, there
are two fundamental processes: the fingerprint extraction and
the matching algorithm. The fingerprint extraction derives a
set of relevant perceptual characteristics of a recording in a
concise and robust form. The fingerprint requirements include:

Discrimination power over huge numbers of other fingerprints,

Invariance to distortions,

Compactness,

Computational simplicity.
The solutions proposed to fulfill the above requirements imply
a trade-off between dimensionality reduction and information
loss. The fingerprint extraction consists of a front-end and a
fingerprint modeling block (see Fig. 2). The front-end computes
a set of measurements from the signal (see Section
III). The fingerprint modeling block defines the final fingerprint
representation, e.g., a vector, a trace of vectors, a codebook,
a sequence of indexes to HMM sound classes, a sequence
of error correcting words or musically meaningful high-level
attributes (see Section IV).
[Figure 1 block labels: Audio signal; Fingerprint extraction; Front-end; Fingerprint modeling; Matching; Database look-up; Distance; Search; Hypothesis testing; DB; Fingerprints + Metadata; Audio metadata.]
Fig. 1. Content-based Audio Identification Framework.
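The two-block structure of the fingerprint extraction (a front-end followed by fingerprint modeling) can be sketched in Python as follows. This is only an illustration under stated assumptions: the frame size, the hop, the two per-frame measurements (log-energy and zero-crossing count), and the sign-of-change bit encoding are hypothetical choices, not the measurements or models surveyed in Sections III and IV.

```python
import math

def front_end(signal, frame_size=64, hop=32):
    """Front-end (assumed design): cut the signal into overlapping frames and
    compute two measurements per frame, log-energy and zero-crossing count."""
    features = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        energy = sum(x * x for x in frame)
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((math.log10(energy + 1e-12), zero_crossings))
    return features

def fingerprint_model(features):
    """Fingerprint modeling (assumed design): reduce the trace of feature
    vectors to bits, one per feature and frame transition, set to 1 when the
    feature increases from one frame to the next."""
    bits = []
    for prev, cur in zip(features, features[1:]):
        bits.extend(int(c > p) for p, c in zip(prev, cur))
    return bits

def extract_fingerprint(signal):
    """Fingerprint extraction = front-end followed by fingerprint modeling."""
    return fingerprint_model(front_end(signal))

# Synthetic test signal: a tone whose frequency rises over time.
signal = [math.sin(2 * math.pi * (0.02 + 0.0004 * t) * t) for t in range(256)]
fingerprint = extract_fingerprint(signal)
print(len(front_end(signal)), "frames ->", len(fingerprint), "bits")
```

Keeping the two stages as separate functions mirrors the framework: the front-end can be swapped for other measurements without touching the modeling step, and vice versa.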
Given a fingerprint derived from a recording, the matching
algorithm searches a database of fingerprints to find the best
match. A way of comparing fingerprints, that is, a distance,
is therefore needed (see Section V-A). Since the number of
comparisons is high and the distance can be expensive to
compute, we require methods that speed up the search. It is
common to see methods that use a simpler distance to quickly
discard candidates and a more accurate but expensive distance
for the reduced set of candidates. There are also methods that
pre-compute some distances off-line and build a data structure
that reduces the number of computations needed on-line
(see Section V-B). According to [1], good searching methods
should be:

Fast: Sequential scanning and distance calculation can be
too slow for huge databases.

Correct: Should return the qualifying objects, without
missing any – low False Rejection Rate (FRR).

Memory efficient: They sho