fied multimedia content are then retrieved by
computing a fingerprint and using this as a query in the
fingerprint/meta-data database. The advantage of using
fingerprints instead of the multimedia content itself is three-fold:
1. Reduced memory/storage requirements as fingerprints
are relatively small;
2. Efficient comparison as perceptual irrelevancies have
already been removed from fingerprints;
3. Efficient searching as the dataset to be searched is
smaller.
As can be concluded from above, a fingerprint system generally
consists of two components: a method to extract fingerprints and a
method to efficiently search for matching fingerprints in a
fingerprint database.
This paper describes an audio fingerprinting system that is suitable
for a large number of applications. After defining the concept of
an audio fingerprint in Section 2 and elaborating on possible
applications in Section 3, we focus on the technical aspects of the
proposed audio fingerprinting system. Fingerprint extraction is
described in Section 4 and fingerprint searching in Section 5.
2. AUDIO FINGERPRINTING CONCEPTS
2.1 Audio Fingerprint Definition
Recall that an audio fingerprint can be seen as a short summary of
an audio object. Therefore a fingerprint function F should map an
audio object X, consisting of a large number of bits, to a
fingerprint of only a limited number of bits.
Here we can draw an analogy with so-called hash functions¹,
which are well known in cryptography. A cryptographic hash
function H maps a (usually large) object X to a (usually small)
hash value (a.k.a. message digest). A cryptographic hash function
allows comparison of two large objects X and Y, by just
comparing their respective hash values H(X) and H(Y). Strict
mathematical equality of the latter pair implies equality of the
former, with only a very low probability of error. For a properly
designed cryptographic hash function this probability is 2^{-n}, where
n equals the number of bits of the hash value. Using cryptographic
hash functions, an efficient method exists to check whether or not
a particular data item X is contained in a given and large data set
Y = {Y_i}. Instead of storing and comparing with all of the data in Y,
it is sufficient to store the set of hash values {h_i = H(Y_i)}, and to
compare H(X) with this set of hash values.

¹ In the literature fingerprinting is sometimes also referred to as
robust or perceptual hashing [5].

Permission to make digital or hard copies of all or part of this
work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and the
full citation on the first page.
© 2002 IRCAM – Centre Pompidou
A Highly Robust Audio Fingerprinting System

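The membership test described above, in which only the hash values of the data set are stored and queried, can be sketched as follows. This is a minimal illustration rather than part of the proposed system: SHA-256 from Python's standard `hashlib` module is an assumed stand-in for the generic cryptographic hash function H, and the names `build_hash_set` and `contains` as well as the sample byte strings are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Cryptographic hash of an object (assumption: SHA-256 stands in for H)."""
    return hashlib.sha256(data).digest()


def build_hash_set(items):
    """Store only the hash values h_i = H(Y_i), not the objects Y_i themselves."""
    return {H(y) for y in items}


def contains(hash_set, x: bytes) -> bool:
    """Decide whether X is in the data set by comparing H(X) with the stored hashes."""
    return H(x) in hash_set


# Hypothetical data set Y = {Y_i}
Y = [b"first object", b"second object", b"third object"]
hashes = build_hash_set(Y)

print(contains(hashes, b"second object"))  # True: bit-exact copy of a stored object
print(contains(hashes, b"second objecT"))  # False: one byte differs
```

The second query differs from a stored object in a single byte and is therefore not found, since the comparison is based on strict mathematical equality of the hash values.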
At first one might think that cryptographic hash functions are a
good candidate for fingerprint functions. However, recall from the
introduction that, instead of strict mathematical equality, we are
interested in perceptual similarity. For example, an original
CD-quality version of ‘Rolling Stones – Angie’ and an MP3 version at
128 kb/s sound the same to the human auditory system, but their
waveforms can be quite different. Although the two versions are
perceptually similar, they are mathematically quite different.
Therefore cryptographic