Supervised Deep Hashing for Efficient Audio Event Retrieval

Efficient retrieval of audio events can facilitate real-time implementation of numerous query and search-based systems. This work investigates the potency of different hashing techniques for efficient audio event retrieval. Multiple state-of-the-art weak audio embeddings are employed for this purpose. The performance of four classical unsupervised hashing algorithms is explored as part of an off-the-shelf analysis. We then propose a partially supervised deep hashing framework that transforms the weak embeddings into a low-dimensional space while optimizing for efficient hash codes. The model uses only a fraction of the available labels and is shown here to significantly improve the retrieval accuracy on two widely employed audio event datasets. The extensive analysis and comparison between supervised and unsupervised hashing methods presented here give insights into the quantizability of audio embeddings. This work provides a first look at efficient audio event retrieval systems and aims to set baselines for future research.
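To make the idea of hashing-based retrieval concrete, the following is a minimal sketch of an unsupervised hashing baseline followed by Hamming-distance ranking. It assumes random-hyperplane locality-sensitive hashing, which is chosen here purely for illustration and is not necessarily one of the four algorithms studied in this work. The embeddings are synthetic stand-ins for real audio embeddings, and all function names are hypothetical.

```python
import numpy as np

def fit_random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in a dim-dimensional space (illustrative choice)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))

def hash_codes(embeddings, planes):
    """One bit per hyperplane: 1 if the projection is positive, else 0."""
    return (embeddings @ planes > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists

# Synthetic stand-in for audio embeddings: two event classes, 50 items each,
# clustered around two random 128-dimensional centers (assumed sizes).
rng = np.random.default_rng(42)
centers = rng.standard_normal((2, 128))
labels = np.repeat([0, 1], 50)
database = centers[labels] + 0.1 * rng.standard_normal((100, 128))

planes = fit_random_hyperplanes(dim=128, n_bits=64)
db_codes = hash_codes(database, planes)

# A fresh query drawn near the first class center.
query = centers[0] + 0.1 * rng.standard_normal(128)
order, dists = hamming_rank(hash_codes(query[None, :], planes)[0], db_codes)
print("labels of top-5 retrieved items:", labels[order[:5]])
```

Comparing 64-bit codes by Hamming distance is far cheaper than comparing 128-dimensional floating-point vectors, which is the practical motivation for hashing in large databases. A supervised method such as the one proposed here would learn the mapping to codes from labels rather than use random projections.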

Fig. 1: Overview of the employed model for deep audio event hashing

 

[Figure: mAP plots for audio retrieval]

Supervised Deep Hashing for Efficient Audio Retrieval [Video]

Audio Event Classification (AEC) is defined as the ability of machines to assign a semantic label to a given audio segment. Despite multiple efforts to learn better and more robust audio representations (or embeddings), there has not been an adequate amount of research on efficient retrieval of audio events. Fast retrieval can facilitate near-real-time similarity search between a query sound and a database containing millions of audio events. This work, the first of its kind, investigates the potency of different hashing techniques for efficient audio event retrieval. We employ state-of-the-art audio embeddings as features. We analyze the performance of some classical unsupervised hashing algorithms. Then we show that employing a small portion of the annotated database for supervised hashing via a Deep Quantization Network (DQN)…