Since the emergence of music streaming platforms, some companies have built up huge catalogs of millions of songs, and managing such catalogs is not an easy task. In particular, relevant information about each song is needed to organize all the titles efficiently. However, manually annotating millions of songs is a titanic job for humans, and because the music industry changes quickly, most of these annotations have to be updated every year. To address this, a scientific field emerged about two decades ago: Music Information Retrieval. It deals with the automatic analysis of digital music signals in order to retrieve musical information directly from the recordings. With dedicated algorithms, a computer can then recognize attributes such as the genre of a song, its mood, the instruments played, the tempo, the key, the chord progression, and much more. In the context of music recommendation, and especially for FuturePulse, such technology is highly relevant.


But a question arises: how does a computer listen to music? Researchers have developed many different techniques; here we summarize the most common approach.


First, the raw sound signal, the waveform, is transformed into a more informative representation. This transformation, a short-time version of the "Fourier Transform" (named after the mathematician and physicist Joseph Fourier), decomposes the signal into a sum of elementary sinusoids, organized over time and by frequency (pitch). In this time-frequency representation, percussive sounds, pitched instruments, and spoken or singing voices, for example, each show characteristic patterns.
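As a rough illustration of this first step, here is a minimal Python/NumPy sketch of a magnitude spectrogram computed with a short-time Fourier transform: the waveform is cut into overlapping frames, each frame is windowed, and its spectrum is taken. The frame size (1024 samples), hop (512 samples), Hann window and the 440 Hz test tone are arbitrary assumptions for the example, not the settings used by IRCAM.

```python
import numpy as np

def stft_magnitude(x, frame_size=1024, hop=512):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop
    # Overlapping, windowed frames of the waveform (one frame per row)
    frames = np.stack([x[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 22050                              # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                  # one second of time stamps
x = np.sin(2 * np.pi * 440.0 * t)       # a pure A4 tone as test input
S = stft_magnitude(x)
freqs = np.fft.rfftfreq(1024, d=1 / sr) # centre frequency of each bin
peak_bin = S[:, 0].argmax()             # strongest bin in the first frame
print(S.shape, freqs[peak_bin])         # peak lies within one bin of 440 Hz
```

Each column of `S` is one moment in time and each row one frequency, which is the time and pitch organization described above; a steady tone shows up as a horizontal line, while a drum hit shows up as a vertical one.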


Second, for a given task, scientists have designed further processing steps that emphasize the sound characteristics most useful for predicting the desired musical information. For example, automatic tempo estimation focuses mainly on percussive sounds and rhythmic profiles, whereas key estimation relies more on pitched sounds.
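To make the tempo example concrete, the following is a minimal sketch of one classic textbook approach, assuming a spectral-flux onset envelope followed by autocorrelation; it is not presented as the method developed for FuturePulse. Spectral flux keeps only increases of spectral energy from frame to frame, which emphasizes percussive attacks; the autocorrelation of that envelope then reveals the beat period. The hop size, BPM search range and synthetic click track are assumptions chosen for the illustration.

```python
import numpy as np

def onset_envelope(x, frame_size=1024, hop=441):
    """Spectral flux: summed increase of spectral magnitude from frame to frame."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop
    frames = np.stack([x[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Keep only energy increases, which is what hits and note onsets produce
    flux = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)
    return flux - flux.mean()

def estimate_tempo(x, sr, hop=441, bpm_min=60, bpm_max=200):
    env = onset_envelope(x, hop=hop)
    # Autocorrelation for non-negative lags 0, 1, 2, ... (in envelope frames)
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    frame_rate = sr / hop                   # envelope frames per second
    # Convert the allowed BPM range into a range of lags
    min_lag = int(np.floor(60 * frame_rate / bpm_max))
    max_lag = int(np.ceil(60 * frame_rate / bpm_min))
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60 * frame_rate / best_lag

# Synthetic click track: 10 s, first click at 0.2 s, then one every 0.5 s (120 BPM)
sr = 22050
x = np.zeros(10 * sr)
x[4410::11025] = 1.0
tempo = estimate_tempo(x, sr)
print(tempo)   # 120.0
```

Restricting the lag search to a plausible BPM range is a common design choice: without it, multiples of the beat period (half or double tempo) can compete with the true period.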



Finally, to decide on the output value, most approaches rely on artificial intelligence techniques, especially "Machine Learning". For instance, in a genre recognition task, a subset of annotated songs is used to train mathematical models. In other words, starting from the sound representations described above, computed for every song in this subset, the computer automatically learns the representation that characterizes each musical genre.
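The training idea can be sketched with a deliberately simple supervised model, a nearest-centroid classifier: during training it averages the feature vectors of the annotated songs of each genre, and a new song is assigned to the genre whose average is closest. This is only a stand-in for the machine learning models mentioned above; the two features (loosely "brightness" and "percussiveness"), the genre names and all the values are hypothetical synthetic data generated for the example.

```python
import numpy as np

class NearestCentroidGenre:
    """Toy supervised classifier: one mean feature vector (centroid) per genre."""

    def fit(self, features, labels):
        self.genres = sorted(set(labels))
        labels = np.asarray(labels)
        # Training = averaging the annotated songs of each genre
        self.centroids = np.stack([features[labels == g].mean(axis=0)
                                   for g in self.genres])
        return self

    def predict(self, features):
        # Euclidean distance from every song to every genre centroid
        d = np.linalg.norm(features[:, None, :] - self.centroids[None, :, :], axis=2)
        return [self.genres[i] for i in d.argmin(axis=1)]

# Hypothetical, synthetic 2-D features for 20 "annotated" songs per genre
rng = np.random.default_rng(0)
centers = {"classical": [0.2, 0.1], "electronic": [0.8, 0.9], "rock": [0.7, 0.5]}
X_train, y_train = [], []
for genre, c in centers.items():
    X_train.append(rng.normal(c, 0.05, size=(20, 2)))
    y_train += [genre] * 20
X_train = np.vstack(X_train)

model = NearestCentroidGenre().fit(X_train, y_train)
print(model.predict(np.array([[0.25, 0.15], [0.75, 0.85]])))  # ['classical', 'electronic']
```

In practice the features would come from representations such as the spectrogram, and the classifier would usually be far more expressive, but the split between learning from annotated examples and predicting on new songs is the same.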


For FuturePulse, the research institute IRCAM is responsible for developing or improving techniques for several tasks considered relevant to the project: tempo prediction, key and mode estimation (minor/major), fade-in/fade-out detection, vocal gender recognition (male/female), and musical genre classification. Combined, this automatically retrieved information helps in choosing the songs to recommend. After developing traditional signal analysis approaches, we are currently exploring deep neural networks to obtain better estimates. Also known as "Deep Learning", this field has produced very promising results and now underpins the best-performing techniques in Artificial Intelligence. Several studies have already confirmed the relevance of artificial neural networks for Music Information Retrieval, and we believe FuturePulse will benefit from them.
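As an illustration of key and mode estimation, here is a minimal sketch of a classic "traditional" approach, assuming chroma features matched against the Krumhansl-Kessler key profiles; it is not the IRCAM method. The spectrum is folded into 12 pitch classes (a chroma vector), and each of the 24 major and minor keys is scored by correlating that vector with a rotated key profile. Computing one long-term spectrum for the whole clip, the 27.5 to 5000 Hz frequency range and the C major triad test signal are simplifying assumptions.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles, tonic first
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def chroma(x, sr):
    """12-bin pitch-class energy profile from one long-term spectrum."""
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    keep = (freqs > 27.5) & (freqs < 5000)   # skip DC and very low/high bins
    midi = 69 + 12 * np.log2(freqs[keep] / 440.0)
    pc = np.round(midi).astype(int) % 12     # 0 = C, 1 = C#, ..., 11 = B
    return np.bincount(pc, weights=power[keep], minlength=12)

def estimate_key(x, sr):
    c = chroma(x, sr)
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its tonic weight lands on `tonic`
            r = np.corrcoef(c, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[0]:
                best = (r, PITCH_CLASSES[tonic], mode)
    return best[1], best[2]

# Test signal: one second of a C major triad (C4, E4, G4) as pure tones
sr = 22050
t = np.arange(sr) / sr
x = sum(np.sin(2 * np.pi * f * t) for f in (261.63, 329.63, 392.00))
print(estimate_key(x, sr))   # ('C', 'major')
```

Template matching of this kind is easily confused between related keys (for example a major key and its relative minor), which is one reason why learned models such as deep neural networks are being explored.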


Remi Mignot


Researcher, IRCAM



Co-funded by the European Commission

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 761634. This website reflects the views only of the Consortium, and the Commission cannot be held responsible for any use which may be made of the information contained herein.