Meyda

Audio feature extraction for JavaScript.


What is an Audio Feature?

Often, observing and analysing an audio signal as a waveform doesn’t tell us much about its contents. An audio feature is a measurement of a particular characteristic of an audio signal, and it gives us insight into what the signal contains. Audio features are measured by running an algorithm on an audio signal; the algorithm returns a number, or a set of numbers, that quantifies the characteristic it is intended to measure. Meyda implements a selection of standardized audio features that are used widely across a variety of music computing scenarios.
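
As a rough illustration of "an algorithm that returns a number", the sketch below counts how often a buffer of samples changes sign, which is the idea behind a zero-crossing measurement. It is hand-written for this page and is not Meyda's implementation; the helper names, the 512-sample buffer, and the 44100 Hz sample rate are all assumptions.

```javascript
// Illustrative only: a hand-written zero-crossing count, not Meyda's own code.
function countZeroCrossings(signal) {
  let crossings = 0;
  for (let i = 1; i < signal.length; i++) {
    const previousIsNonNegative = signal[i - 1] >= 0;
    const currentIsNonNegative = signal[i] >= 0;
    // A crossing happens whenever two neighbouring samples differ in sign.
    if (previousIsNonNegative !== currentIsNonNegative) crossings++;
  }
  return crossings;
}

// Hypothetical test tones: 512 samples at an assumed 44100 Hz sample rate.
function sineWave(frequencyHz, length = 512, sampleRate = 44100) {
  return Float32Array.from({ length }, (_, i) =>
    Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate)
  );
}

const lowTone = countZeroCrossings(sineWave(440));
const highTone = countZeroCrossings(sineWave(4400));
console.log({ lowTone, highTone }); // the higher tone crosses zero more often
```

The single number that comes back says something about the signal (here, roughly how high-pitched or noisy it is) that is hard to read off the raw waveform.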

Bear in mind that by default, Meyda.extract applies the hanning windowing function to the incoming signal. If you compare the results of Meyda’s feature extraction with those of another library for the same signal, make sure that the same windowing is applied in both, or the features will likely differ. To disable windowing in Meyda.extract, set Meyda.windowingFunction to "rect".

Following is a list of supported features with their explanations. Unless stated otherwise, extraction algorithms have been adapted from the yaafe library.


Time-domain features

RMS

rms

To use RMS in applications where you expect a ceiling on each audio feature, we suggest that you measure examples of the audio you will run feature extraction on, identify a reasonable maximum, and use the Math.min function to take the lower of the current rms value and that maximum.
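
The clamping suggested above can be sketched as follows. The ceiling of 0.5 and the function names are hypothetical; choose your own ceiling from measurements of your material. The normalisation helper is an optional extra step, not part of the advice above.

```javascript
// Hypothetical ceiling: pick yours by measuring representative audio first.
const RMS_CEILING = 0.5;

// Return the rms reading, or the ceiling if the reading exceeds it.
function clampRms(rms, ceiling = RMS_CEILING) {
  return Math.min(rms, ceiling);
}

// Optional extra step (an assumption, not part of the advice above):
// scale the clamped value into the range 0..1, e.g. to drive a UI parameter.
function normalizeRms(rms, ceiling = RMS_CEILING) {
  return clampRms(rms, ceiling) / ceiling;
}

console.log(clampRms(0.2)); // below the ceiling: unchanged
console.log(clampRms(0.9)); // above the ceiling: limited to RMS_CEILING
```

In practice the rms argument would be whatever value your feature extraction produced for the current buffer.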

ZCR

zcr

Energy

energy


Spectral features

Amplitude Spectrum

amplitudeSpectrum

Power Spectrum

powerSpectrum

Spectral Centroid

spectralCentroid

Spectral Flatness

spectralFlatness

Spectral Flux

spectralFlux

Spectral Slope

spectralSlope

Spectral Rolloff

spectralRolloff

Spectral Spread

spectralSpread

Spectral Skewness

spectralSkewness

Spectral Kurtosis

spectralKurtosis

Spectral Crest

spectralCrest

Chroma

chroma


Perceptual features

Loudness

loudness

Perceptual Spread

perceptualSpread

Perceptual Sharpness

perceptualSharpness

Mel-Frequency Cepstral Coefficients

mfcc


Utility extractors

Complex Spectrum

complexSpectrum

Buffer

buffer


Windowing functions

Windowing functions are used during the conversion of a signal from the time domain (i.e. air pressure over time) to the frequency domain (the phase and intensity of each sine wave that comprises the signal), which is a prerequisite for many of the audio features described above. A windowing function generates an envelope of numbers between 0 and 1 and multiplies these numbers pointwise with the samples in the signal buffer, leaving the samples in the middle of the buffer relatively louder and making the samples at either end relatively quieter. This smooths out the result of the conversion to the frequency domain, which makes the final audio features more consistent and less jittery.
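
To make the envelope idea concrete, here is a minimal sketch that builds a Hann ("hanning") envelope from the textbook formula and multiplies it pointwise with a buffer. It is written for illustration only; the function names are hypothetical and Meyda's own implementation may differ in detail.

```javascript
// Illustrative Hann ("hanning") envelope using the textbook formula.
// Not Meyda's code; written here only to show the shape of the idea.
function hannEnvelope(length) {
  const envelope = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    envelope[i] = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (length - 1)));
  }
  return envelope;
}

// Multiply each sample by the envelope value at the same position.
function applyEnvelope(signal, envelope) {
  if (signal.length !== envelope.length) {
    throw new Error("signal and envelope must have the same length");
  }
  return signal.map((sample, i) => sample * envelope[i]);
}

// A constant buffer makes the envelope's effect easy to see.
const signal = new Float32Array(8).fill(1);
const windowed = applyEnvelope(signal, hannEnvelope(signal.length));
console.log(Array.from(windowed, (v) => v.toFixed(2)));
```

The printed values start at zero, rise towards 1 in the middle of the buffer, and fall back to zero at the end, which is the tapering described above.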

Meyda supports 4 windowing functions, each with different characteristics. For more information on windowing, please consult this article. By default, Meyda applies the hanning window, not the rectangular window, to signals before converting them into the frequency domain.

Hanning

Meyda.windowing(signalToBeWindowed, "hanning");

Hamming

Meyda.windowing(signalToBeWindowed, "hamming");

Blackman

Meyda.windowing(signalToBeWindowed, "blackman");

Sine

Meyda.windowing(signalToBeWindowed, "sine");

Rectangular (no window)

Meyda.windowing(signalToBeWindowed, "rect");




