Great results on audio classification with the fastai library

Ethan Sutin
Oct 30, 2018 · 2 min read

The latest version of Jeremy Howard’s fast.ai deep learning for coders course has just begun. It uses the new fastai library, built on top of PyTorch, which makes it very easy to get great results with very little effort across a range of different tasks.

Jeremy shows that fastai is extremely effective at classifying images of everyday things like different breeds of pets, but how does it fare on something less ImageNet-y, such as spectrograms for audio classification?

Enter the UrbanSound8K dataset. It contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music.
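For reference, here's a minimal sketch of loading the dataset's metadata with pandas; the path and column names assume the standard UrbanSound8K download layout.

```python
import pandas as pd

# UrbanSound8K ships with a metadata CSV listing every clip,
# its class label, and the predefined fold (1-10) it belongs to.
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

print(meta["class"].value_counts())   # clips per class
print(sorted(meta["fold"].unique()))  # the 10 official folds
```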

Can we convert the audio files into spectrograms and then train a CNN to classify them? The short duration of the sounds seems like a good fit for a CNN, so let’s give it a shot.

We can use the librosa Python library to generate spectrograms and save them as images for classification.
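Here's a rough sketch of that conversion step, assuming log-scaled mel spectrograms rendered to PNGs; the exact parameters in the notebook may differ.

```python
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

def save_spectrogram(wav_path, png_path):
    # Load the clip at its native sample rate.
    y, sr = librosa.load(wav_path, sr=None)

    # Mel spectrogram, converted from power to decibels for contrast.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)

    # Render only the spectrogram (no axes or margins) and save it.
    fig = plt.figure(figsize=(3, 3), frameon=False)
    ax = plt.Axes(fig, [0.0, 0.0, 1.0, 1.0])
    ax.set_axis_off()
    fig.add_axes(ax)
    librosa.display.specshow(S_db, sr=sr, ax=ax)
    fig.savefig(png_path)
    plt.close(fig)
```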

Samples of spectrograms generated from UrbanSound8K audio clips

Now that we have our images, we can train a normal image classifier as usual (except we'll disable the default fastai image augmentation, since it doesn't make sense for spectrograms).
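In code, that step looks roughly like the following. This sketch uses the fastai v1 API from around the time of the course (e.g. create_cnn, since renamed cnn_learner), and for brevity it uses a simple train/valid folder split rather than the dataset's 10-fold protocol; the directory path is hypothetical.

```python
from fastai.vision import *

# Hypothetical directory of saved spectrogram images, arranged as
# <path>/train/<class>/*.png and <path>/valid/<class>/*.png.
path = "data/spectrograms"

# Omitting ds_tfms means no flips, rotations, or zooms are applied
# (they make no sense for spectrograms).
data = ImageDataBunch.from_folder(path, size=224)

# Standard transfer learning from an ImageNet-pretrained ResNet.
learn = create_cnn(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
```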

And the results are pretty impressive: we were quickly able to achieve an 80.5% mean accuracy across the dataset's 10 cross-validation folds.

According to the latest publication on the dataset's website, the state-of-the-art mean accuracy was 79%. It should be noted that this was achieved with extensive audio-specific augmentation; without augmentation, their top accuracy was 74%.

So even without any audio-specific data augmentation, we can very quickly beat the state of the art on an audio classification task, and we could likely increase accuracy further by introducing such augmentation.

It's pretty cool that fastai can produce this kind of result out of the box, even on images quite distant from those found in ImageNet!

And here’s a link to the notebook.
