Creating a Pop Music Generator with the Transformer


TL;DR: Train a deep learning model to generate pop music. You can compose music with our pre-trained model here. Source code is available here.

In this post, I’m going to explain how to train a deep learning model to generate pop music. This is Part I of the “Building An A.I. Music Generator” series.

Quick Note: There are a couple of ways to generate music. The non-trivial way is to generate the actual sound waves (WaveNet, MelNet). The other is to generate music notation for an instrument to play (similar to sheet music). I’ll be explaining how to do the latter.

OK! Here are some cool examples of generated music for you.

I’ve built the website: MusicAutobot. It’s powered by the music model we’ll be building in this post.

Song #1 (Inspired by Ritchie Valens, “La Bamba”)

Red notes are generated by the model. Green/white notes are the original.

Song #2 (Inspired by Pachelbel, “Canon in D”)

Feel free to press the red button and generate your own music!

Note: MusicAutobot is best viewed on desktop. For those of you on mobile, you’ll just have to listen below.

Generated Part starts at 0:03
Generated Part starts at 0:06

The Transformer architecture is a recent advance in NLP that has produced amazing results in generating text. Transformers train faster and have much better long-term memory than previous language models. Give one a few words, and it can continue generating whole paragraphs that make even more sense than this one.

Naturally, this seems like a great fit for generating music. Why not give it a few notes instead, and have it generate a melody?

That’s exactly what we’ll be doing. We’re going to be using a transformer to predict music notes!

Here’s a high-level diagram of what we’re trying to do:

What we are doing is building a sequence model for music: take an input sequence and predict a target sequence. Whether it’s time series forecasting, music, or text generation, building these models boils down to two steps:

Step 1. Convert data into a sequence of tokens

Step 2. Build and train the model to predict the next token

With the help of two Python libraries, music21 and fastai, we’ve built a simple library, musicautobot, that makes these two steps relatively easy.

Step 1.

Convert data (music files) into a sequence of tokens (music notes)

Take a piano sheet that looks like this:

Piano Score

And tokenize it to something like this:

xxbos xxpad n72 d2 n52 d16 n48 d16 n45 d16 xxsep d2 n71 d2 xxsep d2 n71 d2 xxsep d2 n69 d2 xxsep d2 n69 d2 xxsep d2 n67 d4 xxsep d4 n64 d1 xxsep d1 n62 d1

With musicautobot, you do it like so:

More examples in this Notebook

Note: musicautobot uses music21 behind the scenes to load music files and tokenize. Details of this conversion will be covered in the next post.
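To make the token string above less mysterious, here is a minimal pure-Python sketch that reproduces its beginning. It is an illustration only, not musicautobot's actual encoder: the function name `encode_notes`, the `(start, pitch, duration)` input format, and the assumption that all times are counted in sixteenth-note steps are made up for this example.

```python
from collections import defaultdict

BOS, PAD, SEP = "xxbos", "xxpad", "xxsep"


def encode_notes(notes):
    """Encode (start, pitch, duration) triples as a flat list of tokens.

    Hypothetical helper: times are assumed to be in sixteenth-note steps.
    """
    by_start = defaultdict(list)
    for start, pitch, duration in notes:
        by_start[start].append((pitch, duration))

    tokens = [BOS, PAD]
    previous_start = None
    for start in sorted(by_start):
        if previous_start is not None:
            # A separator plus the number of steps since the previous onset
            tokens += [SEP, f"d{start - previous_start}"]
        # Highest pitch first, matching the order in the example string
        for pitch, duration in sorted(by_start[start], reverse=True):
            tokens += [f"n{pitch}", f"d{duration}"]
        previous_start = start
    return tokens


# One melody note over a three-note chord, then two more melody notes
notes = [(0, 72, 2), (0, 52, 16), (0, 48, 16), (0, 45, 16), (2, 71, 2), (4, 71, 2)]
encoded = " ".join(encode_notes(notes))
print(encoded)
# xxbos xxpad n72 d2 n52 d16 n48 d16 n45 d16 xxsep d2 n71 d2 xxsep d2 n71 d2
```

Each `n` token names a pitch and the `d` token after it gives its length, while `xxsep` followed by a `d` token moves time forward to the next group of notes.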

Step 2.

Build and train the model to predict the next token.
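In practice, "predict the next token" means training on two copies of the same sequence, offset by one position. The sketch below shows that idea on a few tokens from Step 1. The helper names `build_vocab` and `next_token_pairs` are hypothetical and are not taken from fastai or musicautobot.

```python
def build_vocab(tokens):
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def next_token_pairs(ids):
    """Return (inputs, targets) where targets[i] is the token that follows inputs[i]."""
    return ids[:-1], ids[1:]


# Tokens -> integer ids the model can consume
tokens = "xxbos xxpad n72 d2 xxsep d2 n71 d2".split()
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]

# The model sees `inputs` and is trained to output `targets`
inputs, targets = next_token_pairs(ids)
print(ids)      # [0, 1, 2, 3, 4, 3, 5, 3]
print(inputs)   # [0, 1, 2, 3, 4, 3, 5]
print(targets)  # [1, 2, 3, 4, 3, 5, 3]
```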

fastai has some amazingly simple code for training language models.

If we modify the data loading to handle music files instead of text, we can reuse most of the same code.

Training our music model is now as easy as this:

Go ahead and try out this notebook to train your own!

Now let’s see if it actually works.

For the next part, I’ll be using the model I’ve pre-trained for a few days on a large MIDI database. You can directly play with the pre-trained model here.

Predicting Pop Music

Step 1: Create a short snippet of notes:

Snippet from here

Here’s what that looks like:

First few notes of La Bamba — sounds like this

Step 2: Feed it to our model:

Hyperparameters:

Temperature: adjusts how “creative” the predictions will be. You can control the amount of variation in the pitch and/or rhythm.

TopK/TopP: filters out the lowest probability tokens. It makes sure outliers never get chosen, even if they have the tiniest bit of probability.
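For intuition, here is a small standard-library sketch of how temperature and top-k/top-p filtering are commonly applied when sampling the next token. It is a generic illustration rather than the code behind the pre-trained model. The function names and the example scores are assumptions made for this sketch.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return the token indices that survive top-k, then top-p (nucleus) filtering."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # stop once enough probability mass is covered
            break
    return kept


def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index from the filtered, temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    kept = filter_top_k_top_p(probs, top_k, top_p)
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1, -3.0]  # made-up scores for four candidate tokens
rng = random.Random(0)
draws = {sample_next(logits, temperature=1.5, top_k=3, rng=rng) for _ in range(200)}
print(sorted(draws))  # index 3 is filtered out, so it can never be drawn
```

Raising the temperature spreads probability toward less likely tokens, while top-k and top-p remove the tail entirely before sampling.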

And voila!

Here’s what it sounds like:

This is actually the first result I got, but side effects may vary. You can create your own variations here.

According to the awesome people at HookTheory, the most popular chord progression in modern music is the I-V-vi-IV progression.

You may have heard it before. It’s in a lot of pop songs.

Like every. single. pop. song.

All these songs use the I-V-vi-IV chord progression

Building that chord

Let’s test our model and see if it recognizes this chord progression. We’re going to feed it the first 3 chords “I — V — vi”, and see if it predicts the “IV” chord.

Here’s how you create the first three chords with music21:

It looks like this: [C-E-G] — [G-B-D] — [A-C-E]

Model Input: First 3 chords (I-V-vi)
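If the Roman numerals are unfamiliar, the following plain-Python sketch derives the same triads from the C major scale. It deliberately avoids music21, so it is not the snippet referred to above. The names `C_MAJOR`, `NUMERALS`, and `triad` are made up for this illustration.

```python
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

# Roman numeral -> zero-based scale degree of the chord root
NUMERALS = {"I": 0, "ii": 1, "iii": 2, "IV": 3, "V": 4, "vi": 5}


def triad(numeral, scale=C_MAJOR):
    """Stack the root, third, and fifth above the given scale degree."""
    root = NUMERALS[numeral]
    return [scale[(root + step) % len(scale)] for step in (0, 2, 4)]


model_input = [triad(n) for n in ("I", "V", "vi")]
print(model_input)   # [['C', 'E', 'G'], ['G', 'B', 'D'], ['A', 'C', 'E']]
print(triad("IV"))   # ['F', 'A', 'C'], the chord we hope the model predicts
```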

Now, we take those chords and feed them to our model to predict the next one:

Here’s what we get back (3 input chords are included):

Model predicted the final chord — IV

Huzzah! The model predicted notes [F-A-C] — which is the “IV” chord.

This is exactly what we were hoping it’d predict. Looks like our music model is able to follow basic music theory and make every single pop song! Well, the chords at least.

Test it out yourself

Don’t just take my word for it. Try it out on MusicAutobot:

All you have to do is press the red button.

Note: 8 times out of 10, you’ll get an “IV” chord or one of its inversions. I can’t guarantee deterministic results. Only probabilities.
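Since inversions count as hits, one simple way to check a prediction is to compare pitch classes, ignoring octave and note order. The sketch below assumes the predicted notes are available as MIDI note numbers. The helper names are hypothetical and not part of musicautobot.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_classes(midi_pitches):
    """Reduce MIDI note numbers to pitch classes 0-11, discarding the octave."""
    return {p % 12 for p in midi_pitches}


def is_same_chord(midi_pitches, chord_names):
    """True if the notes contain exactly the chord's pitch classes, in any inversion."""
    expected = {NOTE_NAMES.index(name) for name in chord_names}
    return pitch_classes(midi_pitches) == expected


IV_CHORD = ["F", "A", "C"]
root_position = [65, 69, 72]    # F4 A4 C5
first_inversion = [69, 72, 77]  # A4 C5 F5
tonic_chord = [60, 64, 67]      # C4 E4 G4

print(is_same_chord(root_position, IV_CHORD))    # True
print(is_same_chord(first_inversion, IV_CHORD))  # True
print(is_same_chord(tonic_chord, IV_CHORD))      # False
```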

That’s all there is to it!

Now you know the basic steps to training a music model.

All code shown in this post is available here:

Play and generate songs with the model we just built:
^ These are real time predictions, so please be patient!

I may have glossed over a few details. There’s more!

Part II. Practical Tips for Training a Music Model: A deep dive into music encoding and training. It’ll cover all the details I just glossed over.

Part III. Building a Multitask Music Model: Train a super cool music model that can harmonize, generate melodies, and remix songs. Next token prediction is just so… basic.

Part IV. Using a Music Bot to Remix The Chainsmokers: We’ll remix an EDM drop in Ableton with musicautobot. For pure entertainment purposes only.

Special thanks to Jeroen Kerstens and Jeremy Howard for guidance, and to South Park Commons and PalapaVC for support.