Learning melody and rhythm at the same time

But first, we'll port over a piece from fastai v1 that I couldn't find in fastai2: the Linear Decoder.

class LinearDecoder[source]

LinearDecoder(n_out:int, n_hid:int, output_p:float, tie_encoder=None, bias:bool=True) :: Module

A Linear Decoder from fastai v1.
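
For context, here's a minimal sketch of the port, based on fastai v1's implementation. The real class uses fastai's RNNDropout (as the model printout further down shows); this sketch swaps in plain nn.Dropout to stay self-contained:

import torch.nn as nn

class LinearDecoderSketch(nn.Module):
    "Sketch of fastai v1's LinearDecoder: dropout plus a linear projection."
    initrange = 0.1

    def __init__(self, n_out, n_hid, output_p, tie_encoder=None, bias=True):
        super().__init__()
        self.decoder = nn.Linear(n_hid, n_out, bias=bias)
        self.decoder.weight.data.uniform_(-self.initrange, self.initrange)
        self.output_dp = nn.Dropout(output_p)  # the real port uses fastai's RNNDropout
        if bias:
            self.decoder.bias.data.zero_()
        if tie_encoder is not None:
            # Weight tying: reuse the encoder's embedding matrix as the projection
            self.decoder.weight = tie_encoder.weight

    def forward(self, output):
        # Dropout on the RNN output, then project to the vocabulary
        return self.decoder(self.output_dp(output))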

And finally, the model:

class TheModel[source]

TheModel(pitch_len, duration_len, kind, emb_size=1000, rnn_size=1200, rnn_layers=3, dropout=0.0) :: Module

A model that learns pitch and duration through separate RNNs, merging them at the end to 'compare notes', and outputting separate predictions for each aspect.
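
Judging from the module printout after the test below, both decoders take 2 * rnn_size = 2400 input features, which suggests the two GRU outputs are concatenated before each head decodes. A hedged sketch of the forward flow (inferred from that printout, not the actual code; dimension ordering is illustrative):

import torch

def forward_sketch(model, pitches, durations):
    p = model.pitch_emb(pitches)        # -> (..., emb_size)
    d = model.duration_emb(durations)   # -> (..., emb_size)
    p_out, _ = model.pitch_rnn(p)       # -> (..., rnn_size)
    d_out, _ = model.duration_rnn(d)    # -> (..., rnn_size)
    # Merge the two streams so the heads can 'compare notes'
    merged = torch.cat([p_out, d_out], dim=-1)  # -> (..., 2 * rnn_size)
    return model.pitch_dec(merged), model.duration_dec(merged)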

triplets_to_input[source]

triplets_to_input(triplets:Collection[Tuple[str, float, int]], pitch_vocab, duration_vocab)

Formats a sequence of triplets as an input to the model.
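
Presumably this is a vocabulary lookup over the prompt, along these lines (a hypothetical sketch; how the third triplet element and type conversions are handled is a guess):

import torch

def triplets_to_input_sketch(triplets, pitch_vocab, duration_vocab):
    # Map each triplet's pitch and duration to its vocab index
    pitch_idxs = [pitch_vocab.index(pitch) for pitch, _, _ in triplets]
    duration_idxs = [duration_vocab.index(duration) for _, duration, _ in triplets]
    # Stack into (batch=1, seq_len) LongTensors
    return torch.tensor([pitch_idxs]), torch.tensor([duration_idxs])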

# test
import torch

from fastai2.text.data import make_vocab

from testing import test_eq, path

from neuralmusic.midi import parse_midi_file, row_to_triplets
from neuralmusic.data.preprocessing import preprocess

raw_df = parse_midi_file(path("data/ff4-airship.mid"))
df, pitch_count, duration_count = preprocess(raw_df)

song = row_to_triplets(df, 0)

batch_size = 1
seq_len = 10
prompt = song[0:seq_len]

pitch_vocab = make_vocab(pitch_count, min_freq=1)
duration_vocab = make_vocab(duration_count, min_freq=1)

model = TheModel(
    pitch_len=len(pitch_vocab),
    duration_len=len(duration_vocab),
    kind="dual",
    emb_size=1000,
    rnn_size=1200,
    rnn_layers=2,
)

pitch_out, duration_out = model(triplets_to_input(prompt, pitch_vocab, duration_vocab))

test_eq(torch.Size([batch_size, seq_len, len(pitch_vocab)]), pitch_out.shape)
test_eq(torch.Size([batch_size, seq_len, len(duration_vocab)]), duration_out.shape)

model
TheModel(
  (pitch_emb): Embedding(56, 1000, padding_idx=1)
  (duration_emb): Embedding(16, 1000, padding_idx=1)
  (pitch_rnn): GRU(1000, 1200, num_layers=2)
  (duration_rnn): GRU(1000, 1200, num_layers=2)
  (pitch_dec): LinearDecoder(
    (decoder): Linear(in_features=2400, out_features=56, bias=True)
    (output_dp): RNNDropout()
  )
  (duration_dec): LinearDecoder(
    (decoder): Linear(in_features=2400, out_features=16, bias=True)
    (output_dp): RNNDropout()
  )
)

Prediction

To predict notes from a prompt (a sequence of triplets to prime the model), we'll need a couple more functions.

choose[source]

choose(top_k, logits, vocab)

Chooses randomly among the top K most probable tokens and returns a single choice.
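
Top-k sampling is typically just a few lines of torch; here's a sketch of what choose plausibly does (illustrative, not the actual implementation):

import torch

def choose_sketch(top_k, logits, vocab):
    # Keep the K largest logits, renormalize them, and sample one
    values, indices = torch.topk(logits, top_k)
    probs = torch.softmax(values, dim=-1)
    pick = torch.multinomial(probs, num_samples=1).item()
    return vocab[indices[pick].item()]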

predict[source]

predict(device, model, prompt, pitch_vocab, duration_vocab, top_k=5, n_notes=4)

Predicts the next n notes given a model and a prompt.
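
Generation is presumably the usual autoregressive loop: run the model over the sequence so far, sample the last step of each head with choose, append the sampled note, and repeat. A hedged sketch (the real function also handles the device and may carry the RNN hidden state between steps; the 0 placeholder for the third triplet element is a guess):

import torch

def predict_sketch(device, model, prompt, pitch_vocab, duration_vocab, top_k=5, n_notes=4):
    model.eval()
    seq, notes = list(prompt), []
    with torch.no_grad():
        for _ in range(n_notes):
            pitch_out, duration_out = model(triplets_to_input(seq, pitch_vocab, duration_vocab))
            # Sample from the distributions at the last time step
            pitch = choose(top_k, pitch_out[0, -1], pitch_vocab)
            duration = choose(top_k, duration_out[0, -1], duration_vocab)
            notes.append((pitch, duration))
            seq.append((pitch, duration, 0))  # feed the sample back in
    return notes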

# test
predicted = predict(
    torch.device("cpu"), model, prompt, pitch_vocab, duration_vocab, top_k=1, n_notes=5
)
pitch, duration = predicted[0]

pitch, duration, pitch_vocab.index(pitch), duration_vocab.index(duration)
('0.4', 'Dotted▁Eighth', 28, 10)

get_model[source]

get_model(cfg:OmegaConf, pitch_vocab:Collection[str], duration_vocab:Collection[str])

Constructs the model (and puts it on the GPU if available).
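
Something like this, where the cfg.model.* keys are assumptions on my part (hypothetical config names, not confirmed from the source):

import torch

def get_model_sketch(cfg, pitch_vocab, duration_vocab):
    model = TheModel(
        pitch_len=len(pitch_vocab),
        duration_len=len(duration_vocab),
        kind=cfg.model.kind,            # hypothetical config keys
        emb_size=cfg.model.emb_size,
        rnn_size=cfg.model.rnn_size,
        rnn_layers=cfg.model.rnn_layers,
    )
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return model.to(device)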