MUSCIMA++ Tutorial

This is a tutorial for using the muscima package to work with the MUSCIMA++ dataset.

We assume you have already gone through the README and downloaded the dataset. Let’s load it.

import os
from muscima.io import parse_cropobject_list

# Change this to reflect wherever your MUSCIMA++ data lives
CROPOBJECT_DIR = os.path.join(os.environ['HOME'], 'data/MUSCIMA++/v0.9/data/cropobjects')

cropobject_fnames = [os.path.join(CROPOBJECT_DIR, f) for f in os.listdir(CROPOBJECT_DIR)]
docs = [parse_cropobject_list(f) for f in cropobject_fnames]
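
As a quick sanity check (a small sketch; MUSCIMA++ v0.9 ships 140 annotated documents), we can count what we just loaded:

print(len(docs))   # expect 140 for MUSCIMA++ v0.9
print(sum(len(cropobjects) for cropobjects in docs))  # total number of annotated symbols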

Let’s do something straightforward: symbol classification.

Symbol Classification

Let’s try to tell apart quarter notes from half notes.

However, notes are not annotated as single symbols in MUSCIMA++: they are recorded as individual primitives (noteheads, stems, beams, flags) connected by relationships. We therefore need to extract notehead-stem pairs from the data using these relationships. Quarter notes are all full-notehead/stem pairs with no beam or flag attached; half notes are all empty-notehead/stem pairs (the code below applies the same beam/flag check to both, to be safe).

After we extract the note classes, we will need to compute features for classification. To do that, we first need to “draw” the symbols in the appropriate relative positions. Then, we can extract whatever features we need.

Finally, we train a classifier and evaluate it.

Extracting notes

# Bear in mind that the outlinks are objids (integers), which are
# only valid within the same document. Therefore, we define the
# extraction function per-document, not per-dataset.

def extract_notes_from_doc(cropobjects):
    """Finds all ``(full-notehead, stem)`` pairs that form
    quarter or half notes. Returns two lists of Node tuples:
    one for quarter notes, one of half notes.

    :returns: quarter_notes, half_notes
    """
    _cropobj_dict = {c.objid: c for c in cropobjects}

    notes = []
    for c in cropobjects:
        if (c.clsname == 'notehead-full') or (c.clsname == 'notehead-empty'):
            _has_stem = False
            _has_beam_or_flag = False
            stem_obj = None
            for o in c.outlinks:
                _o_obj = _cropobj_dict[o]
                if _o_obj.clsname == 'stem':
                    _has_stem = True
                    stem_obj = _o_obj
                elif _o_obj.clsname == 'beam':
                    _has_beam_or_flag = True
                elif _o_obj.clsname.endswith('flag'):
                    _has_beam_or_flag = True
            if _has_stem and (not _has_beam_or_flag):
                # We also need to check against quarter-note chords.
                # Stems only have inlinks from noteheads, so checking
                # for multiple inlinks will do the trick.
                if len(stem_obj.inlinks) == 1:
                    notes.append((c, stem_obj))

    quarter_notes = [(n, s) for n, s in notes if n.clsname == 'notehead-full']
    half_notes = [(n, s) for n, s in notes if n.clsname == 'notehead-empty']
    return quarter_notes, half_notes

qns_and_hns = [extract_notes_from_doc(cropobjects) for cropobjects in docs]

Now, we don’t need the objid anymore, so we can lump the notes from all 140 documents together.

import itertools
qns = list(itertools.chain(*[qn for qn, hn in qns_and_hns]))
hns = list(itertools.chain(*[hn for qn, hn in qns_and_hns]))

len(qns), len(hns)
(4320, 1181)

It seems that we have some 4320 isolated quarter notes and 1181 isolated half notes in our data. Let’s create their images now.

Creating note images

Each notehead and stem CropObject carries its own binary mask and its bounding box coordinates within the page. We need to combine these two things in order to create a binary image of the whole note.
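
Before writing that function, it may help to peek at one extracted note (a quick sketch; it uses only the CropObject attributes that the function below relies on):

notehead, stem = qns[0]
print(notehead.clsname, notehead.top, notehead.left, notehead.height, notehead.width)
print(notehead.mask.shape)  # binary mask with the same shape as the bounding box

With that in mind, here is the pasting function: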

import numpy

def get_image(cropobjects, margin=1):
    """Paste the cropobjects' mask onto a shared canvas.
    There will be a given margin of background on the edges."""

    # Get the bounding box into which all the objects fit
    top = min([c.top for c in cropobjects])
    left = min([c.left for c in cropobjects])
    bottom = max([c.bottom for c in cropobjects])
    right = max([c.right for c in cropobjects])

    # Create the canvas onto which the masks will be pasted
    height = bottom - top + 2 * margin
    width = right - left + 2 * margin
    canvas = numpy.zeros((height, width), dtype='uint8')

    for c in cropobjects:
        # Get coordinates of the upper left corner of the CropObject
        # relative to the canvas
        _pt = c.top - top + margin
        _pl = c.left - left + margin
        # We have to add the mask, so as not to overwrite
        # previous nonzeros when symbol bounding boxes overlap.
        canvas[_pt:_pt+c.height, _pl:_pl+c.width] += c.mask

    canvas[canvas > 0] = 1
    return canvas

qn_images = [get_image(qn) for qn in qns]
hn_images = [get_image(hn) for hn in hns]

Let’s visualize some of these notes, to check whether everything worked. (For this, we assume you have matplotlib. If not, you can skip this step.)

import matplotlib.pyplot as plt

def show_mask(mask):
    plt.imshow(mask, cmap='gray', interpolation='nearest')
    plt.show()

def show_masks(masks, row_length=5):
    n_masks = len(masks)
    n_rows = (n_masks + row_length - 1) // row_length  # ceil division, avoids an empty extra row
    n_cols = min(n_masks, row_length)
    fig = plt.figure()
    for i, mask in enumerate(masks):
        plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(mask, cmap='gray', interpolation='nearest')
    # Let's remove the axis labels, they clutter the image.
    for ax in fig.axes:
        ax.set_yticklabels([])
        ax.set_xticklabels([])
        ax.set_yticks([])
        ax.set_xticks([])
    plt.show()
show_masks(qn_images[:25])
show_masks(hn_images[:25])
[Figure: the first 25 extracted quarter notes, and the first 25 extracted half notes.]

It seems that the extraction went all right.

Feature Extraction

Now, we need to somehow turn the note images into classifier inputs.

Let’s get some inspiration from the setup of the HOMUS dataset. In their baseline classification experiments, the authors simply resized all symbol images to 20x20. For notes, however, this may not be such a good idea: notes are much taller than they are wide, so squashing them into a square would distort them badly. Let’s instead resize to 40x10.
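
To sanity-check that intuition (a quick sketch over the images we just built), we can look at the average shape of the extracted notes:

import numpy

# Mean (height, width) over all note images; if notes really are
# several times taller than wide, a tall 40x10 target shape will
# distort them less than a 20x20 square.
print(numpy.mean([img.shape for img in qn_images + hn_images], axis=0))

Now for the resizing itself: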

from skimage.transform import resize

qn_resized = [resize(qn, (40, 10)) for qn in qn_images]
hn_resized = [resize(hn, (40, 10)) for hn in hn_images]

# And re-binarize, to compensate for interpolation effects
for qn in qn_resized:
    qn[qn > 0] = 1
for hn in hn_resized:
    hn[hn > 0] = 1

How do the resized notes look?

show_masks(qn_resized[:25])
show_masks(hn_resized[-25:])
[Figure: the first 25 resized quarter notes, and the last 25 resized half notes.]

Classification

We now need to attach the output labels and split the data into training and test sets.

Let’s make a balanced dataset, to keep things simpler.

import random

# Randomly pick an equal number of quarter notes.
n_hn = len(hn_resized)
random.shuffle(qn_resized)
qn_selected = qn_resized[:n_hn]

Now, create the output labels and merge the data into one dataset.

Q_LABEL = 1
H_LABEL = 0

qn_labels = [Q_LABEL for _ in qn_selected]
hn_labels = [H_LABEL for _ in hn_resized]

notes = qn_selected + hn_resized
# Flatten data
notes_flattened = [n.flatten() for n in notes]
labels = qn_labels + hn_labels

Let’s use the sklearn package for the experimental setup. Normally, we would cross-validate on a dataset this small, but for the purposes of the tutorial, we will stick to a single train/test split.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    notes_flattened, labels, test_size=0.25, random_state=42,
    stratify=labels)

What could we use to classify this data? Perhaps a k-NN classifier is a good first try.

from sklearn.neighbors import KNeighborsClassifier

K = 5

# Trying the defaults first.
clf = KNeighborsClassifier(n_neighbors=K)
clf.fit(X_train, y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
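
As mentioned above, cross-validation would normally be preferable on a dataset this small. For reference, a minimal sketch of that setup (not run in this tutorial):

from sklearn.model_selection import cross_val_score
import numpy

# 10-fold cross-validation of the same model over the whole balanced dataset.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=K),
                            numpy.array(notes_flattened), numpy.array(labels),
                            cv=10)
print(cv_scores.mean(), cv_scores.std())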

Let’s run the classifier now and evaluate the results.

y_test_pred = clf.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, target_names=['half', 'quarter']))
             precision    recall  f1-score   support

       half       0.98      0.87      0.92       296
    quarter       0.88      0.98      0.93       295

avg / total       0.93      0.93      0.93       591

NOT BAD.
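
Where do the mistakes happen? A confusion matrix shows the error directions directly (rows are the true classes, columns the predictions, in the order half, quarter):

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_test_pred, labels=[H_LABEL, Q_LABEL]))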

Apparently, most mistakes happen when half notes get classified as quarter notes. Also, remember that we made the train/test split randomly, so notes from each writer almost certainly appear in both the training and the test set. That is an easy setting for a k-NN classifier, which can match a test note against near-identical notes by the same writer; the reported numbers would likely not hold for unseen writers.

Can we perhaps quantify that effect?

…and that is beyond the scope of this tutorial.
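
If you do want to try it yourself, here is a minimal sketch of a starting point (not run here): hold out whole documents, which roughly approximates held-out writers since each document is one writer's page, and rebuild the features per split. Note that this split is no longer balanced between classes.

# Hold out all notes from the last 20 documents.
train_docs, test_docs = qns_and_hns[:-20], qns_and_hns[-20:]

def notes_to_xy(per_doc_notes):
    """Flattened 40x10 feature vectors and labels for all notes
    in the given (quarter_notes, half_notes) per-document pairs."""
    X, y = [], []
    for doc_qns, doc_hns in per_doc_notes:
        for label, doc_notes in [(Q_LABEL, doc_qns), (H_LABEL, doc_hns)]:
            for note in doc_notes:
                img = resize(get_image(note), (40, 10))
                img[img > 0] = 1
                X.append(img.flatten())
                y.append(label)
    return X, y

X_train_w, y_train_w = notes_to_xy(train_docs)
X_test_w, y_test_w = notes_to_xy(test_docs)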