mung.dataset module

This module acts as an abstraction over the CVC-MUSCIMA dataset.

It mainly provides utility functions, such as getting the absolute path to a specific image in the CVC-MUSCIMA dataset, specified by writer, page number, distortion, and mode.

Environment variables

  • CVC_MUSCIMA_ROOT

  • MUSCIMA_PLUSPLUS_ROOT

The dataset root environment variables are used as the default roots for retrieving the dataset files. If they are not set, you have to supply the roots explicitly to the respective functions that work with these layers of MUSCIMA++.
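The fallback from an explicit root to an environment variable can be sketched as below. This is not the mung implementation: `resolve_root` is a hypothetical helper, and raising `ValueError` when neither source is available is an assumption about how a missing root might be reported.

```python
import os
from typing import Optional


def resolve_root(root: Optional[str], env_var: str) -> str:
    """Hypothetical helper: prefer an explicit root, else read env_var.

    Mirrors the documented behaviour that CVC_MUSCIMA_ROOT and
    MUSCIMA_PLUSPLUS_ROOT act as default roots.
    """
    if root is not None:
        # An explicitly supplied root always wins.
        return root
    value = os.environ.get(env_var)
    if value is None:
        # ASSUMPTION: report a missing root as a ValueError.
        raise ValueError(
            f"No root given and environment variable {env_var} is not set."
        )
    return value
```

For example, `resolve_root(None, "CVC_MUSCIMA_ROOT")` returns the value of `CVC_MUSCIMA_ROOT` if it is set, while `resolve_root("/my/cvc", "CVC_MUSCIMA_ROOT")` ignores the environment entirely.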

class mung.dataset.CvcMuscimaDataset(root: str = None, validate: bool = False)

Bases: object

The CvcMuscimaDataset class implements a wrapper around the CVC-MUSCIMA dataset file structure that allows easy retrieval of filenames based on the page number (1 - 20), writer number (1 - 50), distortion, and mode (full image, staffline pixels only, or non-staffline pixels only).

This functionality is defined in imfile().

DISTORTIONS = ['curvature', 'ideal', 'interrupted', 'kanungo', 'rotated', 'staffline-thickness-variation-v1', 'staffline-thickness-variation-v2', 'staffline-y-variation-v1', 'staffline-y-variation-v2', 'thickness-ratio', 'typeset-emulation', 'whitespeckles']
MODES = ['full', 'symbol', 'staff_only']
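The documented ranges (page 1 - 20, writer 1 - 50) and the DISTORTIONS and MODES lists are enough to check a request before any path is built. The following is a minimal sketch; `check_args` is a hypothetical function, not part of mung, and the library may perform its checks differently.

```python
DISTORTIONS = ['curvature', 'ideal', 'interrupted', 'kanungo', 'rotated',
               'staffline-thickness-variation-v1',
               'staffline-thickness-variation-v2',
               'staffline-y-variation-v1', 'staffline-y-variation-v2',
               'thickness-ratio', 'typeset-emulation', 'whitespeckles']
MODES = ['full', 'symbol', 'staff_only']


def check_args(page: int, writer: int, distortion: str, mode: str) -> None:
    """Hypothetical check: raise ValueError for out-of-range arguments."""
    if not 1 <= page <= 20:
        raise ValueError(f"page must be in 1..20, got {page}")
    if not 1 <= writer <= 50:
        raise ValueError(f"writer must be in 1..50, got {writer}")
    if distortion not in DISTORTIONS:
        raise ValueError(f"unknown distortion {distortion!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
```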
imfile(page: int, writer: int, distortion: str = 'ideal', mode: str = 'full') → str

Construct the path to the CVC-MUSCIMA image file with the specified page (1 - 20), writer (1 - 50), distortion (see DISTORTIONS), and mode (full, symbol, staff_only).

This is the primary interface that the CvcMuscimaDataset class provides.
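To illustrate what imfile() has to do, here is a sketch of the path construction. The directory layout is an assumption, not something this page documents: it supposes `<root>/<distortion>/w-<WW>/<mode dir>/p<PPP>.png`, with modes mapped to the `image`, `symbol`, and `gt` subdirectories. Check this against your local copy of CVC-MUSCIMA. `build_imfile_path` and `MODE_DIRS` are hypothetical names.

```python
import os

# ASSUMPTION: subdirectory names used for each mode; verify against
# the layout of your local CVC-MUSCIMA copy.
MODE_DIRS = {'full': 'image', 'symbol': 'symbol', 'staff_only': 'gt'}


def build_imfile_path(root: str, page: int, writer: int,
                      distortion: str = 'ideal', mode: str = 'full') -> str:
    """Hypothetical re-implementation of imfile()'s path logic."""
    writer_dir = 'w-{:02d}'.format(writer)   # e.g. writer 7 -> 'w-07'
    page_file = 'p{:03d}.png'.format(page)   # e.g. page 12 -> 'p012.png'
    return os.path.join(root, distortion, writer_dir,
                        MODE_DIRS[mode], page_file)
```

Keeping the mode-to-directory mapping in one dictionary means that, if the assumed names turn out to differ, only that table needs to change.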

validate(fail_early: bool = True)

Check whether this CvcMuscimaDataset instance really corresponds to the CVC-MUSCIMA dataset, i.e., whether all 12 x 1000 expected CVC-MUSCIMA files are present.

Parameters:

fail_early – If True, return as soon as a missing file is encountered; if False, keep going through all the files to find out which ones are missing. (Default: True)

Returns:

True if the dataset is OK, False if any file is missing.
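The check amounts to iterating over every distortion, writer (1 - 50), and page (1 - 20): 12 x 50 x 20 = 12,000 = 12 x 1000 files. The sketch below illustrates the fail_early behaviour. It is hypothetical: `validate_dataset` takes the path builder and existence test as parameters so it can run without the dataset, and unlike the documented method (which returns only True or False) it also returns the list of missing paths.

```python
import itertools
import os
from typing import Callable, Iterable, List, Tuple


def validate_dataset(imfile: Callable[[int, int, str], str],
                     distortions: Iterable[str],
                     fail_early: bool = True,
                     exists: Callable[[str], bool] = os.path.isfile,
                     ) -> Tuple[bool, List[str]]:
    """Hypothetical check of every distortion x writer x page file.

    Returns (ok, missing). With fail_early=True the loop stops at the
    first missing file, so `missing` holds at most one path.
    """
    missing: List[str] = []
    for distortion, writer, page in itertools.product(
            distortions, range(1, 51), range(1, 21)):
        path = imfile(page, writer, distortion)
        if not exists(path):
            missing.append(path)
            if fail_early:
                break
    return not missing, missing
```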