mung.node module

class mung.node.Node(id_: int, class_name: str, top: int, left: int, width: int, height: int, outlinks: List[int] = None, inlinks: List[int] = None, mask: ndarray = None, dataset: str = None, document: str = None, data=None)[source]

Bases: object

One annotated object.

The Node represents one instance of an annotation. It implements the following attributes:

  • node_id: the unique number of the given annotation instance in the set of annotations encoded in the containing NodeList.

  • dataset: the name of the dataset this Node belongs to, e.g., MUSCIMA++_2.0

  • document: the name of the document this Node belongs to, e.g., CVC-MUSCIMA_W-05_N-19_D-ideal

  • class_name: the name of the label that was given to the annotation (this is the human-readable string such as notehead-full).

  • top: the vertical dimension (row) of the upper left corner pixel.

  • left: the horizontal dimension (column) of the upper left corner pixel.

  • bottom: the vertical dimension (row) of the lower right corner pixel + 1, so that you can index the corresponding image rows using img[node.top:node.bottom].

  • right: the horizontal dimension (column) of the lower right corner pixel + 1, so that you can index the corresponding image columns using img[:, node.left:node.right].

  • width: the number of columns that the Node spans.

  • height: the number of rows that the Node spans.

  • mask: a binary (0/1) numpy array that denotes the area within the Node’s bounding box (specified by top, left, height and width) that the Node actually occupies. If the mask is None, the object is understood to occupy the entire bounding box.

  • data: a dictionary that can be empty, or can contain anything. It is generated from the optional <Data> element of a Node.

Constructing a simple Node that consists of the “b”-like flat music notation symbol (never mind the unique_id for now):

>>> top = 10
>>> left = 15
>>> height = 10
>>> width = 4
>>> mask = numpy.array([[1, 1, 0, 0],
...                     [1, 0, 0, 0],
...                     [1, 0, 0, 0],
...                     [1, 0, 0, 0],
...                     [1, 0, 1, 1],
...                     [1, 1, 1, 1],
...                     [1, 0, 0, 1],
...                     [1, 0, 1, 1],
...                     [1, 1, 1, 0],
...                     [0, 1, 0, 0]])
>>> class_name = 'flat'
>>> dataset = 'MUSCIMA-pp_2.0'
>>> document = 'CVC-MUSCIMA_W-35_N-08_D-ideal'
>>> node = Node(611, class_name=class_name,
...                top=top, left=left, height=height, width=width,
...                inlinks=[], outlinks=[],
...                mask=mask,
...                dataset=dataset, document=document)

Nodes can also form graphs, using the following attributes:

  • outlinks: Outgoing edges. A list of integers; it is assumed they are valid node_id within the same global/doc namespace.

  • inlinks: Incoming edges. A list of integers; it is assumed they are valid node_id within the same global/doc namespace.

So far, Node graphs do not support multiple relationship types.

Unique identification

The unique_id serves to identify the Node uniquely, at least within the MUSCIMA dataset system. (We anticipate further versions of the dataset, and need to plan for that.)

To uniquely identify a Node, there are three “levels”:

  • The “global”, dataset-level identification: which dataset is this Node coming from? (For this dataset: MUSCIMA++_1.0)

  • The “local”, document-level identification: which document (within the given dataset) is this Node coming from? For MUSCIMA++ 1.0, this will usually be a string like CVC-MUSCIMA_W-35_N-08_D-ideal, derived from the name of the file in which the given Node is stored.

  • The within-document identification, which is the node_id.

These three components are joined together into one string by a delimiter: ___

The full unique_id of a Node then might look like this:

>>> node.unique_id
'MUSCIMA-pp_2.0___CVC-MUSCIMA_W-35_N-08_D-ideal___611'

And it consists of these three parts:

>>> node.document
'CVC-MUSCIMA_W-35_N-08_D-ideal'
>>> node.dataset
'MUSCIMA-pp_2.0'
>>> node.id
611
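The way the three levels combine can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `build_unique_id` and `split_unique_id` are hypothetical helper names, and only the `___` delimiter is taken from the text above.

```python
UID_DELIMITER = '___'  # same value as Node.UID_DELIMITER


def build_unique_id(dataset: str, document: str, node_id: int) -> str:
    """Join the three identification levels into one string."""
    return UID_DELIMITER.join([dataset, document, str(node_id)])


def split_unique_id(uid: str):
    """Split a unique id back into (dataset, document, node_id)."""
    dataset, document, node_id = uid.split(UID_DELIMITER)
    return dataset, document, int(node_id)


uid = build_unique_id('MUSCIMA-pp_2.0', 'CVC-MUSCIMA_W-35_N-08_D-ideal', 611)
parts = split_unique_id(uid)
```

The sketch relies on dataset and document names never containing the triple underscore themselves.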

Nodes and images

Nodes and images are not tightly bound. This is because the same object can apply to multiple images: in the case of the CVC-MUSCIMA dataset, for example, the same Nodes are present both in the full image and in the staff-less image. The limitation here is that Nodes are based on exact pixels, so in order to retain validity, the images must correspond to each other exactly, as “layers”.

Because Nodes do not correspond to any given image, there is no facility in the data format to link them to a specific one. You have to take care of matching Node annotations to the right images by yourself.

The Node class implements some interactions with images.

To recover the area corresponding to a Node, use:

>>> image = numpy.array([]) 
>>> if node.mask is not None: crop = image[node.top:node.bottom, node.left:node.right] * node.mask  
>>> if node.mask is None: crop = image[node.top:node.bottom, node.left:node.right]               

Because this is clunky, we have implemented the following to get the crop:

>>> crop = node.project_to(image)    

And to get the Node projected onto the entire image:

>>> crop = node.project_on(image)    

Above, note the multiplicative role of the mask: while we typically would expect the mask to be binary, in principle, this is not strictly necessary. You could supply a different mask interpretation, such as probabilistic. However, we strongly advise not to misuse this feature unless you have a really good reason; remember that the Node is supposed to represent an annotation of a given image. (One possible use for a non-binary mask that we can envision is aggregating multiple annotations of the same image.)
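To make the difference between the crop and the full-image projection concrete, here is a small sketch on nested lists instead of numpy arrays. The helper names `project_to_sketch` and `project_on_sketch` are hypothetical, and the sketch assumes the mask is always present (the None case is not handled).

```python
def project_to_sketch(image, top, left, mask):
    """Crop the bounding box from the image and multiply it by the mask."""
    return [
        [image[top + r][left + c] * mask[r][c] for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def project_on_sketch(image, top, left, mask):
    """Return an image-sized array that is zero outside the masked area."""
    result = [[0] * len(row) for row in image]
    for r, mask_row in enumerate(mask):
        for c, m in enumerate(mask_row):
            result[top + r][left + c] = image[top + r][left + c] * m
    return result


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
mask = [[1, 0],
        [1, 1]]
crop = project_to_sketch(image, top=1, left=1, mask=mask)
full = project_on_sketch(image, top=1, left=1, mask=mask)
```

The crop has the shape of the mask, while the full projection keeps the shape of the input image.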

For visualization, there is a more sophisticated method that renders the Node as a transparent colored rectangle over an RGB image. (NOTE: this really changes the input image!)

>>> import matplotlib.pyplot as plt 
>>> node.render(image)           
>>> plt.imshow(image); plt.show() 

However, Node.render() currently does not support rendering the mask.

Disambiguating class names

Since the class names are present through the class_name attribute (<ClassName> element), matching the list is no longer necessary for general understanding of the file. The NodeClasses file serves as a disambiguation tool: there may be multiple annotation projects that use the same names but maybe define them differently and use different guidelines, and their respective NodeClasses allow you to interpret the symbol names correctly, in light of the corresponding set of definitions.

Note

In MUSCIMarker, the NodeClasses is currently necessary to define how Nodes are displayed: their color. (All noteheads are red, all barlines are green, etc.) The other function, matching names to clsid, has been superseded by the class_name Node attribute.

Merging Nodes

To merge a list of Nodes into a new one, you need to:

  • Compute the new object’s bounding box: compute_unifying_bounding_box()

  • Compute the new object’s mask: compute_unifying_mask()

  • Determine the class_name and node_id of the new object.

Since node_id and class_name of merges may depend on external settings and generally cannot be reliably determined from the merged objects themselves (e.g. the merge of a notehead and a stem should be a new note symbol), you need to supply them externally. However, the bounding box and mask can be determined. The bounding box is computed simply as the smallest bounding box that encompasses all the Nodes, and the mask is an OR operation over the individual masks (or None, if the Nodes don’t have masks). Note that the merge cannot deal with a situation where only some of the objects have a mask.
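The two computable parts of a merge can be sketched without the library. The functions `unify_bounding_boxes` and `unify_masks` below are hypothetical stand-ins that work on (top, left, bottom, right) tuples and nested-list masks; they assume every object has a mask.

```python
def unify_bounding_boxes(boxes):
    """Smallest (top, left, bottom, right) box that encloses all boxes."""
    tops, lefts, bottoms, rights = zip(*boxes)
    return min(tops), min(lefts), max(bottoms), max(rights)


def unify_masks(boxes, masks):
    """OR the masks together on the canvas of the unifying bounding box."""
    top, left, bottom, right = unify_bounding_boxes(boxes)
    merged = [[0] * (right - left) for _ in range(bottom - top)]
    for (box_top, box_left, _, _), mask in zip(boxes, masks):
        for r, mask_row in enumerate(mask):
            for c, value in enumerate(mask_row):
                if value:
                    merged[box_top - top + r][box_left - left + c] = 1
    return merged


# Three boxes, each fully covered by its own all-ones mask.
boxes = [(10, 10, 11, 14), (11, 10, 12, 16), (9, 14, 13, 16)]
masks = [[[1] * (rgt - lft) for _ in range(btm - tp)]
         for tp, lft, btm, rgt in boxes]
merged_box = unify_bounding_boxes(boxes)
merged_mask = unify_masks(boxes, masks)
```

Pixels of the unifying box that no input mask covers stay zero, which is why the merged mask is generally not a full rectangle.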

Implementation notes on the mask

The mask is a numpy array that will be saved using run-length encoding. The numpy array is first flattened, then runs of successive 0’s and 1’s are encoded as e.g. 0:10 for a run of 10 zeros.

How much space does this take?

Objects tend to be relatively convex, so after flattening, we can expect more or less two runs per row (flattening is done in C order). Because each run takes (approximately) 5 characters, each mask takes roughly 5 * n_rows bytes to encode. This makes it efficient for objects wider than 5 pixels, with a compression ratio approximately n_cols / 5. (Also, the numpy array needs to be made C-contiguous for that, which explains the order='C' hack in set_mask().)
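The size estimate can be checked on a toy mask. The sketch below is only an illustration under assumptions: `rle_encode` and `bitmap_encode` are hypothetical helpers that mimic the value:length and whitespace-separated formats described in this section, and the mask is a simple vertical band.

```python
from itertools import groupby


def rle_encode(flat_bits):
    """Encode a flat 0/1 sequence as space-separated value:length runs."""
    return ' '.join(f'{bit}:{len(list(run))}' for bit, run in groupby(flat_bits))


def bitmap_encode(flat_bits):
    """Encode the same sequence as whitespace-separated digits."""
    return ' '.join(str(bit) for bit in flat_bits)


# A 20 x 30 mask with a filled 10-column band in every row, flattened in C order.
n_rows, n_cols = 20, 30
flat = [1 if 10 <= col < 20 else 0
        for _ in range(n_rows) for col in range(n_cols)]

rle_size = len(rle_encode(flat))
bitmap_size = len(bitmap_encode(flat))
```

For this mask the run-length string grows with the number of rows only, while the bitmap string grows with the number of pixels.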

DEFAULT_DATASET = 'MUSCIMA_DEFAULT_DATASET_PLACEHOLDER'
DEFAULT_DOCUMENT = 'default-document'
UID_DELIMITER = '___'
property bottom: int

Row coordinate 1 beyond bottom right corner, so that indexing in the form img[node.top:node.bottom] is possible.

property bounding_box: Tuple[int, int, int, int]

The top, left, bottom, right tuple of the Node’s coordinates.

bounding_box_intersection(bounding_box: Tuple[int, int, int, int]) Tuple[int, int, int, int] | None[source]

Returns the sub-bounding box of this Node intersecting with the given bounding box. If the intersection is empty, returns None.

>>> node = Node(0, 'test', 10, 100, height=20, width=10)
>>> node.bounding_box
(10, 100, 30, 110)
>>> other_bbox = 20, 100, 40, 105
>>> node.bounding_box_intersection(other_bbox)
(10, 0, 20, 5)
>>> containing_bbox = 4, 55, 44, 115
>>> node.bounding_box_intersection(containing_bbox)
(0, 0, 20, 10)
>>> contained_bbox = 12, 102, 22, 108
>>> node.bounding_box_intersection(contained_bbox)
(2, 2, 12, 8)
>>> non_overlapping_bbox = 0, 0, 3, 3
>>> node.bounding_box_intersection(non_overlapping_bbox) is None
True
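
The doctest results above are expressed relative to the Node's own top left corner. A standalone sketch of that computation, with the hypothetical name `intersection_relative` and plain (top, left, bottom, right) tuples, could look like this:

```python
def intersection_relative(box, other):
    """Intersect two (top, left, bottom, right) boxes.

    The result is expressed relative to the top-left corner of `box`;
    returns None if the boxes do not overlap.
    """
    top, left, bottom, right = box
    o_top, o_left, o_bottom, o_right = other
    i_top, i_left = max(top, o_top), max(left, o_left)
    i_bottom, i_right = min(bottom, o_bottom), min(right, o_right)
    if i_top >= i_bottom or i_left >= i_right:
        return None
    return i_top - top, i_left - left, i_bottom - top, i_right - left


node_box = (10, 100, 30, 110)
partial = intersection_relative(node_box, (20, 100, 40, 105))
```

Returning relative coordinates means the result can be used directly to index the Node's mask.
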
property class_name: str
compute_recall_precision_fscore_on_mask(other_node: Node) Tuple[float, float, float][source]

Compute the recall, precision and f-score of the predicted Node’s mask against another node’s mask.
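
As a rough illustration of the three scores, the sketch below compares two nested-list masks pixel by pixel. It assumes that both masks have the same shape and are already aligned, and that the second mask plays the role of the reference; `mask_recall_precision_fscore` is a hypothetical name, not the method itself.

```python
def mask_recall_precision_fscore(predicted, reference):
    """Pixel-wise scores of a predicted 0/1 mask against a reference mask."""
    pairs = [(p, t) for p_row, t_row in zip(predicted, reference)
             for p, t in zip(p_row, t_row)]
    true_positive = sum(1 for p, t in pairs if p and t)
    predicted_total = sum(1 for p, _ in pairs if p)
    reference_total = sum(1 for _, t in pairs if t)
    recall = true_positive / reference_total if reference_total else 0.0
    precision = true_positive / predicted_total if predicted_total else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    fscore = 2 * recall * precision / (recall + precision)
    return recall, precision, fscore


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
recall, precision, fscore = mask_recall_precision_fscore(predicted, reference)
```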

contains(bounding_box_or_node: Tuple[int, int, int, int] | Node) bool[source]

Check if this Node entirely contains the other bounding box (or, the other node’s bounding box).

crop_to_mask()[source]

Crops itself to the minimum bounding box that contains all its pixels, as determined by its mask.

If the mask is all zeros, does not do anything, because at this point, the is_empty check should be invoked anyway in any situation where you care whether the object is empty or not (e.g. delete it after trimming).

>>> mask = numpy.zeros((20, 10))
>>> mask[5:15, 3:8] = 1
>>> node = Node(0, 'test', 10, 100, width=10, height=20, mask=mask)
>>> node.bounding_box
(10, 100, 30, 110)
>>> node.crop_to_mask()
>>> node.bounding_box
(15, 103, 25, 108)
>>> node.height, node.width
(10, 5)

Assumes integer bounds, which is ensured during Node initialization.

data_display_text() str[source]
property dataset: str
static decode_mask(mask_string: str, shape) ndarray | None[source]

Decodes a Node mask string into a binary numpy array of the given shape.

static decode_mask_bitmap(mask_string: str, shape) ndarray | None[source]

Decodes the mask array from the encoded form to the 2D numpy array.

static decode_mask_rle(mask_string: str, shape) ndarray | None[source]

Decodes the mask array from the RLE-encoded form to the 2D numpy array.

distance_to(node) Any[source]

Computes the distance between this node and another node. Their minimum vertical and horizontal distances are each taken separately, and the euclidean norm is computed from them.
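
One way to read this description is sketched below on bounding box tuples. It assumes the gap along an axis is zero when the boxes touch or overlap on that axis; `axis_gap` and `bounding_box_distance` are hypothetical names and the actual method may differ in detail.

```python
import math


def axis_gap(start_a, end_a, start_b, end_b):
    """Gap between two half-open intervals; 0 if they touch or overlap."""
    return max(0, start_b - end_a, start_a - end_b)


def bounding_box_distance(box_a, box_b):
    """Euclidean norm of the vertical and horizontal gaps between two boxes."""
    top_a, left_a, bottom_a, right_a = box_a
    top_b, left_b, bottom_b, right_b = box_b
    vertical = axis_gap(top_a, bottom_a, top_b, bottom_b)
    horizontal = axis_gap(left_a, right_a, left_b, right_b)
    return math.hypot(vertical, horizontal)


distance = bounding_box_distance((0, 0, 10, 10), (13, 14, 20, 20))
```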

property document: str
encode_data() str | None[source]
encode_mask(mode: str = 'rle') str[source]

Encode a binary array mask as a string, compliant with the Node format specification in mung.io.

static encode_mask_bitmap(mask: ndarray) str[source]

Encodes the mask array in a compact form. Returns ‘None’ if mask is None. If the mask is not None, uses the following algorithm:

  • Flatten the mask (then use width and height of Node for reshaping).

  • Record as string, with whitespace separator

  • Return resulting string

static encode_mask_rle(mask: ndarray) str[source]

Encodes the mask array in Run-Length Encoding. Instead of having the bitmap 0 0 1 1 1 0 0 0 1 1, the RLE encodes the mask as 0:2 1:3 0:3 1:2. This is much more compact.

Currently, the rows of the mask are not treated in any special way. The mask just gets flattened and then encoded.
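
A round trip through this format can be sketched in a few lines. The functions `encode_rle` and `decode_rle` are hypothetical stand-ins working on nested lists; they follow the value:length format shown above but are not the library's code.

```python
from itertools import groupby


def encode_rle(mask):
    """Flatten a 2D 0/1 mask row by row and encode runs as value:length."""
    flat = [bit for row in mask for bit in row]
    return ' '.join(f'{bit}:{len(list(run))}' for bit, run in groupby(flat))


def decode_rle(mask_string, shape):
    """Rebuild the 2D mask of the given (n_rows, n_cols) shape."""
    flat = []
    for token in mask_string.split():
        bit, length = token.split(':')
        flat.extend([int(bit)] * int(length))
    n_rows, n_cols = shape
    if len(flat) != n_rows * n_cols:
        raise ValueError('RLE string does not match the requested shape')
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


mask = [[0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1]]
encoded = encode_rle(mask)
decoded = decode_rle(encoded, (2, 5))
```

Because rows are not treated specially, a run may continue across a row boundary, which is why the decoder needs the shape.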

Out of the given nodes list, return a list of those from which this Node has inlinks. Can deal with Nodes from multiple documents.

Out of the given nodes list, return a list of those to which this Node has outlinks. Can deal with Nodes from multiple documents.

property height: int
property id: int
join(other)[source]

Node “addition”: performs an OR on this and the other Nodes’ masks and bounding boxes, and assigns to this Node the result. Merges also the inlinks and outlinks.

Works only if the document spaces for both Nodes are the same. (Otherwise changes nothing.)

The class_name of the other is ignored.

property left: int

Column coordinate of upper left corner.

property mask: ndarray
property middle: Tuple[int, int]

Returns the integer representation of where the middle of the Node lies, as a (m_vert, m_horz) tuple.

The integers just get rounded down.

>>> node = Node(0,'', 10, 20, 30, 40)
>>> node.middle
(30, 35)
overlaps(bounding_box_or_node: Tuple[int, int, int, int] | Node) bool[source]

Check whether this Node overlaps the given bounding box or Node.

>>> node = Node(0, 'test', 10, 100, height=20, width=10)
>>> node.bounding_box
(10, 100, 30, 110)
>>> node.overlaps((10, 100, 30, 110))  # Exact match
True
>>> node.overlaps((0, 100, 8, 110))    # Row mismatch
False
>>> node.overlaps((10, 0, 30, 89))     # Column mismatch
False
>>> node.overlaps((0, 0, 8, 89))       # Total mismatch
False
>>> node.overlaps((9, 99, 31, 111))    # Encompasses Node
True
>>> node.overlaps((11, 101, 29, 109))  # Within Node
True
>>> node.overlaps((9, 101, 31, 109))   # Encompass horz., within vert.
True
>>> node.overlaps((11, 99, 29, 111))   # Encompasses vert., within horz.
True
>>> node.overlaps((11, 101, 31, 111))  # Corner within: top left
True
>>> node.overlaps((11, 99, 31, 109))   # Corner within: top right
True
>>> node.overlaps((9, 101, 29, 111))   # Corner within: bottom left
True
>>> node.overlaps((9, 99, 29, 109))    # Corner within: bottom right
True
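
The cases in the doctest follow from a simple interval test on each axis. The sketch below uses the hypothetical name `boxes_overlap` and assumes half-open (top, left, bottom, right) boxes, so boxes that merely touch do not count as overlapping; the real method may treat that edge case differently.

```python
def boxes_overlap(box_a, box_b):
    """True if two half-open (top, left, bottom, right) boxes share a pixel."""
    top_a, left_a, bottom_a, right_a = box_a
    top_b, left_b, bottom_b, right_b = box_b
    rows_overlap = top_a < bottom_b and top_b < bottom_a
    cols_overlap = left_a < right_b and left_b < right_a
    return rows_overlap and cols_overlap


node_box = (10, 100, 30, 110)
exact_match = boxes_overlap(node_box, (10, 100, 30, 110))
row_mismatch = boxes_overlap(node_box, (0, 100, 8, 110))
```
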
static parse_unique_id(uid: str) Tuple[str, str, int][source]

Parse a unique identifier. This breaks down the UID into the dataset name, document name, and id.

The delimiter is expected to be ___ (kept as Node.UID_DELIMITER).

>>> Node.parse_unique_id('MUSCIMA++_2.0___CVC-MUSCIMA_W-05_N-19_D-ideal___424')
('MUSCIMA++_2.0', 'CVC-MUSCIMA_W-05_N-19_D-ideal', 424)
Returns:

global_namespace, document_namespace, id triplet. The namespaces are strings, id is an integer. If unique_id is None, returns None as id and expects it to be filled in from the caller Node instance.

project_on(image: ndarray)[source]

This function returns only those parts of the input image that correspond to the Node and masks out everything else with zeros. The dimension of the returned array is the same as of the input image. This function basically reconstructs the symbol as an indicator function over the pixels of the annotated image.

project_to(image: ndarray)[source]

This function returns the crop of the input image corresponding to the Node (incl. masking). Assumes zeros are background.

render(image: ndarray, alpha: float = 0.3, rgb: Tuple[float, float, float] = (1.0, 0.0, 0.0)) ndarray[source]

Renders itself upon the given image as a rectangle of the given color and transparency. Might help visualization.

property right: int

Column coordinate 1 beyond bottom right corner, so that indexing in the form img[:, node.left:node.right] is possible.

static round_bounding_box_to_integer(top: float, left: float, bottom: float, right: float) Tuple[int, int, int, int][source]

Rounds the Node bounds outward to integers so that no area is lost (i.e., the bottom and right bounds are rounded up, the top and left bounds are rounded down).

Returns the rounded bounds (top, left, bottom, right) as integers.

>>> Node.round_bounding_box_to_integer(44.2, 18.9, 55.1, 92.99)
(44, 18, 56, 93)
>>> Node.round_bounding_box_to_integer(44, 18, 56, 92.99)
(44, 18, 56, 93)
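
The rounding rule amounts to a floor on the top and left bounds and a ceiling on the bottom and right bounds. A standalone sketch with the hypothetical name `round_outward`:

```python
import math


def round_outward(top, left, bottom, right):
    """Floor the top/left bounds and ceil the bottom/right bounds."""
    return (math.floor(top), math.floor(left),
            math.ceil(bottom), math.ceil(right))


rounded = round_outward(44.2, 18.9, 55.1, 92.99)
```
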
scale(zoom: float = 1.0)[source]

Re-compute the Node with the given scaling factor.

set_class_name(class_name_)[source]
set_id(id_)[source]
set_mask(mask: ndarray)[source]

Sets the Node’s mask to the given array. Performs some compatibility checks: size, dtype (converts to uint8).

property top: int

Row coordinate of upper left corner.

translate(down: int = 0, right: int = 0)[source]

Move the Node down and right by the given amount of pixels.

property unique_id: str

Returns the unique_id of this Node

>>> node = Node(0, "", 0, 0, 0, 0)
>>> node.unique_id
'MUSCIMA_DEFAULT_DATASET_PLACEHOLDER___default-document___0'
property width: int
mung.node.bounding_box_dice_coefficient(first_bounding_box: Tuple[int, int, int, int], second_bounding_box: Tuple[int, int, int, int], vertical: bool = False, horizontal: bool = False) float[source]

Compute the Dice coefficient (intersection over union) for the given two bounding boxes.

Parameters:
  • vertical – If set, will only return vertical IoU.

  • horizontal – If set, will only return horizontal IoU. If both vertical and horizontal are set, will return normal IoU, as if they were both false.

mung.node.bounding_box_intersection(first_bounding_box: Tuple[int, int, int, int], second_bounding_box: Tuple[int, int, int, int]) Tuple[int, int, int, int] | None[source]

Returns the t, l, b, r coordinates of the sub-bounding box of first_bounding_box that is also inside second_bounding_box. If the bounding boxes do not overlap, returns None.

mung.node.compute_unifying_bounding_box(nodes: List[Node]) Tuple[int, int, int, int][source]

Computes the union bounding box of multiple nodes.

mung.node.compute_unifying_mask(nodes: List[Node], intersection=False) ndarray | None[source]

Merges the masks of the given Nodes into one. Masks are combined by an OR operation.

>>> c1 = Node(0, 'name', 10, 10, 4, 1, mask=numpy.ones((1, 4), dtype='uint8'))
>>> c2 = Node(1, 'name', 11, 10, 6, 1, mask=numpy.ones((1, 6), dtype='uint8'))
>>> c3 = Node(2, 'name', 9, 14,  2, 4, mask=numpy.ones((4, 2), dtype='uint8'))
>>> nodes = [c1, c2, c3]
>>> m1 = compute_unifying_mask(nodes)
>>> m1.shape
(4, 6)
>>> print(m1)
[[0 0 0 0 1 1]
 [1 1 1 1 1 1]
 [1 1 1 1 1 1]
 [0 0 0 0 1 1]]

Mask behavior: if at least one of the Nodes has a mask, then masking behavior is activated. The masks are combined using OR: any pixel of the resulting merged Node that corresponds to a True mask pixel in one of the input Nodes will get a True mask value, all others (i.e., including all intermediate areas) will get a False.

If no input Node has a mask, then the resulting Node also will not have a mask.

If some Nodes have masks and some don’t, this call will throw an error.

Parameters:
  • nodes – The list of nodes whose masks will be merged

  • intersection – Instead of a union, return the mask intersection: only those pixels which are common to all the Nodes.

mung.node.draw_nodes_on_empty_canvas(nodes: List[Node], margin: int = 10) Tuple[ndarray, Tuple[int, int]][source]

Draws all the given Nodes onto a zero background. The size of the canvas adapts to the Nodes, with the given margin.

Also returns the top left corner coordinates w.r.t. Nodes’ bounding boxes.

Add a relationship from one node to the other. Updates the nodes in-place.

If the objects are already linked, does nothing.

Collect all inlinks and outlinks of the given set of Nodes to Nodes outside of this set. The rationale for this is that these given nodes will be merged into one, so relationships within the set would become loops and disappear.

(Note that this is not sufficient to update the relationships upon a merge, because the affected Nodes outside the given set will need to have their inlinks/outlinks redirected to the new object.)

Returns:

A tuple of lists: (inlinks, outlinks)
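
The idea of keeping only the links that cross the boundary of the set can be sketched as follows. Plain dicts with 'id', 'inlinks' and 'outlinks' keys stand in for Nodes, and `collect_external_links` is a hypothetical name used only for this illustration.

```python
def collect_external_links(nodes):
    """Return (inlinks, outlinks) that cross the boundary of the given set."""
    inside = {node['id'] for node in nodes}
    inlinks, outlinks = set(), set()
    for node in nodes:
        inlinks.update(i for i in node['inlinks'] if i not in inside)
        outlinks.update(o for o in node['outlinks'] if o not in inside)
    return sorted(inlinks), sorted(outlinks)


# The link 1 -> 2 lies inside the set and is dropped; 7 -> 1 and 1 -> 5 remain.
notehead = {'id': 1, 'inlinks': [7], 'outlinks': [2, 5]}
stem = {'id': 2, 'inlinks': [1], 'outlinks': []}
external_in, external_out = collect_external_links([notehead, stem])
```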

mung.node.merge_multiple_nodes(nodes: List[Node], class_name: str, id_: int) Node[source]

Merge multiple nodes. Does not modify any of the inputs.

mung.node.merge_node_lists_from_multiple_documents(node_lists: List[List[Node]]) List[Node][source]

Combines the Node lists from different documents into one list, so that inlink/outlink references still work. This is useful only if you want to merge two documents into one (e.g., if your annotators worked on different “layers” of data, and you want to merge these annotations).

This just means shifting the id (and thus inlinks and outlinks). It is assumed the lists pertain to the same image. Uses deepcopy to avoid exposing the original lists to modification through the merged list.

Currently cannot handle precedence edges.
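
The id shifting can be illustrated on dict-based stand-ins for Nodes. This is a sketch under assumptions, not the library's algorithm: `merge_node_lists` is a hypothetical name, and the offset is simply the largest id seen so far plus one.

```python
from copy import deepcopy


def merge_node_lists(node_lists):
    """Concatenate lists of dict-based nodes, shifting ids to keep them unique."""
    merged, offset = [], 0
    for nodes in node_lists:
        shifted = deepcopy(nodes)  # leave the input lists untouched
        for node in shifted:
            node['id'] += offset
            node['inlinks'] = [i + offset for i in node['inlinks']]
            node['outlinks'] = [o + offset for o in node['outlinks']]
        merged.extend(shifted)
        if merged:
            offset = max(node['id'] for node in merged) + 1
    return merged


first = [{'id': 0, 'inlinks': [], 'outlinks': [1]},
         {'id': 1, 'inlinks': [0], 'outlinks': []}]
second = [{'id': 0, 'inlinks': [1], 'outlinks': []},
          {'id': 1, 'inlinks': [], 'outlinks': [0]}]
merged = merge_node_lists([first, second])
```

Shifting the links by the same offset as the ids keeps every reference pointing at the same object as before the merge.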

mung.node.merge_nodes(first_node: Node, second_node: Node, class_name: str, id_: int) Node[source]

Merge the two given Nodes into one. Returns a new Node (without modifying any of the inputs).

mung.node.split_node_by_its_connected_components(node: Node, next_node_id: int) List[Node][source]

Split the Node into one object per connected component of the mask. All inlinks/outlinks are retained in all the newly created Nodes, and the old object is not changed. If there is only one connected component, the object is returned unchanged in a list with one entry.

An id must be provided at which to start numbering the newly created Nodes.

The data attribute is also retained.
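
Splitting a mask into connected components can be sketched with a breadth-first search. The sketch assumes 8-connectivity and keeps each component at the size of the original mask (no re-cropping); `connected_components` is a hypothetical helper and the library may use a different connectivity or implementation.

```python
from collections import deque


def connected_components(mask):
    """Split a 2D 0/1 mask into one mask per 8-connected component."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    components = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            component = [[0] * n_cols for _ in range(n_rows)]
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component[r][c] = 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < n_rows and 0 <= nc < n_cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            components.append(component)
    return components


mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]]
parts = connected_components(mask)
```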