
Data Model

The following article generally describes how documents and the artifacts created during processing are represented in the MTAP Framework.

Event

The top-level data object is an Event. An event’s primary function is to store documents: it acts as a dictionary (mapping) of names to documents. Events are stored on a service called the “events service”, which stores and provides the data associated with an event for the event’s lifetime. An event is primarily identified by a unique event_id.
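To make the "mapping of names to documents" idea concrete, here is a minimal plain-Python sketch of an event. It is not MTAP's Event class: the Event and Document dataclasses, the 'plaintext' document name, and the sample text are hypothetical, and no events service is involved.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Document:
    # Hypothetical stand-in: only the document text is modeled here.
    text: str


@dataclass
class Event:
    """Illustrative event: a unique id plus a mapping of names to documents."""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    documents: Dict[str, Document] = field(default_factory=dict)

    def __getitem__(self, name: str) -> Document:
        # Look up a document by the name it was stored under.
        return self.documents[name]

    def __setitem__(self, name: str, document: Document) -> None:
        self.documents[name] = document

    def __contains__(self, name: str) -> bool:
        return name in self.documents


event = Event(event_id='note-1')
event['plaintext'] = Document('The patient was seen today.')
print(event.event_id, 'plaintext' in event)  # note-1 True
```

If no event_id is supplied, this sketch generates a random UUID string; how MTAP itself assigns identifiers is not covered here.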

Document

A Document consists of a text and a mapping of names to LabelIndex objects.
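A document can be pictured as a text plus a dictionary of named label indices. The sketch below assumes, purely for illustration, that each label is a (start_index, end_index) pair; the Document class here is hypothetical, and its get_label_index method only mirrors the name used in the Python example later in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical representation: each label is a (start_index, end_index) pair.
Span = Tuple[int, int]


@dataclass
class Document:
    """Illustrative document: a text plus named label indices (not MTAP's class)."""
    text: str
    label_indices: Dict[str, List[Span]] = field(default_factory=dict)

    def get_label_index(self, name: str) -> List[Span]:
        # Look up a label index by the name it was stored under.
        return self.label_indices[name]


doc = Document('Hello world. Bye now.')
doc.label_indices['sentences'] = [(0, 12), (13, 21)]
for start, end in doc.get_label_index('sentences'):
    print(doc.text[start:end])  # "Hello world." then "Bye now."
```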

Label

A label is a span of text to which the system has assigned some kind of importance or meaning. Examples include sentences, part-of-speech tags, named entities, sections, identified concepts, or higher-level semantic structures.
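A label only needs to record where it sits in the text and what it means. The following is a hypothetical sketch, assuming half-open character offsets [start_index, end_index) and a free-form dictionary of properties; the field names start_index and end_index match the Python labeler example in this article, but the Label class and the 'category' and 'tag' properties are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Label:
    """Hypothetical label: a half-open span [start_index, end_index) plus properties."""
    start_index: int
    end_index: int
    properties: Dict[str, Any] = field(default_factory=dict)

    def covered_text(self, text: str) -> str:
        # The label stores only offsets; the text itself lives on the document.
        return text[self.start_index:self.end_index]


text = 'Aspirin 81 mg daily.'
drug = Label(0, 7, {'category': 'medication'})  # a named-entity-style label
number = Label(8, 10, {'tag': 'CD'})            # a part-of-speech-style label
print(drug.covered_text(text), drug.properties['category'])  # Aspirin medication
```

Keeping only offsets on the label means many label indices can refer to the same text without copying it.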

LabelIndex

A LabelIndex is a set of meaningful locations in text and associated properties. It provides filtering and navigation of its labels, which are stored as an array sorted by the label’s location in text.
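Because labels are kept sorted by location, navigation can use binary search instead of scanning every label. The sketch below is one plausible way to exploit that ordering, not MTAP's implementation; the LabelIndex class and its inside method are hypothetical names chosen for illustration, so consult the MTAP API reference for the real filtering methods.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List, NamedTuple


class Label(NamedTuple):
    start_index: int
    end_index: int


class LabelIndex:
    """Hypothetical label index: labels stored in a list sorted by location."""

    def __init__(self, labels: Iterable[Label]):
        # Sort once by (start, end) so later lookups can use binary search.
        self._labels: List[Label] = sorted(
            labels, key=lambda lbl: (lbl.start_index, lbl.end_index))
        self._starts = [lbl.start_index for lbl in self._labels]

    def __iter__(self) -> Iterator[Label]:
        return iter(self._labels)

    def __len__(self) -> int:
        return len(self._labels)

    def inside(self, start: int, end: int) -> List[Label]:
        """Return the labels whose spans lie entirely within [start, end]."""
        result = []
        # Jump straight to the first label that starts at or after `start`.
        i = bisect_left(self._starts, start)
        # Labels starting after `end` cannot lie inside the range, so stop there.
        while i < len(self._labels) and self._starts[i] <= end:
            if self._labels[i].end_index <= end:
                result.append(self._labels[i])
            i += 1
        return result


# Tokens of 'Hello world. Bye now.', deliberately given out of order.
tokens = LabelIndex([Label(13, 16), Label(0, 5), Label(6, 11),
                     Label(11, 12), Label(17, 20), Label(20, 21)])
print(tokens.inside(0, 12))  # the three tokens of the first sentence
```

The same sorted list supports other navigation (for example, labels covering a position) with the same bisect-then-scan pattern.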

Through the events service, MTAP allows processors to retrieve and use labels created by upstream components.

For concurrency reasons, label indices are immutable once added to a document and a document’s text cannot be modified.
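Immutability is what makes concurrent access safe: once data is published, readers never need to coordinate with a writer. As a rough plain-Python illustration only (not how MTAP enforces the rule), the hypothetical Document below exposes its text through a read-only property and rejects any attempt to replace a label index that has already been added.

```python
from typing import Dict, Sequence, Tuple

Span = Tuple[int, int]  # hypothetical (start_index, end_index) pair


class Document:
    """Hypothetical document enforcing the immutability rules described above."""

    def __init__(self, text: str):
        self._text = text
        self._label_indices: Dict[str, Tuple[Span, ...]] = {}

    @property
    def text(self) -> str:
        # Read-only property with no setter: assigning to doc.text raises AttributeError.
        return self._text

    def add_label_index(self, name: str, labels: Sequence[Span]) -> None:
        if name in self._label_indices:
            raise ValueError(f'label index {name!r} already exists')
        # Store a sorted tuple so the index cannot be appended to or reordered later.
        self._label_indices[name] = tuple(sorted(labels))

    def get_label_index(self, name: str) -> Tuple[Span, ...]:
        return self._label_indices[name]


doc = Document('Hello world.')
doc.add_label_index('sentences', [(0, 12)])
try:
    doc.add_label_index('sentences', [(0, 5)])
except ValueError as err:
    print(err)  # label index 'sentences' already exists
```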

Python

for sentence in document.get_label_index('sentences'):
    # do work on the sentence
    pass

Java

for (GenericLabel sentence : document.getLabelIndex("sentences")) {
  // do work on the sentence
}

Labeler

The Labeler is an object or function that is used to create a new label index on the document. It collects labels from the processing component and uploads them to the events service.

Python

with document.labeler('sentences') as add_sentence:
    # detect_sentences stands in for any sentence detection routine
    for span in detect_sentences(document.text):
        add_sentence(span.start_index, span.end_index)

Java

try (Labeler<GenericLabel> sentencesLabeler = document.getLabeler("sentences")) {
  for (Span span : SentenceDetector.detectSentences(document.getText())) {
    sentencesLabeler.add(
        GenericLabel.newBuilder(span.start(), span.end()).build()
    );
  }
}

When the labeler’s context manager block (with in Python) or try-with-resources block (try in Java) exits, the labeler automatically finalizes its labels and uploads them to the events service, where downstream components can use them.
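The finalize-on-exit behavior maps naturally onto Python's context manager protocol. The sketch below is a hypothetical model, not MTAP's labeler: a module-level dictionary named uploaded stands in for the events service, and labels become visible there only once the with block exits.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # hypothetical (start_index, end_index) pair

# Hypothetical in-memory stand-in for the events service: index name -> labels.
uploaded: Dict[str, List[Span]] = {}


class Labeler:
    """Hypothetical labeler: collects labels, then publishes them when the block exits."""

    def __init__(self, index_name: str):
        self.index_name = index_name
        self._labels: List[Span] = []
        self._finalized = False

    def __call__(self, start_index: int, end_index: int) -> None:
        if self._finalized:
            raise RuntimeError('labeler already finalized')
        self._labels.append((start_index, end_index))

    def __enter__(self) -> 'Labeler':
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Finalize: sort the collected labels by location and publish them in one step.
        uploaded[self.index_name] = sorted(self._labels)
        self._finalized = True
        return False  # do not suppress exceptions raised inside the block


with Labeler('sentences') as add_sentence:
    add_sentence(13, 21)
    add_sentence(0, 12)
    print('sentences' in uploaded)  # False: nothing is published yet
print(uploaded['sentences'])  # [(0, 12), (13, 21)]
```

Publishing the whole index at once, rather than label by label, fits the rule that label indices are immutable once added: downstream readers never see a half-built index.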

Events Service

The events service is a gRPC service that provides the interface for storing and retrieving events and their associated data during processing. Processors use it to share the artifacts of text processing with one another.
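The role of the events service as shared storage between processors can be illustrated without gRPC. In the hypothetical sketch below, InMemoryEventsService, sentence_processor, and sentence_counter are illustrative names, and the sentence splitter is deliberately naive; the point is only that an upstream processor writes labels keyed by event_id and a downstream processor reads them back, and that the data disappears when the event is closed.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # hypothetical (start_index, end_index) pair


class InMemoryEventsService:
    """Hypothetical stand-in for the events service (the real one is a gRPC server)."""

    def __init__(self):
        # event_id -> document name -> {'text': str, 'indices': {index name: labels}}
        self._events: Dict[str, Dict[str, dict]] = {}

    def add_document(self, event_id: str, name: str, text: str) -> None:
        self._events.setdefault(event_id, {})[name] = {'text': text, 'indices': {}}

    def get_text(self, event_id: str, name: str) -> str:
        return self._events[event_id][name]['text']

    def add_labels(self, event_id: str, name: str, index_name: str,
                   labels: List[Span]) -> None:
        self._events[event_id][name]['indices'][index_name] = list(labels)

    def get_labels(self, event_id: str, name: str, index_name: str) -> List[Span]:
        return self._events[event_id][name]['indices'][index_name]

    def close_event(self, event_id: str) -> None:
        # Data is kept only for the lifetime of the event.
        self._events.pop(event_id, None)


def sentence_processor(service: InMemoryEventsService, event_id: str) -> None:
    """Upstream processor: naive splitter that ends a sentence at each period."""
    text = service.get_text(event_id, 'plaintext')
    spans, start = [], 0
    for i, ch in enumerate(text):
        if ch == '.':
            spans.append((start, i + 1))
            start = i + 1
            # Skip whitespace so the next sentence starts at a word.
            while start < len(text) and text[start].isspace():
                start += 1
    service.add_labels(event_id, 'plaintext', 'sentences', spans)


def sentence_counter(service: InMemoryEventsService, event_id: str) -> int:
    """Downstream processor: reads the labels the upstream processor stored."""
    return len(service.get_labels(event_id, 'plaintext', 'sentences'))


service = InMemoryEventsService()
service.add_document('e1', 'plaintext', 'Hello world. Bye now.')
sentence_processor(service, 'e1')
print(sentence_counter(service, 'e1'))  # 2
service.close_event('e1')
```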

Most interaction with the events service is abstracted away by the Document and Event classes of the Python and Java frameworks. Because it is a gRPC service, however, the events service can also be accessed at a lower level from any language that has a gRPC implementation. MTAP also provides a RESTful API gateway that can be used to make REST requests against the events service.