Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Requirements

  • Python 3.7+
  • Java JDK 8+

Installing MTAP

To install mtap you will need to using something that supports Maven dependency management. We use the Gradle Build Tool. See their getting started guide and MTAP on Maven Central.

We will need to install the MTAP Python SDK, which includes the events service and pipeline necessary to run our processor.

pip install mtap

Creating a processor

We will be creating a document processor which simply writes a “hello world” label to a document.

To start, create a file named Hello.java in the main source folder (src/main/java in gradle), thi file will contain our processor:

import edu.umn.nlpie.mtap.common.*;
import edu.umn.nlpie.mtap.model.*;
import edu.umn.nlpie.mtap.processing.*;
import org.kohsuke.args4j.CmdLineException;
import org.kohsuke.args4j.CmdLineParser;

import java.io.IOException;

@Processor("hello")
public class Hello extends DocumentProcessor {
  @Override
  protected void process(Document document, JsonObject params, JsonObjectBuilder result) {
    try (Labeler<GenericLabel> labeler = document.getLabeler("hello")) {
      String text = document.getText();
      labeler.add(
          GenericLabel.withSpan(0, text.length()).setProperty("response", "Hello " + text + "!")
      );
    }
  }

  public static void main(String[] args) {
    ProcessorServer.Builder options = new ProcessorServer.Builder();
    CmdLineParser parser = new CmdLineParser(options);
    try {
      parser.parseArgument(args);
      Server server = options.build(new HelloWorldExample());
      server.start();
      server.blockUntilShutdown();
    } catch (IOException e) {
      System.err.println("Failed to start server: " + e.getMessage());
    } catch (InterruptedException e) {
      System.err.println("Server interrupted.");
    } catch (CmdLineException e) {
      ProcessorServer.Builder.printHelp(parser, HelloWorldExample.class, e, null);
    }
  }
}

Note the use of the try-with-resources statement: try () {}. This block will cause the labeler to upload any added labels to the events service when exited.

Compiling the processor

To compile the processor, check the documentation for your build system, in gradle it’s:

gradle build

Creating a task for running the processor

In gradle, a task for running the processor can be created by adding the following to the build.gradle file

task runClass(type: JavaExec) {
  classpath = sourceSets.main.runtimeClasspath

  mainClass findProperty('mainClass')

  // arguments to pass to the application
  args findProperty('args')
}

Running the processor

First, we will need to launch the python events service.

python -m mtap events -p 9090 &
gradle runClass -PmainClass=Hello -Pargs="-p 9091 -e localhost:9090

To perform processing you will either need to create a python pipeline file, or use the API Gateway.

Creating python pipeline file

To perform processing, create another file pipeline.py:

import sys

if __name__ == '__main__':
    from mtap import Document, Event, Pipeline, events_client
    from mtap import RemoteProcessor

    pipeline = Pipeline(
        RemoteProcessor(processor_name='helloprocessor', address=sys.argv[2]),
    )
    with events_client(sys.argv[1]) as client:
        with Event(event_id='1', client=client) as event:
            document = Document(document_name='name', text='YOUR NAME')
            event.add_document(document)
            pipeline.run(document)
            index = document.labels['hello']
            for label in index:
                print(label.response)

To run this pipeline type

python pipeline.py

Interacting with the events service

When you deploy the processor above, the MTAP framework wraps that processor in a gRPC service and server. When it receives a request, it uses the request parameters to instantiate the Document object that gets passed to your processor. This Document object is your interface to the events service.

For more information about the methods that are available, see the API Documentation.