About
This guide will teach you about the various scaling options available for BioMedICUS.
Prerequisites
You should have BioMedICUS installed according to the installation guide.
Deployment Scaleout
BioMedICUS includes a configuration file for scaling deployed processors. To write this configuration file, run the following command in a terminal:
b9 write-config scaleout_deploy
Events Service Pooling
The first thing to notice is that the events service configuration has multiple addresses:
events_service:
  enabled: yes
  addresses:
    - localhost:50100
    - localhost:50101
    - localhost:50102
    - localhost:50103
    - localhost:50104
    - localhost:50105
    - localhost:50106
    - localhost:50107
In this configuration, the events service is launched 8 times, and all of the launched services are used as potential endpoints for newly created documents. The deployed processors are made aware of this and use a unique identifier per events service to determine which service to contact to look up a particular event and its data.
This helps overcome an events service bottleneck resulting from the Python global interpreter lock. If, after adding more workers to the processors and the events service, you are not seeing utilization scale, the events service may be the bottleneck; in that case, enable or increase the size of the events service pool by adding more addresses.
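The size of the pool is simply the number of addresses listed; for example, a smaller four-instance pool would look like this (a minimal sketch with illustrative ports):
events_service:
  enabled: yes
  addresses:
    - localhost:50100
    - localhost:50101
    - localhost:50102
    - localhost:50103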
This is currently incompatible with the NGINX reverse proxy docker-compose deployment.
Events Service Workers
The events_service.workers value is the number of threads the events service uses to respond to requests. Note that these are Python threads, and Python has a Global Interpreter Lock (GIL), meaning they are not actually executed concurrently; the only gains here come when a thread has to wait on I/O.
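For reference, this value sits under the events_service section of the deployment configuration; a minimal sketch, with the rest of the section omitted, the value illustrative, and the placement inferred from the events_service.workers path:
events_service:
  workers: 10  # threads serving requests; illustrative value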
Processor Workers
Processors also have a number of workers that respond to requests. This is controlled by the shared_processor_config.workers setting and by the workers setting on each individual processor (example shown below). The workers setting on an individual processor overrides the shared setting.
- implementation: java
  enabled: no
  entry_point: edu.umn.biomedicus.rtf.RtfProcessor
  port: 50200
  workers: 32
In Java, this determines the number of threads responding to document requests. In Python, it also determines the number of threads, but note that heavily compute-bound tasks will not see much improvement because of the Python GIL.
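The shared default, by contrast, is set once for all processors. A minimal sketch of that section, with the placement inferred from the shared_processor_config.workers path and an illustrative value:
shared_processor_config:
  workers: 8  # default worker count for processors that do not set their own; illustrative value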
Processor Instances
Processors also have an instances parameter. This parameter controls process-level parallelism: the processor is launched the number of times specified by instances (example shown below).
- implementation: python
  entry_point: biomedicus.negation.deepen
  port: 50900
  instances: 4
This parameter can help overcome Python GIL-based bottlenecks. By default, the instances are assigned port numbers incremented from the first port number, but the port setting can also be replaced by a list. Note that you will need to update the run configuration to add the addresses of the additional servers (see Specifying multi-instance processors below).
The instances setting will increase memory consumption by a multiple of the number of instances, since instances do not share memory and each must load its own copy of the processor.
This is currently incompatible with the NGINX reverse proxy docker-compose deployment.
Multiprocessing Processors
Several of our Python processors, notably biomedicus.sentences.bi_lstm and biomedicus.dependencies.stanza_selective_parser, support Python multiprocessing concurrency. This uses a process pool to handle requests in addition to the thread pool. The process pool can be enabled by specifying the --mp flag in additional_args:
- implementation: python
  entry_point: biomedicus.sentences.bi_lstm
  port: 50300
  pre_args: ['processor']
  additional_args: ['--mp']
This can help overcome bottlenecks due to the Python GIL in compute-heavy tasks. You may want to use this option if you see heavy utilization of these specific processors' processes and scaling workers on these or other processors does not improve overall utilization.
Pipeline Configuration
BioMedICUS also includes a scaleout configuration for the pipeline run by the b9client run command. You can write this configuration to disk with the following command:
b9client write-config scaleout-pipeline
Workers and Read-ahead
The mp_config.workers value is the number of processes that will independently move events/documents through the pipeline, i.e. the concurrency level of the pipeline. The mp_config.read_ahead value is the maximum number of documents the source thread should prepare ahead of time, i.e. read into memory. The read-ahead setting helps prevent a worker process from blocking while it waits for documents to be read when it needs them.
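In the written configuration these two settings sit under mp_config. A minimal sketch, with values that are purely illustrative and should be tuned to your workload and hardware:
mp_config:
  workers: 8      # processes moving documents through the pipeline; illustrative
  read_ahead: 8   # documents the source thread reads ahead; illustrative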
Specifying multi-instance processors
If you enabled multiple instances of a processor above and wish to use them, you can specify the addresses as a comma-separated list, for example:
- name: biomedicus-deepen
  address: localhost:50900,localhost:50901,localhost:50902,localhost:50903