About
We’ve made available a Docker image containing all the prerequisites and model files necessary to run BioMedICUS. Using this Docker image, you can immediately start processing documents and even extending the pipeline with your own processing components.
Pre-requisites
The following pre-requisites are required for this guide.
- Docker Engine version greater than 17.06
- You can get Docker Engine either by installing Docker Desktop or by installing the engine directly on Linux.
- You can check your current engine version with the command docker version.
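The version requirement above can be checked from a script. The following is a minimal sketch, not part of BioMedICUS: the helper name version_at_least is hypothetical, and it treats 17.06 itself as acceptable, which is an assumption about how strictly "greater than" is meant.

```shell
# Return 0 when version $1 is at least version $2.
# Compares dot-separated fields numerically, ignoring suffixes such as "-ce".
# (Hypothetical helper; uses plain global variables to stay POSIX-compatible.)
version_at_least() {
    have=$1
    want=$2
    while [ -n "$want" ]; do
        h=${have%%.*}; h=${h%%[!0-9]*}; h=${h:-0}
        w=${want%%.*}; w=${w%%[!0-9]*}; w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        # Move on to the next field, or finish when none is left.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

if command -v docker >/dev/null 2>&1; then
    # Ask the engine (server) for its version; empty if the daemon is unreachable.
    engine=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
    if [ -z "$engine" ]; then
        echo "Docker is installed, but the engine did not report a version (is it running?)"
    elif version_at_least "$engine" 17.06; then
        echo "Docker Engine $engine meets the requirement"
    else
        echo "Docker Engine $engine is older than 17.06"
    fi
else
    echo "docker was not found on PATH"
fi
```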
If you are using Docker Desktop on either macOS or Windows, the images are run using virtualization and the default memory allocation may be insufficient. BioMedICUS requires at least 10 GB of memory to run. You can change Docker Desktop’s memory allocation using the Resources page in Settings.
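To see how much memory the engine currently has available, you can query docker info. This is a small sketch under assumptions: bytes_to_gib is a hypothetical helper, and the value is printed in GiB, which may differ slightly from the unit shown in Docker Desktop's settings.

```shell
# Convert a byte count to gibibytes with one decimal place.
# (Hypothetical helper, not part of Docker or BioMedICUS.)
bytes_to_gib() {
    LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / 1073741824 }'
}

if command -v docker >/dev/null 2>&1; then
    # MemTotal is the total memory visible to the Docker engine, in bytes.
    mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null || true)
    case $mem_bytes in
        ''|*[!0-9]*) echo "could not read memory from docker info" ;;
        *) echo "Docker can use about $(bytes_to_gib "$mem_bytes") GiB of memory" ;;
    esac
fi
```

The reported total may come out a little below the configured allocation, so treat it as an approximate check against the 10 GB requirement.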
Getting Started
To start, create a directory that we will mount into the Docker container. This directory will contain folders for both the input and the output of the BioMedICUS system, as well as any configuration changes we wish to make and additional processors we wish to run.
mkdir b9
cd b9
Running the Image
After that we can launch the docker image using the following command:
docker run -it -d -v $(pwd):/b9/ -w /b9/ --name b9 ghcr.io/nlpie/biomedicus:latest
This will start a new container with the name b9, with the folder we just created mounted on the image as /b9/ and set as the working directory (-w /b9/).
It will take some time to start all of the BioMedICUS processors. To follow the progress, you may use the following command:
docker logs -f b9
When you see the following line of output, it is done deploying:
Done deploying all servers.
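In a script, watching the log by hand is not practical, so you may prefer to poll for that line instead. The sketch below assumes nothing beyond the docker logs command and the output line shown above; the function name wait_for_marker, the 2 second polling interval, and the timeout in the example are illustrative choices.

```shell
# Poll a command's output until a line containing MARKER appears.
# Usage: wait_for_marker MARKER TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 once the marker is seen, 1 if the timeout passes first.
# (Hypothetical helper, not part of Docker or BioMedICUS.)
wait_for_marker() {
    marker=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Re-run the command and search its combined output for the marker.
        if "$@" 2>&1 | grep -qF -- "$marker"; then
            return 0
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    return 1
}

# Example, with the b9 container from this guide running:
#   wait_for_marker "Done deploying all servers." 600 docker logs b9
```

The command to poll is passed as arguments, so the same function works for any container name.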
Processing Documents
Once the image has started and finished deploying the servers, make a directory called "in" in the original directory you created and place any documents you wish to process in that directory.
mkdir in
The following command will process those documents on the image:
docker exec -it b9 b9client run --include-label-text in -o out
docker exec runs a new command in an existing container, b9 is the container name, and b9client run --include-label-text in -o out is the command being run. Since we mounted our folder on the image earlier and changed the working directory to that folder, the "in" and newly created "out" folders will be accessible on the host machine.
Modifying the Pipeline
Suppose you’ve created a processor you wish to include in the BioMedICUS pipeline, like the one in Part 1 of the Developer Tutorial. First, copy the processor, in this case called medications.py, to the mounted directory. Next, with the docker image running, execute the following commands in a terminal window:
docker exec -it b9 b9 write-config deploy
docker exec -it b9 b9client write-config run
These commands write two files to our mounted directory that we will need to modify. First, edit the file biomedicus_deploy_config.yml, which contains the data about which processors to deploy by hosting their servers on launch. Modify the file so the end looks like this:
- implementation: java
  entry_point: edu.umn.biomedicus.sections.RuleBasedSectionHeaderDetector
  port: 51000
- implementation: python
  entry_point: medications
  port: 52000
Next, edit the file biomedicus_default_pipeline.yml, which contains information about which processors to run when we process documents. Add a new component so that the end of the file looks like this:
- name: biomedicus-section-headers
  address: localhost:51000
- name: medicationsprocessor
  address: localhost:52000
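The port in each pipeline address has to match the port of a deployed processor (51000 and 52000 in the fragments above), and a mismatch is easy to introduce when editing by hand. The following is a rough line-based check, assuming both files keep the port: and address: host:port layout shown above; it is not a YAML parser, and the function names are hypothetical.

```shell
# Print the ports listed under "port:" keys in a deployment config.
deploy_ports() {
    sed -n 's/^[[:space:]-]*port:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Print the ports used in "address: host:port" entries of a pipeline config.
pipeline_ports() {
    sed -n 's/^[[:space:]-]*address:[[:space:]]*[^:]*:\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Report pipeline ports with no matching deployed port.
# Usage: check_ports DEPLOY_CONFIG PIPELINE_CONFIG
# Returns 0 when every pipeline port is deployed, 1 otherwise.
check_ports() {
    missing=0
    for port in $(pipeline_ports "$2"); do
        if ! deploy_ports "$1" | grep -qx "$port"; then
            echo "no deployed processor on port $port"
            missing=1
        fi
    done
    return "$missing"
}

# Run the check only when both files from this guide are present.
if [ -f biomedicus_deploy_config.yml ] && [ -f biomedicus_default_pipeline.yml ]; then
    check_ports biomedicus_deploy_config.yml biomedicus_default_pipeline.yml \
        && echo "all pipeline ports are deployed"
fi
```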
Now shut down and remove the b9 container:
docker rm --force b9
Now, to start up BioMedICUS using the modified deployment configuration, run the following command:
docker run -it -d -v $(pwd):/b9/ -w /b9/ --name b9 ghcr.io/nlpie/biomedicus:latest --config biomedicus_deploy_config.yml
After the services finish launching, you can process documents using the modified pipeline configuration with the following command:
docker exec -it b9 b9client run --config biomedicus_default_pipeline.yml --include-label-text in -o out
Processing RTF
From the previous section you may have noticed that you can modify the deployment command by appending arguments to the docker run command. Using this method it is also possible to enable RTF processing:
docker run -it -d -v $(pwd):/b9/ -w /b9/ --name b9 ghcr.io/nlpie/biomedicus:latest --rtf
Or even RTF processing with a custom deployment configuration:
docker run -it -d -v $(pwd):/b9/ -w /b9/ --name b9 ghcr.io/nlpie/biomedicus:latest --rtf --config biomedicus_deploy_config.yml
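All of the deployment variants in this guide share the same docker run prefix and differ only in the arguments that follow the image name. As a small illustration, the hypothetical helper below prints the full command for a given set of extra arguments without running it, so you can inspect it first.

```shell
# Print the docker run command from this guide with any extra deployment
# arguments appended after the image name. Nothing is executed.
# (Hypothetical helper, not part of BioMedICUS.)
b9_run_command() {
    printf 'docker run -it -d -v %s:/b9/ -w /b9/ --name b9 %s' \
        "$(pwd)" "ghcr.io/nlpie/biomedicus:latest"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Show the command for RTF processing with a custom deployment configuration.
b9_run_command --rtf --config biomedicus_deploy_config.yml
```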
To process RTF documents, add the --rtf flag to the docker exec command that runs the pipeline:
docker exec -it b9 b9client run --rtf --include-label-text in -o out
To only convert RTF to text, run the following after deploying with RTF processing enabled, as shown above:
docker exec -it b9 b9client run-rtf-to-text in -o out
Appendix A: Exporting the Image for Systems with Restricted Networks
Sometimes it may be necessary to run BioMedICUS on a system that does not have unrestricted access to the internet, and would not be able to download the BioMedICUS image. First, after launching the BioMedICUS container on a computer at least once, you can export that container using the following command:
docker export b9 | gzip > biomedicus-latest.tgz
Then, after transferring the archive to the server with restricted internet access, you can import it as an image using the following command:
zcat biomedicus-latest.tgz | docker import - ghcr.io/nlpie/biomedicus:latest
From there, the docker run command at the start of this guide will work.
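Because the archive is large, it can be worth confirming that it survived the transfer before importing it. The sketch below uses gzip -t, which tests a compressed file's integrity without extracting it; the function name archive_is_intact is hypothetical.

```shell
# Return 0 when the file is a readable, uncorrupted gzip archive.
# (Hypothetical helper; gzip -t only tests the archive, it writes nothing.)
archive_is_intact() {
    gzip -t "$1" 2>/dev/null
}

# Check the archive from this appendix if it is in the current directory.
if [ -f biomedicus-latest.tgz ]; then
    if archive_is_intact biomedicus-latest.tgz; then
        echo "archive OK, safe to import"
    else
        echo "archive is damaged, transfer it again" >&2
    fi
fi
```

One thing to verify on your own setup: docker export and docker import carry over a container's filesystem only, so image settings such as the entrypoint are not preserved unless they are re-applied on import. If the imported image does not start as expected, docker save and docker load operate on the image itself and keep those settings.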
Appendix B: Using the NGINX Reverse Proxy Docker-compose to Host BioMedICUS
In the BioMedICUS repository we make available a docker-compose configuration which allows hosting the BioMedICUS system. This configuration contains an NGINX reverse proxy which routes processing requests to their respective services from a single port. To run it, download the files in that directory and execute the following command:
docker compose up
Then anyone on that computer who has installed the minimal biomedicus_client Python package can run the BioMedICUS pipeline against those running services:
b9client run --address 127.0.0.1:8080 ~/in -o ~/out