About
In the final part of this guide, which began with creating our own processor, we will use the deployed processor to process some documents.
Writing the pipeline configuration file
We’ll start by writing out the configuration file for the pipeline.
b9client write-config pipeline
This creates a file named biomedicus_default_pipeline.yml, which we will edit to add the processor we wrote and deployed earlier. Open that file in your favorite text editor.
Running this command requires the BioMedICUS virtual environment created during installation to be active.
Adding our processor
Again, the changes we need to make here are small. The block starting with
components:
- processor_id: biomedicus-sentences
  address: localhost:50300
is a list of the processors and their addresses, in the order in which they will be run. If you remember from our previous tutorial, we assigned our new processor the port 52000. Now we will add it to the list of processors, at the end of the file:
- processor_id: medicationsprocessor
  address: localhost:52000
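To double-check the edit, it can help to list the processors in the order the file declares them. The following is a minimal sketch, not part of BioMedICUS: it assumes every component is written as a processor_id line followed by an address line, as in the block above, and it scans the text with a regular expression rather than parsing the YAML. The function name list_components is our own.

```python
import re
from pathlib import Path

# Matches lines such as "- processor_id: biomedicus-sentences"
# or "address: localhost:50300" (assumed layout, see the lead-in).
ENTRY = re.compile(r"^\s*(?:-\s*)?(processor_id|address):\s*(\S+)\s*$")


def list_components(config_text):
    """Return (processor_id, address) pairs in the order they appear."""
    components = []
    current_id = None
    for line in config_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "processor_id":
            current_id = value
        elif current_id is not None:
            # An address line closes the component opened by processor_id.
            components.append((current_id, value))
            current_id = None
    return components


if __name__ == "__main__":
    path = Path("biomedicus_default_pipeline.yml")
    if path.exists():
        pairs = list_components(path.read_text(encoding="utf-8"))
        for position, (processor_id, address) in enumerate(pairs, 1):
            print(position, processor_id, address)
```

If the edit went as intended, medicationsprocessor should be the last entry printed, with the address localhost:52000.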
Running the pipeline
First, we need some documents to process. If you don’t have any, you can use an MTSamples.com corpus we have made available. Download these documents and extract them to a folder.
We’re finally ready to run the processor we created along with the rest of the BioMedICUS pipeline, which you can do with the following command:
b9client run --config biomedicus_default_pipeline.yml --include-label-text INPUT_DIRECTORY -o OUTPUT_DIRECTORY
Replace INPUT_DIRECTORY with the directory containing the documents you want processed and OUTPUT_DIRECTORY with the directory where you want the results stored.
Note that this requires the BioMedICUS processors to be deployed and running in another terminal window or tab.
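If the run command fails to connect, a quick way to narrow things down is to check whether anything is listening on the addresses from the configuration file. The sketch below uses only the Python standard library; the helper name is_listening is our own, and the two addresses are the ones shown earlier in this tutorial. A successful TCP connection only shows that some service is accepting connections on that port, not that it is the expected processor.

```python
import socket


def is_listening(address, timeout=1.0):
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    for address in ("localhost:50300", "localhost:52000"):
        status = "up" if is_listening(address) else "not reachable"
        print(address, status)
```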
Viewing results
The default BioMedICUS pipeline and run command will serialize the documents as JSON. By default the files are not prettified, but you can prettify one by running the following:
python -m json.tool 97_98.txt.json
This command prints the JSON file prettified. In that file you can find the “medication_sentences” label index, containing entries like the following:
{
"start_index": 1354,
"end_index": 1529,
"identifier": 7,
"fields": {},
"reference_ids": {
"concepts": [
"umls_concepts:683",
"umls_concepts:684",
"umls_concepts:685",
"umls_concepts:686",
"umls_concepts:687",
"umls_concepts:688",
"umls_concepts:689",
"umls_concepts:690",
"umls_concepts:691",
"umls_concepts:692",
"umls_concepts:693",
"umls_concepts:694",
"umls_concepts:695",
"umls_concepts:696",
"umls_concepts:697",
"umls_concepts:698",
"umls_concepts:699",
"umls_concepts:700",
"umls_concepts:701",
"umls_concepts:702",
"umls_concepts:703",
"umls_concepts:704",
"umls_concepts:705"
]
},
"_text": "She was taking Remeron 15 mg q.h.s., Ambien 5 mg q.h.s. on a p.r.n. basis, Ativan 0.25 mg every 6 hours on a p.r.n. basis, and Klonopin 0.25 mg at night while she was at home."
}
You can see the start_index and end_index of where the label occurs in the text, as well as a list of concepts in the sentence. Finally, the text of the sentence is shown as _text because we used the --include-label-text flag while running.
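To pull these labels out programmatically rather than reading the prettified file, something like the following sketch may be useful. It relies only on the fields shown in the example above (start_index, end_index, reference_ids, _text). Because the surrounding layout of the output file is not described here, it searches the whole JSON tree for a key named medication_sentences instead of assuming a fixed path. The function names are our own.

```python
import json
from pathlib import Path


def find_label_index(node, index_name):
    """Depth-first search for the first value stored under index_name."""
    if isinstance(node, dict):
        if index_name in node:
            return node[index_name]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_label_index(child, index_name)
        if found is not None:
            return found
    return None


def iter_labels(node):
    """Yield every dict below node that has start_index and end_index."""
    if isinstance(node, dict):
        if "start_index" in node and "end_index" in node:
            yield node
        else:
            for child in node.values():
                yield from iter_labels(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_labels(child)


def summarize(document, index_name="medication_sentences"):
    """Return (start, end, concept_count, text) for each label found."""
    rows = []
    for label in iter_labels(find_label_index(document, index_name)):
        concepts = label.get("reference_ids", {}).get("concepts", [])
        rows.append((label["start_index"], label["end_index"],
                     len(concepts), label.get("_text")))
    return rows


if __name__ == "__main__":
    path = Path("97_98.txt.json")  # the file used in the example above
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for row in summarize(json.load(f)):
                print(*row, sep="\t")
```

Each printed row gives the character span of a medication sentence, the number of referenced concepts, and the sentence text when --include-label-text was used.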
Conclusion
This concludes our tutorial on how to create a processor that runs with BioMedICUS. Now that you have the basics down, the possibilities are endless for what you can do. Good luck!