Components

Default BioMedICUS pipeline

By default, BioMedICUS runs the following components, in this order:

Other processors

By default, normalization is run as part of the concept detector, but it can also be deployed as a standalone processor.

BioMedICUS also provides the RTF Reader, which converts RTF documents into plaintext documents that can be used as input to the system.

RTF Reader

Converts an RTF document into plaintext.

Parameters
  • Name
    binary_data_name
    Description
    The name of the event binary data containing the RTF-encoded document. Defaults to "rtf"
    Data Type
    str
    Required
    false
  • Name
    output_document_name
    Description
    The name of the document to create to hold the plaintext. Defaults to "plaintext"
    Data Type
    str
    Required
    false
Output Label Indices
  • Index: bold

    RTF bold formatting.

  • Index: italic

    RTF italic formatting.

  • Index: underlined

    RTF underlined formatting.
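To illustrate how these indices relate to the plaintext, here is a toy sketch (not the BioMedICUS implementation) that strips RTF markup while recording where bold, italic, and underlined formatting start and stop. It assumes a tiny subset of RTF (groups, \b, \i, \ul, \ulnone, \par, \line, and escaped braces or backslashes) and silently ignores every other control word; the function name read_rtf and the (start, end) offset representation are hypothetical.

```python
import re

# Formatting control words handled by this toy reader -> output label index name.
FORMAT_WORDS = {"b": "bold", "i": "italic", "ul": "underlined"}

TOKEN_RE = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"  # control word, optional numeric parameter, delimiter space
    r"|\\([{}\\])"           # escaped literal brace or backslash
    r"|([{}])"               # group start or end
    r"|([^\\{}\r\n]+)"       # run of plain text
)


def read_rtf(rtf):
    """Return (plaintext, spans); spans maps index name -> list of (start, end)."""
    pieces = []                # plaintext fragments
    pos = 0                    # current length of the plaintext
    state = {name: False for name in FORMAT_WORDS.values()}
    saved = []                 # formatting state saved at each "{"
    open_at = {}               # index name -> start offset of the open span
    spans = {name: [] for name in state}

    def apply(new_state, offset):
        # Open or close a span wherever a formatting flag changes value.
        for name, on in new_state.items():
            if on and not state[name]:
                open_at[name] = offset
            elif not on and state[name]:
                start = open_at.pop(name)
                if offset > start:  # drop empty spans
                    spans[name].append((start, offset))
            state[name] = on

    for match in TOKEN_RE.finditer(rtf):
        word, param, escaped, group, run = match.groups()
        piece = None
        if word in FORMAT_WORDS:
            # "\b" or "\b1" turns bold on; "\b0" turns it off (same for \i, \ul).
            apply({**state, FORMAT_WORDS[word]: param != "0"}, pos)
        elif word == "ulnone":
            apply({**state, "underlined": False}, pos)
        elif word in ("par", "line"):
            piece = "\n"
        elif escaped:
            piece = escaped
        elif group == "{":
            saved.append(dict(state))
        elif group == "}" and saved:
            apply(saved.pop(), pos)  # leaving a group restores its formatting
        elif run:
            piece = run
        if piece:
            pieces.append(piece)
            pos += len(piece)

    apply({name: False for name in state}, pos)  # close spans still open at EOF
    return "".join(pieces), spans
```

Because offsets are counted against the emitted plaintext, each recorded span can be applied directly to the plaintext document.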

Sentence Detector

Labels sentences in the document text.

Output Label Indices
  • Index: sentences

One per Line Sentences

Labels sentences where each line in the input document is a sentence.

Output Label Indices
  • Index: sentences
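The one-per-line rule is simple enough to sketch directly. This hypothetical function returns the character offsets of each line, excluding the line terminator; skipping blank lines is an assumption made here, not documented behavior of the BioMedICUS processor.

```python
def one_per_line_sentences(text):
    """Return (start, end) offsets of every non-blank line in text."""
    spans = []
    start = 0
    for line in text.split("\n"):
        # Exclude a trailing carriage return so "\r\n" line endings work too.
        end = start + len(line.rstrip("\r"))
        if text[start:end].strip():
            spans.append((start, end))
        start += len(line) + 1  # move past the "\n" separator
    return spans
```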

TnT Part of Speech Tagger

Labels part-of-speech tags in the document.

Parameters
  • Name
    sentences_index
    Description
    The name of the index containing sentences. Defaults to "sentences"
    Data Type
    str
    Required
    false
  • Name
    target_index
    Description
    The target index to create for POS tags. Defaults to "pos_tags"
    Data Type
    str
    Required
    false
  • Name
    token_index
    Description
    The name of an index of tokens. By default, the processor performs its own tokenization.
    Data Type
    str
    Required
    false
Input Label Indices
  • Index: sentences

    Takes name from parameter: sentences_index

  • Index:

    Takes name from parameter: token_index

    Existing tokens to use. If this index is not provided, the processor tokenizes each sentence itself.

Output Label Indices
  • Index: pos_tags

    Takes name from parameter: target_index

    Labeled part of speech tags on tokens.

    Properties

    tag (str) : The Penn Treebank tag for the token.
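The token_index parameter decides where tokens come from. The sketch below shows that choice under stated assumptions: sentences and tokens are (start, end) offset pairs, existing tokens are reused when they fall inside a sentence, and the regular-expression fallback tokenizer is a stand-in, not the tokenizer BioMedICUS uses. Tag assignment itself (the TnT model) is not shown, and the function name is hypothetical.

```python
import re

# Hypothetical fallback tokenizer: runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokens_per_sentence(text, sentences, tokens=None):
    """Return one list of (start, end) token spans per sentence span.

    If tokens (the contents of a token index) is given, existing tokens that
    fall inside each sentence are reused; otherwise each sentence is tokenized.
    """
    grouped = []
    for s_start, s_end in sentences:
        if tokens is not None:
            grouped.append([(b, e) for b, e in tokens if s_start <= b and e <= s_end])
        else:
            # Pattern.finditer with pos/endpos reports offsets in the full text.
            grouped.append([m.span() for m in TOKEN_RE.finditer(text, s_start, s_end)])
    return grouped
```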

Acronym Detector

Labels acronyms.

Parameters
  • Name
    labelOtherSenses
    Description
    Whether acronym disambiguations other than the highest-scoring one should be labeled.
    Data Type
    bool
    Required
    false
Input Label Indices
  • Index: pos_tags

Output Label Indices
  • Index: acronyms

    The highest scoring acronym disambiguation for an acronym.

    Properties

    score (float) : The acronym's score.

    expansion (str) : The acronym's expansion.

  • Index: other_acronym_senses

    Disambiguations other than the highest-scoring one.

    Properties

    score (float) : The acronym's score.

    expansion (str) : The acronym's expansion.
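The split between acronyms and other_acronym_senses can be sketched as follows, assuming each detected acronym arrives with a list of candidate (expansion, score) pairs. The label_other_senses argument mirrors the labelOtherSenses parameter; the function name and the dictionary layout of each label are hypothetical.

```python
def label_acronym(start, end, senses, label_other_senses=False):
    """Split candidate senses for one acronym occurrence into two label lists.

    senses: (expansion, score) pairs. Returns (acronyms, other_acronym_senses).
    """
    def make_label(expansion, score):
        return {"start": start, "end": end, "expansion": expansion, "score": score}

    ranked = sorted(senses, key=lambda sense: sense[1], reverse=True)
    # Only the highest-scoring sense goes into the acronyms index.
    acronyms = [make_label(*ranked[0])] if ranked else []
    # Lower-scoring senses are labeled only when requested.
    others = [make_label(*sense) for sense in ranked[1:]] if label_other_senses else []
    return acronyms, others
```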

SPECIALIST Normalizer

Labels normalized forms of words.

Input Label Indices
  • Index: pos_tags

Output Label Indices
  • Index: norm_forms

    The normalized form of the word, labeled once per token.

    Properties

    norm (str) : The normalized form of the word.
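Part-of-speech tags are an input because the same surface word can normalize differently depending on its tag ("leaves" as a plural noun versus a verb). The sketch below uses a made-up three-entry lexicon in place of the SPECIALIST Lexicon and falls back to the lowercased word; the lexicon, the fallback, and the function name are all assumptions for illustration.

```python
# Hypothetical miniature lexicon standing in for the SPECIALIST Lexicon:
# (lowercased word, Penn Treebank tag) -> normalized form.
LEXICON = {
    ("leaves", "NNS"): "leaf",
    ("leaves", "VBZ"): "leave",
    ("denies", "VBZ"): "deny",
}


def norm_forms(tagged_tokens):
    """tagged_tokens: (start, end, word, tag) tuples, e.g. built from pos_tags.

    Returns one norm_forms-style label per token; unknown words fall back to
    their lowercased form (an assumption made for this sketch).
    """
    labels = []
    for start, end, word, tag in tagged_tokens:
        norm = LEXICON.get((word.lower(), tag), word.lower())
        labels.append({"start": start, "end": end, "norm": norm})
    return labels
```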

UMLS Concept Detector

Labels UMLS concepts.

Input Label Indices
  • Index: sentences

  • Index: pos_tags

  • Index: norm_forms

  • Index: acronyms

Output Label Indices
  • Index: umls_concepts

    The UMLS concepts that appear in the text.

    Properties

    sui (str) : The UMLS String Unique Identifier for the matched string.

    cui (str) : The UMLS Concept Unique Identifier.

    tui (str) : The UMLS Semantic Type Unique Identifier.

    source (str) : The UMLS source vocabulary the concept originated from.

    score (float) : A score for the concept. Direct phrase matches score highest; normalized bag-of-words matches score lowest.

  • Index: umls_terms

    Text that is covered by one or more concepts.
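The relationship between the two output indices can be sketched as follows: every distinct span carrying at least one concept becomes a term, and the concepts on a span can be ranked by their score property. The dictionary layout, the function names, and the placeholder CUIs are hypothetical.

```python
def umls_terms(concepts):
    """Distinct (start, end) spans covered by at least one concept, in text order."""
    return sorted({(c["start"], c["end"]) for c in concepts})


def best_concept_per_term(concepts):
    """Map each term span to its highest-scoring concept."""
    best = {}
    for concept in concepts:
        span = (concept["start"], concept["end"])
        if span not in best or concept["score"] > best[span]["score"]:
            best[span] = concept
    return best


# Made-up example: two candidate concepts on one span, one on another.
concepts = [
    {"start": 8, "end": 18, "cui": "CUI-A", "score": 0.9},
    {"start": 8, "end": 18, "cui": "CUI-B", "score": 0.6},
    {"start": 28, "end": 33, "cui": "CUI-C", "score": 0.7},
]
print(umls_terms(concepts))  # [(8, 18), (28, 33)]
```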

BioMedICUS Modification Detector

Detects historical, negated, and uncertain modifications on UMLS terms.

Parameters
  • Name
    terms_index
    Description
    The label index containing the terms that should be checked for modifications.
    Data Type
    str
    Required
    false
Input Label Indices
  • Index: sentences

  • Index: pos_tags

  • Index: umls_terms

    Takes name from parameter: terms_index

Output Label Indices
  • Index: negated

    Spans of negated terms.

  • Index: uncertain

    Spans of terms that are uncertain.

  • Index: historical

    Spans of terms that are historical.

Negex Negation Detector

Detects which UMLS terms are negated.

Parameters
  • Name
    terms_index
    Description
    The label index containing terms that should be checked for negation.
    Data Type
    str
    Required
    false
Input Label Indices
  • Index: sentences

  • Index: umls_terms

    Takes name from parameter: terms_index

Output Label Indices
  • Index: negated

    Spans of negated terms.

  • Index: negation_trigger

    Spans of phrases that trigger negation.
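For orientation, here is a heavily simplified NegEx-style sketch: it finds trigger phrases in a sentence and marks any term lying entirely within a short window after a trigger as negated, producing the same two kinds of spans as the negated and negation_trigger indices. The trigger list, the termination words, the five-word window, and the function name are assumptions; this is neither the BioMedICUS implementation nor the full NegEx algorithm, which also handles post-negation and pseudo-negation triggers.

```python
import re

# Hypothetical, tiny trigger list (longest phrases first); real NegEx uses a
# much larger curated lexicon.
TRIGGERS = ["negative for", "without", "denies", "no"]
# Hypothetical words that end a negation scope early.
TERMINATIONS = {"but", "however"}
WINDOW = 5  # assumed maximum number of words after a trigger

WORD_RE = re.compile(r"\w+")


def negex(text, sentence, terms):
    """Return (negated, negation_trigger) span lists for one sentence.

    sentence: (start, end) offsets; terms: (start, end) spans in that sentence.
    """
    s_start, s_end = sentence
    words = [(m.start(), m.end(), m.group().lower())
             for m in WORD_RE.finditer(text, s_start, s_end)]
    negated, triggers = set(), []
    i = 0
    while i < len(words):
        # Number of words in the first trigger phrase matching at word i (0 if none).
        length = next((len(p.split()) for p in TRIGGERS
                       if [w[2] for w in words[i:i + len(p.split())]] == p.split()), 0)
        if not length:
            i += 1
            continue
        trig_end = words[i + length - 1][1]
        triggers.append((words[i][0], trig_end))
        # The scope covers up to WINDOW following words, stopping at a termination.
        scope_end = trig_end
        for _, end, word in words[i + length:i + length + WINDOW]:
            if word in TERMINATIONS:
                break
            scope_end = end
        negated.update(t for t in terms if trig_end <= t[0] and t[1] <= scope_end)
        i += length
    return sorted(negated), triggers
```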