detect_medical_entities and phi should handle character vector #22

Closed
antoine-sachet opened this issue Jan 28, 2020 · 1 comment

@antoine-sachet antoine-sachet commented Jan 28, 2020

detect_medical_entities can only handle a scalar string. It should handle a character vector, like all other detect_* functions. Example code can be found in R/detect_entity.R.

Note that right now the medical functions are only in the "medical" branch. Install with remotes::install_github("cloudyr/aws.comprehend", ref = "medical").

library("aws.comprehend")
medical_txt <- "Pt is 40yo mother, highschool teacher. HPI : Sleeping trouble on present dosage of Clonidine."
detect_medical_entities(medical_txt)
#>   Index BeginOffset                     Category EndOffset Id     Score
#> 1     1           6 PROTECTED_HEALTH_INFORMATION        10  0 0.9982511
#> 2     1          19 PROTECTED_HEALTH_INFORMATION        37  1 0.4113526
#> 3     1          45            MEDICAL_CONDITION        61  3 0.7587468
#> 4     1          83                   MEDICATION        92  2 0.9932888
#>                 Text                    Traits         Type
#> 1               40yo                      NULL          AGE
#> 2 highschool teacher                      NULL   PROFESSION
#> 3   Sleeping trouble SYMPTOM, 0.52603405714035      DX_NAME
#> 4          Clonidine                      NULL GENERIC_NAME

# This should work
detect_medical_entities(c(medical_txt, medical_txt))
#> Warning in comprehendHTTP(action = operation, body = bod, service =
#> "comprehendmedical"): Bad Request (HTTP 400).
#>      Index
#> [1,]     1

Created on 2020-01-28 by the reprex package (v0.3.0)

The indices should also stay consistent with the input even when no entities are found for some elements:
detect_medical_entities(c(medical_txt, "No medical entity in here", medical_txt))

@dkincaid

@antoine-sachet antoine-sachet commented Mar 5, 2020

My bad, AWS does not provide batch operations for Comprehend Medical. This is a good reason not to implement batch processing for the detect_medical_* functions.

One can always loop over the documents and call detect_medical_* on each one, taking rate limits into account.
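
For anyone landing here, a minimal sketch of such a loop. The helper name detect_each and its delay argument are hypothetical and not part of aws.comprehend. It assumes the detector returns a data frame for a single string, and it treats an empty or non-data-frame result as "no entities". The fixed pause is only a crude stand-in for real rate-limit handling.

```r
# Hypothetical helper (not part of aws.comprehend): call a single-document
# detector on each element of a character vector and stack the results.
detect_each <- function(texts, detect_fn, delay = 0) {
  stopifnot(is.character(texts), is.function(detect_fn), delay >= 0)
  results <- vector("list", length(texts))
  for (i in seq_along(texts)) {
    res <- detect_fn(texts[[i]])
    # Assumption: a usable result is a data frame with at least one row.
    if (is.data.frame(res) && nrow(res) > 0) {
      res$Index <- i  # point each row back to its position in the input
      results[[i]] <- res
    }
    # Crude throttle between requests; tune to your account's limits.
    if (delay > 0 && i < length(texts)) Sys.sleep(delay)
  }
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())
  do.call(rbind, results)
}

# Stand-in detector so the sketch runs without AWS credentials.
fake_detect <- function(text) {
  if (!grepl("Clonidine", text, fixed = TRUE)) return(data.frame())
  data.frame(Index = 1L, Text = "Clonidine", Type = "GENERIC_NAME",
             stringsAsFactors = FALSE)
}

docs <- c("On Clonidine.", "No medical entity in here", "Clonidine again.")
res <- detect_each(docs, fake_detect)
res$Index
```

With the stand-in detector, the rows carry Index 1 and 3, so the middle document with no entities does not shift the numbering. Against the real service the call would presumably look like detect_each(c(medical_txt, medical_txt), detect_medical_entities, delay = 0.5), untested here.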
