Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification

dc.contributor.author: Hanauer, David A
dc.contributor.author: Mei, Qiaozhu
dc.contributor.author: Vydiswaran, V. G V
dc.contributor.author: Singh, Karandeep
dc.contributor.author: Landis-Lewis, Zach
dc.contributor.author: Weng, Chunhua
dc.date.accessioned: 2019-04-07T03:19:17Z
dc.date.available: 2019-04-07T03:19:17Z
dc.date.issued: 2019-04-04
dc.identifier.citation: BMC Medical Informatics and Decision Making. 2019 Apr 04;19(Suppl 3):75
dc.identifier.uri: https://doi.org/10.1186/s12911-019-0784-1
dc.identifier.uri: https://hdl.handle.net/2027.42/148519
dc.description.abstract: Background: Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. Methods: We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. Results: We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. Conclusions: Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
dc.title: Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification
dc.type: Article (en_US)
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/148519/1/12911_2019_Article_784.pdf
dc.language.rfc3066: en
dc.rights.holder: The Author(s).
dc.date.updated: 2019-04-07T03:19:18Z
dc.owningcollname: Interdisciplinary and Peer-Reviewed
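
The abstract notes that the same number can appear in notes as digits, as an English word, or as a Roman numeral, and that a search ignoring these variants can miss patients. The following is a minimal sketch of that idea, not the authors' method: it expands a small integer into a few lexical variants and fills them into a search phrase. The function names (`to_words`, `to_roman`, `number_variants`, `expand_query`) and the phrase template are hypothetical, and only three of the twelve variant types mentioned in the abstract are covered.

```python
# Illustrative sketch only; all names here are assumptions, not from the paper.

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
    (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
    (5, "V"), (4, "IV"), (1, "I"),
]


def to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 as an English word."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def to_roman(n: int) -> str:
    """Convert an integer from 1 to 3999 to an uppercase Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals are defined here for 1..3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def number_variants(n: int) -> list[str]:
    """Return digit, word, and Roman-numeral forms of n (0..99)."""
    variants = [str(n), to_words(n)]
    if n >= 1:  # there is no Roman numeral for zero
        roman = to_roman(n)
        variants += [roman, roman.lower()]
    return variants


def expand_query(template: str, n: int) -> list[str]:
    """Fill each variant of n into a phrase template containing {num}."""
    return [template.format(num=v) for v in number_variants(n)]


if __name__ == "__main__":
    for phrase in expand_query("stage {num}", 2):
        print(phrase)
```

Searching for all of the generated phrases, rather than only the digit form, is one way to reduce the kind of missed matches the abstract describes; a real system would also need to handle misspellings, ordinals, and the other variant types.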

