Optimal dynamic treatment regime estimation using information extraction from unstructured clinical text

dc.contributor.author: Zhou, Nina
dc.contributor.author: Brook, Robert D.
dc.contributor.author: Dinov, Ivo D.
dc.contributor.author: Wang, Lu
dc.date.accessioned: 2022-05-06T17:27:54Z
dc.date.available: 2023-05-06 13:27:53 (en)
dc.date.available: 2022-05-06T17:27:54Z
dc.date.issued: 2022-04
dc.identifier.citation: Zhou, Nina; Brook, Robert D.; Dinov, Ivo D.; Wang, Lu (2022). "Optimal dynamic treatment regime estimation using information extraction from unstructured clinical text." Biometrical Journal 64(4): 805-817.
dc.identifier.issn: 0323-3847
dc.identifier.issn: 1521-4036
dc.identifier.uri: https://hdl.handle.net/2027.42/172299
dc.description.abstract: The wide-scale adoption of electronic health records (EHRs) provides extensive information to support precision medicine and personalized health care. In addition to structured EHRs, we leverage free-text clinical information extraction (IE) techniques to estimate optimal dynamic treatment regimes (DTRs), a sequence of decision rules that dictate how to individualize treatments to patients based on treatment and covariate history. The proposed IE of patient characteristics closely resembles “The clinical Text Analysis and Knowledge Extraction System” and employs named entity recognition, boundary detection, and negation annotation. It also utilizes regular expressions to extract numerical information. Combining the proposed IE with optimal DTR estimation, we extract derived patient characteristics and use tree-based reinforcement learning (T-RL) to estimate multistage optimal DTRs. IE significantly improved the estimation in counterfactual outcome models compared to using structured EHR data alone, which often include incomplete data, data entry errors, and other potentially unobserved risk factors. Moreover, including IE in optimal DTR estimation provides larger study cohorts and a broader pool of candidate tailoring variables. We demonstrate the performance of our proposed method via simulations and an application using clinical records to guide blood pressure control treatments among critically ill patients with severe acute hypertension. This joint estimation approach improves the accuracy of identifying the optimal treatment sequence by 14–24% compared to traditional inference without using IE, based on our simulations over various scenarios. In the blood pressure control application, we successfully extracted significant blood pressure predictors that are unobserved or partially missing from structured EHR.
dc.publisher: Chapman and Hall/CRC
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: causal inference
dc.subject.other: clinical decision making
dc.subject.other: electronic health record
dc.subject.other: precision medicine
dc.subject.other: text mining
dc.title: Optimal dynamic treatment regime estimation using information extraction from unstructured clinical text
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Physics
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172299/1/bimj2336.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172299/2/bimj2336_am.pdf
dc.identifier.doi: 10.1002/bimj.202100077
dc.identifier.source: Biometrical Journal
dc.working.doi: NO (en)
dc.owningcollname: Interdisciplinary and Peer-Reviewed
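
The abstract mentions regular expressions for extracting numerical information and negation annotation of extracted terms. The following is a minimal Python sketch of those two ideas only, not the authors' pipeline and not cTAKES. The pattern `BP_PATTERN`, the cue list `NEGATION_CUES`, the function names, and the five-token look-back window are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not from the paper): matches readings
# such as "BP 182/110" or "blood pressure: 150 / 95".
BP_PATTERN = re.compile(
    r"\b(?:bp|blood pressure)\s*(?:of|is|was|:)?\s*(\d{2,3})\s*/\s*(\d{2,3})\b",
    re.IGNORECASE,
)

# Hypothetical negation cues (an assumption): a tiny rule-based cue list.
NEGATION_CUES = re.compile(
    r"\b(?:no|denies|denied|without|negative for)\b", re.IGNORECASE
)


def extract_blood_pressure(note):
    """Return (systolic, diastolic) integer pairs found in a free-text note."""
    return [(int(s), int(d)) for s, d in BP_PATTERN.findall(note)]


def is_negated(sentence, term, window=5):
    """Return True if a negation cue occurs within `window` tokens before `term`."""
    tokens = sentence.lower().split()
    term_tokens = term.lower().split()
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        # Strip trailing punctuation so "pain." still matches "pain".
        candidate = [t.strip(".,;:") for t in tokens[i:i + n]]
        if candidate == term_tokens:
            preceding = " ".join(tokens[max(0, i - window):i])
            if NEGATION_CUES.search(preceding):
                return True
    return False


if __name__ == "__main__":
    note = "Pt admitted with BP 182/110. Repeat blood pressure: 150 / 95 after labetalol."
    print(extract_blood_pressure(note))
    print(is_negated("Patient denies chest pain.", "chest pain"))
```

A production system would also need sentence boundary detection, concept normalization, and validation against annotated notes, which this sketch leaves out.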