Data linkage multiplies research insights across diverse healthcare sectors

Weber, G. M., Mandl, K. D. & Kohane, I. S. Finding the missing link for big biomedical data. JAMA 311, 2479–2480 (2014).
Stange, K. C. The problem of fragmentation and the need for integrative solutions. Ann. Fam. Med. 7, 100–103 (2009).
Cebul, R. D., Rebitzer, J. B., Taylor, L. J. & Votruba, M. E. Organizational fragmentation and care quality in the U.S. healthcare system. J. Econ. Perspect. 22, 93–113 (2008).
Song, J. et al. Utilization of electronic health record data to evaluate the association of urban environment on systemic lupus erythematosus symptoms. Rheumatology (Oxford). (2022).
Walunas, T. L. et al. Disease outcomes and care fragmentation among patients with systemic lupus erythematosus. Arthritis Care Res. 69, 1369–1376 (2017).
Adler-Milstein, J., Bates, D. W. & Jha, A. K. A survey of health information exchange organizations in the United States: implications for meaningful use. Ann. Intern. Med. 154, 666–671 (2011).
Krieger, N. The US Census and the People’s Health: public health engagement from enslavement and “Indians not taxed” to census tracts and health equity (1790–2018). Am. J. Public Health 109, 1092–1100 (2019).
Lorkowski, J. & Pokorski, M. Medical records: a historical narrative. Biomedicines 10, 2594 (2022).
Camp, C. L. et al. Patient records at Mayo Clinic: lessons learned from the first 100 patients in Dr Henry S. Plummer’s dossier model. Mayo Clin. Proc. 83, 1396–1399 (2008).
Castellucci, M. Road to the Mayo Clinic: Plummer’s novel ideas transformed healthcare. Mod. Healthcare 46, H10–H12 (2016).
Dunn, H. L. Record linkage. Am. J. Public Health Nations Health 36, 1412–1416 (1946).
Fellegi, I. P. & Sunter, A. B. A theory for record linkage. J. Am. Stat. Assoc. 64, 1183–1210 (1969).
Ruggles, S., Flood, S., Goeken, R., Schouweiler, M. & Sobek, M. IPUMS USA: Version 15.0 [dataset]. (IPUMS, Minneapolis, MN, 2023).
Mennemeyer, S. T., Menachemi, N., Rahurkar, S. & Ford, E. W. Impact of the HITECH Act on physicians’ adoption of electronic health records. J. Am. Med. Inform. Assoc. 23, 375–379 (2016).
Cohen, M. F. Impact of the HITECH financial incentives on EHR adoption in small, physician-owned practices. Int. J. Med. Inform. 94, 143–154 (2016).
Joseph, S., Sow, M., Furukawa, M. F., Posnack, S. & Chaffee, M. A. HITECH spurs EHR vendor competition and innovation, resulting in increased adoption. Am. J. Manag. Care 20, 734–740 (2014).
Szarfman, A. et al. Recommendations for achieving interoperable and shareable medical data in the USA. Commun. Med. 2, 86 (2022).
Wu, S. et al. Deep learning in clinical natural language processing: a methodical review. J. Am. Med. Inf. Assoc. 27, 457–470 (2020).
Ong, T. C., Duca, L. M., Kahn, M. G. & Crume, T. L. A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology. J. Am. Med. Inf. Assoc. 27, 505–513 (2020).
Joffe, E. et al. A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation. J. Am. Med. Inf. Assoc. 21, 97–104 (2014).
Weber, S. C., Lowe, H., Das, A. & Ferris, T. A simple heuristic for blindfolded record linkage. J. Am. Med. Inf. Assoc. 19, e157–e161 (2012).
Grannis, S. J., Williams, J. L., Kasthuri, S., Murray, M. & Xu, H. Evaluation of real-world referential and probabilistic patient matching to advance patient identification strategy. J. Am. Med. Inf. Assoc. 29, 1409–1415 (2022).
Deng, Y. et al. Evolving availability and standardization of patient attributes for matching. Health Aff. Scholar 1, qxad047 (2023).
Culbertson, A. et al. The building blocks of interoperability: a multisite analysis of patient demographic attributes available for matching. Appl. Clin. Inform. 8, 322–336 (2017).
Krzyzanowski, B. & Manson, S. M. Twenty years of the health insurance portability and accountability act safe harbor provision: unsolved challenges and ways forward. JMIR Med. Inf. 10, e37756 (2022).
Kum, H. C., Krishnamurthy, A., Machanavajjhala, A., Reiter, M. K. & Ahalt, S. Privacy preserving interactive record linkage (PPIRL). J. Am. Med. Inform. Assoc. 21, 212–220 (2014).
Mirel, L. B., Resnick, D. M., Aram, J. & Cox, C. S. A methodological assessment of privacy preserving record linkage using survey and administrative data. Stat. J. IAOS 38, 413–421 (2022).
Nguyen, L. et al. Privacy-preserving record linkage of deidentified records within a public health surveillance system: evaluation study. J. Med. Internet Res. 22, e16757 (2020).
Irvine, K. et al. Real world performance of privacy preserving record linkage. Int. J. Popul. Data Sci. 3. (2018).
Kho, A. N. et al. Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J. Am. Med. Inf. Assoc. 22, 1072–1080 (2015).
Kho, A. N. et al. in Machine Learning and Knowledge Discovery in Databases. (eds Peggy Cellier & Kurt Driessens) 79–87 (Springer International Publishing, 2022).
Yang, Y. et al. Ancillary Data Record Linkage to characterize the completeness of data for the All of Us Research Program. Int. J. Popul. Data Sci. 7. (2022).
Marsolo, K. et al. Assessing the impact of privacy-preserving record linkage on record overlap and patient demographic and clinical characteristics in PCORnet®, the National Patient-Centered Clinical Research Network. J. Am. Med. Inf. Assoc. 30, 447–455 (2023).
Kiernan, D. et al. Establishing a framework for privacy-preserving record linkage among electronic health record and administrative claims databases within PCORnet®, the National Patient-Centered Clinical Research Network. BMC Res. Notes 15, 337 (2022).
Sidky, H. et al. Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C). BMC Med. Res. Methodol. 23, 46 (2023).
Khurshid, A. et al. Social and health information platform: piloting a standards-based, digital platform linking social determinants of health data into clinical workflows for community-wide use. Appl. Clin. Inform. 14, 883–892 (2023).
Graham, R. J. et al. Real-world analysis of healthcare resource utilization by patients with X-linked myotubular myopathy (XLMTM) in the United States. Orphanet J. Rare Dis. 18, 138 (2023).
Benitez, K., Loukides, G. & Malin, B. Beyond safe harbor: automatic discovery of health information de-identification policy alternatives. IHI 2010, 163–172 (2010).
El Emam, K. et al. A globally optimal k-anonymity method for the de-identification of health data. J. Am. Med. Inf. Assoc. 16, 670–682 (2009).
Blackport, J., Moffatt, C., Symmers, P., Bayless, P. & Gray, J. Methods and systems for monitoring a risk of re‐identification in a de‐identified database. U.S. Patent No. 11,741,262 B2 (2023). Filed July 19, 2021; issued August 29, 2023.
Baker, D. B., Kaye, J. & Terry, S. F. Governance through privacy, fairness, and respect for individuals. EGEMS 4, 1207 (2016).
Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein-Barr virus associated with multiple sclerosis. Science 375, 296–301 (2022).
Lanz, T. V. et al. Clonally expanded B cells in multiple sclerosis bind EBV EBNA1 and GlialCAM. Nature 603, 321–327 (2022).
Opie-Martin, S. et al. The SOD1-mediated ALS phenotype shows a decoupling between age of symptom onset and disease duration. Nat. Commun. 13, 6901 (2022).
Benatar, M. et al. Design of a randomized, placebo-controlled, phase 3 trial of tofersen initiated in clinically presymptomatic SOD1 variant carriers: the ATLAS study. Neurotherapeutics 19, 1248–1258 (2022).
Afshar, M. et al. Creation of a data commons for substance misuse related health research through privacy-preserving patient record linkage between hospitals and state agencies. JAMIA Open 6, ooad092 (2023).
Chin, R. F. M., Pickrell, W. O., Guelfucci, F., Martin, M. & Holland, R. Prevalence, healthcare resource utilization and mortality of Lennox-Gastaut syndrome: retrospective linkage cohort study. Seizure 91, 159–166 (2021).
Pathak, A. et al. Privacy preserving record linkage for public health action: opportunities and challenges. J. Am. Med. Inform. Assoc. 31, 2605–2612 (2024).
Haendel, M. A. et al. The National COVID Cohort Collaborative (N3C): rationale, design, infrastructure, and deployment. J. Am. Med. Inf. Assoc. 28, 427–443 (2021).
Ando, W. et al. Impact of overlapping risks of type 2 diabetes and obesity on coronavirus disease severity in the United States. Sci. Rep. 11, 17968 (2021).
Bronstein, J. M. et al. Issues and biases in matching Medicaid pregnancy episodes to vital records data: the Arkansas experience. Matern. Child Health J. 13, 250–259 (2009).
Cole, J. A. et al. Bupropion in pregnancy and the prevalence of congenital malformations. Pharmacoepidemiol. Drug Saf. 16, 474–484 (2007).
Cole, J. A., Ephross, S. A., Cosmatos, I. S. & Walker, A. M. Paroxetine in the first trimester and the prevalence of congenital malformations. Pharmacoepidemiol. Drug Saf. 16, 1075–1085 (2007).
Grzeskowiak, L. E., Gilbert, A. L. & Morrison, J. L. Methodological challenges in using routinely collected health data to investigate long-term effects of medication use during pregnancy. Ther. Adv. Drug Saf. 4, 27–37 (2013).
Balan, N., Petrie, B. A. & Chen, K. T. Racial disparities in colorectal cancer care for black patients: barriers and solutions. Am. Surg. 88, 2823–2830 (2022).
Hwang, C. S. Black, incarcerated, and dying: reflections on racism and inequities in health care. Ann. Intern. Med. 175, 1047–1048 (2022).
Lillard, J. W. Jr., Moses, K. A., Mahal, B. A. & George, D. J. Racial disparities in Black men with prostate cancer: a literature review. Cancer 128, 3787–3795 (2022).
Tobin, M. J. Fiftieth anniversary of uncovering the Tuskegee syphilis study: the story and timeless lessons. Am. J. Respir. Crit. Care Med. 205, 1145–1158 (2022).
Jarrell, R. H. Native American women and forced sterilization, 1973–1976. Caduceus 8, 45–58 (1992).
Lawrence, J. The Indian Health Service and the sterilization of Native American women. Am. Indian Q. 24, 400–419 (2000).
Swartz, T. H. & Titanji, B. Deconstruct racism in medicine – from training to clinical trials. Nature 583, 202 (2020).
Rai, T., Hinton, L., McManus, R. J. & Pope, C. What would it take to meaningfully attend to ethnicity and race in health research? Learning from a trial intervention development study. Sociol. Health Illn. 44, 57–72 (2022).
Shah, S. J. & Essien, U. R. Equitable representation in clinical trials: looking beyond table 1. Circ. Cardiovasc. Qual. Outcomes 15, e008726 (2022).
Azizi, Z. et al. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 11, e043497 (2021).
Dasaradharami Reddy, K. & Gadekallu, T. R. A comprehensive survey on federated learning techniques for healthcare informatics. Comput. Intell. Neurosci. 2023, 8393990 (2023).
van Egmond, M. B. et al. Privacy-preserving dataset combination and Lasso regression for healthcare predictions. BMC Med. Inf. Decis. Mak. 21, 266 (2021).