Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example
Li Y., Kurinczuk JJ., Alderdice F., Quigley MA., Rivero-Arias O., Sanders J., Kenyon S., Siassakos D., Parekh N., De Almeida S., Carson C.
Introduction: Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including cases where the identification is uncertain.
Objectives: This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, illustrated through a study investigating the provision of pre-pregnancy care.
Methods: We used the CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18 to 48 years who were registered at an English GP practice within CPRD on 1 January 2017, had a year of prior registration, and were eligible for hospital data linkage. We developed a cleaning and combining algorithm and then applied stricter data quality criteria, giving three populations: ‘as provided’, ‘derived’ (using our algorithm) and ‘strictly derived’ (using our algorithm plus the stricter data quality criteria). We compared characteristics and outcomes across these populations to examine potential biases in effect estimates obtained from the ‘as provided’ population.
Results: Our algorithm added 22,270 (approximately 7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies and partial records of pregnancies. Considering all pregnancies across women’s reproductive history, the ‘strictly derived’ population, which had better data quality, showed a higher prevalence of pre-existing medical conditions and more pre-pregnancy care. In this population, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was smaller than in the ‘as provided’ population.
Conclusion: PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies’ specific requirements and may adapt our proposed methodology to align with their research questions.
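The cohort eligibility criteria stated in the Methods can be sketched in code. This is a minimal illustration, not the authors' algorithm: the `Patient` record, its field names, and the reading of "a year of prior registration" as registration starting at least one year before the index date are all assumptions made for the example, and none of the names correspond to actual CPRD variables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Index date taken from the abstract (1 January 2017).
INDEX_DATE = date(2017, 1, 1)


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    is_female: bool
    birth_date: date
    registration_start: date
    registration_end: Optional[date]  # None means still registered
    in_english_practice: bool
    hes_linkage_eligible: bool


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - birth_date.year
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def one_year_before(d: date) -> date:
    """Same calendar day one year earlier (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def is_eligible(p: Patient) -> bool:
    """Apply the stated criteria under the assumptions described above."""
    registered_on_index = p.registration_start <= INDEX_DATE and (
        p.registration_end is None or p.registration_end >= INDEX_DATE
    )
    # Assumption: "a year of prior registration" means registration began
    # at least one year before the index date.
    has_prior_year = p.registration_start <= one_year_before(INDEX_DATE)
    return (
        p.is_female
        and p.in_english_practice
        and p.hes_linkage_eligible
        and registered_on_index
        and has_prior_year
        and 18 <= age_on(p.birth_date, INDEX_DATE) <= 48
    )
```

A study with a different index date or registration requirement would change `INDEX_DATE` and `has_prior_year`; the pregnancy cleaning and combining steps themselves are not represented here, because the abstract does not specify them.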