Logic Liaisons facilitate the unlocking of nationwide COVID-19 patient data

Sample code templates developed by the iTHRIV CTSA team enable rapid analytics onboarding to the National COVID Cohort Collaborative (N3C) Data Enclave...

The National COVID Cohort Collaborative (N3C) Data Enclave is an analytics platform containing clinical data from electronic health records from over 70 sites around the country. These data from SARS-CoV-2-positive patients (and matched controls) help scientists further understand the disease, including potential risk factors, preventative factors and long-term health consequences. N3C code workbooks allow users to access, aggregate, visualize and analyze the available data for their approved project. Coding in N3C requires that researchers learn novel software interfaces and workflows, and often even new languages that support distributed computation (e.g., PySpark or SparkR). These challenges make it all the more critical for researchers to share new best practices across the community and to create and share code templates.

N3C Domain Teams enable researchers with shared interests to analyze data within the N3C Data Enclave and collaborate more efficiently in a team science environment. The N3C Logic Liaisons are a group of experienced Domain Team coders from across the country who have been actively mining the billions of rows of N3C data to extract relevant variables for their project analyses. The Logic Liaison Team is led by Johanna Loomba, Director of Informatics at the Integrated Translational Health Research Institute of Virginia (iTHRIV) CTSA, and Stony Brook University's Richard Moffitt. Andrea Zhou, iTHRIV data scientist, is the principal template developer for the Logic Liaison team. This team developed generic code workbooks with a structure that is agnostic to research domain and easy to reuse across data use requests (DURs). The workbooks were refined in the Logic Liaison meetings, where community best-practice tips were added, and are now published for general use in N3C. (Instructions for access are provided below.)

“The Logic Liaisons have become critical to making the connection between clinician scientists, informaticians, and the massive dataset within N3C - they are at the heart of team science AND translational science.” - Melissa Haendel, co-founder of N3C

The generic N3C code workbooks help new N3C research teams create a project-specific data frame of facts. The sample code automatically imports the appropriate level 3 (limited dataset) or level 2 (de-identified) patient data tables and generates commonly derived variables. The variables include patient BMI, death data, invasive mechanical ventilation (IMV) and extracorporeal membrane oxygenation (ECMO) procedures, inpatient visits, and common comorbidities from the N3C Cohort Paper. Additional variables related to the DUR can be added by modifying this sample code (e.g., incorporating new clinical concept sets or adding new logic regarding relationships between concepts).
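To give a sense of what a "derived variable" looks like in practice, here is a minimal plain-Python sketch of a BMI calculation. It is not the Logic Liaison template code: the field names (person_id, weight_kg, height_m) and the sample rows are hypothetical and do not reflect the N3C schema. It only illustrates the standard formula, weight in kilograms divided by height in meters squared, and one way to handle missing measurements.

```python
from typing import Optional


def bmi(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """Return BMI in kg/m^2, or None when an input is missing or non-positive."""
    if weight_kg is None or height_m is None:
        return None
    if weight_kg <= 0 or height_m <= 0:
        return None
    return weight_kg / (height_m ** 2)


# Hypothetical patient rows; field names are illustrative, not the N3C schema.
patients = [
    {"person_id": 1, "weight_kg": 70.0, "height_m": 1.75},
    {"person_id": 2, "weight_kg": None, "height_m": 1.60},
]

# Attach the derived variable to each row; missing inputs yield None.
for row in patients:
    row["bmi"] = bmi(row["weight_kg"], row["height_m"])
```

Returning None instead of raising an error keeps patients with incomplete measurements in the data frame, so later steps can decide how to treat them.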

The READMEs provide template instructions and “Logic Liaison tips” that represent community best practices in using this complex database and working in the enclave analytics environment. The templates can be imported and quickly customized for the project through a user-friendly interface, or the N3C project team can directly edit and repurpose the underlying code. The sample code also reflects best practices and compiles efficiently, which is critical for computational performance when working with the enclave’s enormous dataset on shared server resources. Additionally, an optional random sampling node allows researchers to easily subset the dataset while developing new derived variables, then return to the full sample prior to analysis. These templates will continue to be maintained and updated with new frequently used transformations, visualizations, and Logic Liaison tips. Use of the sample workbooks has accelerated N3C onboarding.
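The idea behind an optional sampling step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the template's actual node: the function names, the person_id field, and the hash-based approach are all hypothetical. Patients are assigned to the sample deterministically, so the same subset comes back on every run, and setting the fraction to 1.0 restores the full cohort.

```python
import hashlib


def in_sample(person_id: str, fraction: float, salt: str = "dev") -> bool:
    """Deterministically decide whether a patient falls in the sample."""
    digest = hashlib.sha256(f"{salt}:{person_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < fraction


def subset(rows, fraction=1.0):
    """Keep roughly `fraction` of patients; fraction >= 1.0 keeps everyone."""
    if fraction >= 1.0:
        return list(rows)
    return [r for r in rows if in_sample(str(r["person_id"]), fraction)]


# Hypothetical cohort: develop on about 10% of patients, analyze on all of them.
cohort = [{"person_id": i} for i in range(1000)]
dev_rows = subset(cohort, fraction=0.1)
full_rows = subset(cohort, fraction=1.0)
```

Sampling on the patient identifier rather than on individual rows keeps all records for a sampled patient together, which matters when derived variables combine several rows per patient.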

HOW TO USE THE N3C LOGIC LIAISON TEMPLATES: Once you have logged into the enclave, an icon on the home page directs you to the Knowledge Store. There, the templates can be found by entering “Logic Liaison” into the search bar. Templates are provided for All Patients, for COVID-confirmed patients, and for data quality assessments. Once a template is selected, an attached README provides step-by-step directions as well as tips and recommended best practices developed by the Logic Liaison team.

This video provides an overview of these resources.

To receive support or provide feedback regarding the templates, please attend the N3C Researcher Support Office Hours listed on the N3C calendar.

UVA researchers who wish to use N3C should refer to this iTHRIV portal page for further instructions.

The Logic Liaison template development was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Numbers UL1TR003015 and U24TR002306.

integrated Translational Health Research Institute of Virginia
