Logic Liaisons facilitate the unlocking of nationwide COVID-19 patient data
Sample code templates developed by the iTHRIV CTSA team enable rapid analytics onboarding to the National COVID Cohort Collaborative (N3C) Data Enclave...
The National COVID Cohort Collaborative (N3C) Data Enclave is an analytics platform containing clinical data from electronic health records contributed by more than 55 sites around the country. These data, from SARS-CoV-2 positive patients and matched controls, help scientists better understand the disease, including potential risk factors, preventative factors and long-term health consequences. N3C code workbooks allow users to access, aggregate, visualize and analyze the data available for their approved project. Coding in N3C requires researchers to learn new software interfaces and workflows, and often new languages that support distributed computation (e.g., PySpark or SparkR). These challenges make it all the more important for researchers to share best practices across the community and to create and share code templates.
N3C Domain Teams bring together researchers with shared interests so they can analyze data within the N3C Data Enclave and collaborate more efficiently in a team science environment. The N3C Logic Liaisons are a group of experienced Domain Team coders from across the country who have been actively mining the billions of rows of N3C data to extract relevant variables for their project analyses. The Logic Liaison Team was initiated by Johanna Loomba, Director of Informatics at the Integrated Translational Health Research Institute of Virginia (iTHRIV) CTSA, and Arti Patel, iTHRIV Data Scientist. This iTHRIV team, based at the University of Virginia, developed a generic code workbook whose structure is agnostic to research domain and easy to reuse across data use requests (DURs). The workbook was refined in Logic Liaison meetings, where community best-practice tips were added, and is now published for general use in N3C. (Instructions for access are provided below.)
“The Logic Liaisons have become critical to making the connection between clinician scientists, informaticians, and the massive dataset within N3C - they are at the heart of team science AND translational science.” - Melissa Haendel, co-founder of N3C
The generic N3C code workbook helps new N3C research teams create a project-specific data frame of facts. The sample code automatically imports the appropriate level 3 (limited dataset) or level 2 (de-identified) patient data tables and generates commonly derived variables. These variables include patient BMI, death data, invasive mechanical ventilation (IMV) and extracorporeal membrane oxygenation (ECMO) procedures, inpatient visits, and common comorbidities from the N3C Cohort Paper. Additional variables related to the DUR can be added by modifying this sample code (e.g., incorporating new clinical concept sets or adding new logic regarding relationships between concepts).
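To make the idea of a "commonly derived variable" concrete, here is a minimal sketch in plain Python of collapsing row-level measurements into one BMI value per patient. It is not the template's code, which runs in the enclave in PySpark or SparkR. The `Measurement` class, the `height_cm` and `weight_kg` labels, and the rule of keeping the last value seen are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Measurement:
    """One row of a hypothetical measurement table (illustrative only)."""
    person_id: int
    kind: str      # assumed labels: "height_cm" or "weight_kg"
    value: float


def person_level_bmi(rows: Iterable[Measurement]) -> Dict[int, Optional[float]]:
    """Collapse measurement rows into one BMI value per person.

    Assumption: rows arrive in chronological order, so a later row
    overwrites an earlier one of the same kind for the same person.
    """
    latest: Dict[int, Dict[str, float]] = {}
    for row in rows:
        latest.setdefault(row.person_id, {})[row.kind] = row.value

    bmi: Dict[int, Optional[float]] = {}
    for person_id, values in latest.items():
        height_cm = values.get("height_cm")
        weight_kg = values.get("weight_kg")
        if height_cm and weight_kg:
            # BMI = weight in kilograms / (height in metres) squared
            bmi[person_id] = round(weight_kg / (height_cm / 100) ** 2, 1)
        else:
            # Keep the person in the table, with a missing value.
            bmi[person_id] = None
    return bmi


rows = [
    Measurement(1, "height_cm", 170.0),
    Measurement(1, "weight_kg", 70.0),
    Measurement(2, "weight_kg", 80.0),  # no height recorded
]
result = person_level_bmi(rows)
```

Patients with an incomplete pair of measurements stay in the result with a missing BMI, so the output keeps one row per person, which is the shape a person-level table needs.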
The README provides template instructions and “Logic Liaison tips” that represent community best practices for using this complex database and working in the enclave analytics environment. The template can be imported and quickly customized for a project through a user-friendly interface, or the N3C project team can directly edit and repurpose the underlying code. The sample code also reflects best practices and is written to run efficiently, which is critical when working with the enclave’s enormous dataset on shared server resources. Additionally, an optional random sampling node allows researchers to work on a subset of the data while developing new derived variables, then return to the full sample prior to analysis. This template will continue to be maintained and updated with new frequently used transformations, visualizations, and Logic Liaison tips.
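The random sampling step can be illustrated with a short plain-Python sketch. How the template's sampling node is actually implemented is not described here, so this is an assumption: it selects patients by hashing a hypothetical `person_id` with a seed, so the same patients are chosen on every run. In a Spark workbook the built-in `DataFrame.sample(fraction=..., seed=...)` method could serve a similar purpose.

```python
import hashlib
from typing import Iterable, List


def in_sample(person_id: int, fraction: float, seed: str = "dev") -> bool:
    """Return True if this person falls inside the sampled fraction.

    The hash maps each (seed, person_id) pair to a stable number in
    [0, 1), so repeated runs always select the same patients.
    """
    digest = hashlib.sha256(f"{seed}:{person_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits, scaled to [0, 1)
    return bucket < fraction


def subset(person_ids: Iterable[int], fraction: float = 1.0) -> List[int]:
    """Keep roughly `fraction` of patients; 1.0 returns the full cohort."""
    if fraction >= 1.0:
        return list(person_ids)
    return [pid for pid in person_ids if in_sample(pid, fraction)]


cohort = range(1000)
dev_sample = subset(cohort, fraction=0.1)   # small subset while developing
full_sample = subset(cohort, fraction=1.0)  # full cohort before analysis
```

Setting the fraction back to 1.0 restores the full cohort without changing any downstream code, which mirrors the develop-on-a-subset, analyze-on-everything workflow described above.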
Use of this sample workbook has already accelerated new Logic Liaison onboarding in the Neurology Domain Team (also led by UVA iTHRIV informaticists and physicians), and other Domain Teams are rapidly adopting it. It is also being used by summer interns in the UVA School of Data Science who are analyzing health equity outcomes in patients with concurrent stroke and COVID-19. Several training sessions have been conducted, and the next opportunity is the July 2 Enclave User Group session. We encourage informaticians who are already N3C Data Enclave members, or who are interested in becoming involved, to try this template. Feedback and additional best practice tips are welcome as we continue to develop this resource. For questions or feedback, contact Johanna Loomba (JJL4D@hscmail.mcc.virginia.edu).
HOW TO USE THE N3C CODE TEMPLATE: Once you are logged into the enclave, an icon on the home page will direct you to the Knowledge Store. There, you can find the template by entering “Multi-Node Template for Person Level Table” into the search bar. Once the template is selected, an attached README provides step-by-step directions as well as tips and recommended best practices developed by the Logic Liaison team.