Reproducible Data Science and the Importance of Social Determinants of Health: A Case Study in Underserved Populations with Opioid Use Disorder


Social determinants of health (SDOH) are the conditions in which people are born, grow, live, work, and age, and which can influence their health outcomes. These determinants include social, economic, and environmental factors such as access to healthcare, education, employment, housing, transportation, and community resources. They also encompass social norms, cultural practices, and societal inequalities. Understanding and addressing SDOH is essential for promoting health equity and improving overall population health.

SDOH play a significant role in whether individuals with opioid use disorder (OUD) and hepatitis C virus (HCV) infection pursue treatment. For instance, people with OUD tend to have limited access to healthcare services, including primary care, addiction treatment, and HCV testing and treatment. These obstacles can keep people with OUD from seeking help for their condition or from accessing necessary HCV treatment. Moreover, people with OUD often face stigma and discrimination in conventional healthcare settings, which creates further barriers to seeking treatment and receiving appropriate care for both OUD and HCV. Furthermore, unstable housing is prevalent among individuals with OUD, and financial barriers may prevent them from pursuing or completing necessary medical care. Addressing these SDOH is crucial to supporting individuals with OUD in pursuing and successfully completing treatment for HCV infection.

In a new study published in the journal Frontiers in Medicine, Professor Marianthi Markatou, Dr. Oliver Kennedy, Dr. Michael Brachmann, PhD candidate Raktim Mukhopadhyay, Dr. Arpan Dharia, and Professor Andrew H. Talal at the University at Buffalo focused on the collection, integration, and effective use of clinical data to derive SDOH in underserved populations. The authors highlighted the significance of combining expertise from the medical, statistical, and computer and data science domains to tackle the challenges of analyzing sensitive and disparate sources of clinical data. The findings can have significant policy implications: by identifying and understanding SDOH, policymakers can develop targeted interventions and allocate resources effectively.

The research team provided an overview of the concept of reproducibility in science, focusing on its importance in computational and data-enabled research. The authors explained that reproducibility refers to obtaining consistent results using the same inputs, methods, and conditions for analysis. They discussed the types of errors that can undermine reproducibility, including those related to data acquisition and management, statistical analysis, and communication. The paper emphasized the need for meticulous record-keeping, context tracking, and context-specific guardrails to ensure reproducibility. The authors also introduced Vizier, a data science platform that facilitates reproducibility through automated record-keeping and provenance tracking.
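To make the idea of record-keeping concrete, the following is a minimal sketch in Python of how one analysis step could log a provenance record. It is not Vizier's interface and not the authors' method; the function names (fingerprint, run_with_provenance, mean_by_group) and the toy records are hypothetical and invented purely for illustration. The sketch hashes the inputs and outputs of a step and stores them alongside the parameters, so that rerunning the step with the same inputs, method, and parameters can be checked for a matching record.

```python
import hashlib
import json
import platform


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_with_provenance(step_name, func, inputs, params):
    """Run one analysis step and return (result, provenance record)."""
    result = func(inputs, **params)
    record = {
        "step": step_name,
        "input_hash": fingerprint(inputs),
        "params": params,
        "output_hash": fingerprint(result),
        "python_version": platform.python_version(),
    }
    return result, record


def mean_by_group(rows, key, value):
    """Toy analysis: average `value` within each level of `key`."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[key], (0.0, 0))
        totals[row[key]] = (total + row[value], count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


# Entirely made-up records (hypothetical field names), for illustration only.
rows = [
    {"housing": "stable", "visits": 4},
    {"housing": "unstable", "visits": 1},
    {"housing": "stable", "visits": 2},
]
params = {"key": "housing", "value": "visits"}

result_a, record_a = run_with_provenance("mean_visits", mean_by_group, rows, params)
result_b, record_b = run_with_provenance("mean_visits", mean_by_group, rows, params)

# Same inputs, method, and parameters should yield identical provenance records.
reproducible = record_a == record_b
print(json.dumps(record_a, indent=2))
print("reproducible:", reproducible)
```

In this sketch, a changed input file or parameter would alter the corresponding hash, which is one simple way a mismatch between two runs could be detected and traced. A real platform would also track richer context, such as library versions and the order of steps.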

The authors emphasized the need for reliable, validated approaches to utilize SDOH data for evidence-based decision-making. They highlighted the importance of addressing societal stigmatization, collecting comprehensive data, and considering the impact of factors such as racism, unemployment, and food insecurity on health outcomes. They also stressed the cost-effectiveness of addressing social factors in preventing chronic diseases and improving population health. While the authors focused on individuals with OUD and HCV, their methodologies and approaches have broader applications. The integration of data from multiple sources and the emphasis on reproducibility can be implemented in various healthcare settings and populations. The study’s insights can inform data collection, analysis, and policy initiatives aimed at improving health outcomes for diverse communities.

In conclusion, the University at Buffalo study examines how SDOH data from underserved populations can be collected, integrated, and used effectively. The authors emphasized the significance of combining expertise from the medical, statistical, and computer and data science domains to address the challenges of analyzing sensitive clinical data. The study’s findings and methodologies have important implications for improving health outcomes and informing policy decisions in underserved populations. The study also underscores the need for reproducibility in biomedical protocols and for reliable approaches to using SDOH data in support of healthcare equity and improved population health.

Acknowledgement: This work was supported by a Patient-Centered Outcomes Research Institute (PCORI) Award (IHS-1507-793 31640) and partially supported by the Troup Fund of the Kaleida Health Foundation. The statements in this work are solely the responsibility of the authors and do not necessarily represent the views of PCORI, its Board of Governors or Methodology Committee.


Markatou M, Kennedy O, Brachmann M, Mukhopadhyay R, Dharia A, Talal AH. Social determinants of health derived from people with opioid use disorder: Improving data collection, integration and use with cross-domain collaboration and reproducible, data-centric, notebook-style workflows. Front Med (Lausanne). 2023;10:1076794. doi: 10.3389/fmed.2023.
