Abstract: Synthetic data generation (SDG) is most powerful when it is carried out alongside a defined use case and with clarity around the fidelity, privacy, and fairness of the generated data. However, the right balance of these factors is unclear, especially for healthcare data, and synthetic data projects often fail because one or more of these factors is ill-defined, leaving low confidence in using the generated data. This talk will discuss the range of healthcare use cases where synthetic data could have the most impact, and the need to build explainability into the generation process to ensure confidence in the data. We’ll highlight our current successes and failures in generating synthetic data and the lessons learned from both. Finally, we’ll highlight areas that require further development and research, and the associated opportunities.
Abstract: Real-world data provides immense opportunity to facilitate innovative health research; however, data access remains challenging due to privacy constraints. One strategy to facilitate responsible sharing of health data is synthetic data generation. This presentation will describe the challenges of synthesizing complex longitudinal health data and introduce a deep learning model designed to address these challenges. The performance of this model will be illustrated using a recent case study of synthesizing real-world data from a single-payer system. The suitability of the synthetic data will be assessed using privacy metrics, generic utility metrics, and a comparison of the analytic results for a specific analysis.