Replica Analytics - An Aetion Company

Conference: Phuse Data Transparency Winter Event 2023


This event explores the need for transparent data in clinical development, with thought-provoking presentations and panel discussions from world experts in data sharing. Replica Analytics is a proud participant. The company’s SVP and GM, Dr. Khaled El Emam, will deliver a presentation titled "Evaluation of the Privacy Risks in Synthetic Clinical Trial Datasets."

There is growing interest in using synthetic data generation (SDG) methods to share clinical trial datasets for secondary analysis in a privacy-preserving manner. This secondary analysis can be for internal purposes or for external data disclosures to collaborators. One common measure of disclosure risk is membership disclosure, which evaluates the ability of an adversary to determine that a target individual was in the original (training) dataset. For example, if an inclusion criterion was high disease severity, then by learning that a target individual was in the original dataset, an adversary would learn that person's disease severity without ever identifying their record.

Recently proposed membership disclosure metrics can be used to assess both the relative risk (compared to a naïve adversary) and the absolute risk. The relative metric is based on an F1 score, which measures how accurately an adversary can predict that a target individual is in the original data, compared with a naïve adversary; it therefore captures the incremental risk introduced by SDG. The absolute metric reflects the overall probability of membership disclosure. The two can diverge: a synthetic dataset may add very little incremental risk over a naïve attack while its absolute risk is still high, and vice versa.
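As a rough illustration of the relative metric, the sketch below scores a toy membership attack with an F1 score and compares it against a naïve adversary that claims every target is a member (so its recall is 1 and its precision is the share of targets who really are members). The attack rule (match a target to a synthetic record within a Hamming-distance threshold), the normalization (F1 − F1_naive) / (1 − F1_naive), the toy records, and all function names are assumptions chosen for illustration; they are not the metrics or code used in the talk.

```python
"""Toy membership-disclosure check (illustrative sketch, not the talk's method)."""

Record = tuple  # assumption: a record is a tuple of categorical attribute values


def hamming(a: Record, b: Record) -> int:
    """Count the attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_membership(target: Record, synthetic: list, threshold: int) -> bool:
    """Hypothetical attack: claim the target was in the training data if
    some synthetic record differs from it on at most `threshold` attributes."""
    return min(hamming(target, s) for s in synthetic) <= threshold


def f1_score(predicted: list, actual: list) -> float:
    """F1 of the adversary's membership claims against true membership."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def naive_f1(actual: list) -> float:
    """Naive adversary claims every target is a member: recall is 1 and
    precision equals the share of targets that really are members."""
    p = sum(actual) / len(actual)
    return 2 * p / (1 + p)


def relative_f1(f1: float, f1_naive: float) -> float:
    """Assumed normalization: 1 is perfect prediction, 0 is no better than
    the naive adversary, and negative values are worse than naive."""
    if f1_naive >= 1.0:
        raise ValueError("naive F1 is 1; relative F1 is undefined")
    return (f1 - f1_naive) / (1 - f1_naive)


# Hypothetical toy data: training records, a synthetic set, and four targets.
training = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
synthetic = [(1, 0, 1), (0, 1, 1), (0, 0, 0)]
targets = [(1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1)]

actual = [t in training for t in targets]  # true membership of each target
predicted = [predict_membership(t, synthetic, threshold=0) for t in targets]

f1 = f1_score(predicted, actual)
f1_naive = naive_f1(actual)
rel = relative_f1(f1, f1_naive)
print(round(f1, 3), round(f1_naive, 3), round(rel, 3))  # 0.8 0.667 0.4
```

In this toy case the attack's F1 of 0.8 beats the naïve baseline of about 0.667, giving a relative F1 of 0.4. The calculation says nothing about the absolute risk, which is why the abstract treats that as a separate metric.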

This talk provides an overview of how to evaluate membership disclosure risk for synthetic data, and it presents results from an evaluation of 12 synthetic oncology clinical trial datasets. Seven trials were from Project Data Sphere and five from the Ottawa Hospital. Both types of membership disclosure metrics were evaluated. The results demonstrate the disclosure risks, and their variation, that one may expect to see in synthetic oncology trial data. Based on these results, recommendations are provided on the broader sharing of synthetic clinical trial data for internal and external secondary purposes.

There is a cost to attend this event.