Implementation of synthetic data generation in the enterprise

By Khaled El Emam | Published: April, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. The first article in this series looked at what synthetic data is and how it is generated, and the second article examined the use cases of synthetic data. In […]
Why is synthetic data used?

By Khaled El Emam | Published: March, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. The first article in this series on synthetic data looked at what this type of data is and how it is generated. In this article, Dr. Khaled […]
What is synthetic data and how is it generated?

By Khaled El Emam | Published: February, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. Synthetic data is data that has been generated artificially, rather than being real-world data. In part one of this series on synthetic data, Dr. Khaled El Emam, […]
Replica Analytics Top 10 Round-up for 2022

By Replica staff The beginning of a new year can be a good time to take stock of the one that has come to a close. It has been a fantastic year at Replica Analytics, with too many great moments to count. That being said, we have a tradition of producing an annual round-up where we highlight […]
New Publication: Validating a Membership Disclosure Metric for Synthetic Health Data

By Lucy Mosquera | Posted On: October 11, 2022 We have just had a paper published in JAMIA Open, entitled Validating a Membership Disclosure Metric for Synthetic Health Data. The paper validates and demonstrates, using several large datasets, how to apply a membership disclosure metric for synthetic health data, which is important for assessing the […]
New estimator uses synthetic data generation for more reliable evaluation of re-identification risk in datasets

By Lucy Mosquera and Abhik Das | Posted On: July 5, 2022 Access to healthcare data to accelerate research remains a challenge. Even during the unprecedented COVID-19 pandemic, sharing de-identified individual-level datasets with the research community was a difficult process. One major concern among data custodians is privacy risk. However, there has been no reliable measure […]
Utility Metrics for Evaluating Synthetic Data Generation Methods

By Lucy Mosquera and Xi Fang | Posted On: April 8, 2022 To work with synthetic data, it is important to show that the synthetic dataset maintains the key relationships, trends, and patterns present in the real dataset. Several utility metrics have been proposed and used to evaluate synthetic data; however, none have been validated […]
Blog post from CHEO Research Institute

By Replica Analytics staff | Posted On: March 24th, 2022 This week, the CHEO Research Institute wrote an article about Replica Analytics’ recent acquisition by Aetion. Replica was incubated at the CHEO Research Institute, where Replica’s Khaled El Emam heads up the Electronic Health Information Laboratory, and the University of Ottawa Faculty of Medicine, where he […]
Synthetic Data Generation for Implementing the HIPAA Expert Determination Method

By Khaled El Emam | Posted On: March 1st, 2022 The Expert Determination method is one of two methods described in the HIPAA Privacy Rule for ensuring that a dataset is de-identified. The other method is Safe Harbor (see Figure 1). Once a dataset is de-identified using one of these methods, it can be used and […]
Re-identification–the wrong criterion for synthetic data

By Khaled El Emam | Posted On: February 17th, 2022 Sometimes we’re asked whether individuals can be re-identified in synthetic datasets, so in this blog post we tackle that question. Remember that synthetic data are generated by machine learning models that learn the patterns and statistical properties of real data. There’s […]