Replica Analytics - An Aetion Company

Implementation of synthetic data generation in the enterprise

By Khaled El Emam | Published: April, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. The first article in this series looked at what synthetic data is and how it is generated, and the second article examined the use cases of synthetic data. In […]

Why is synthetic data used?

By Khaled El Emam | Published: March, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. The first article in this series on synthetic data looked at what this type of data is and how it is generated. In this article, Dr. Khaled […]

What is synthetic data and how is it generated?

By Khaled El Emam | Published: February, 2023 The following article was published by OneTrust DataGuidance and can be accessed on their platform via subscription. Reprinted with permission. Synthetic data is data that has been generated artificially, rather than being real-world data. In part one of this series on synthetic data, Dr. Khaled El Emam, […]

Replica Analytics Top 10 Round-up for 2022

By Replica staff The beginning of a new year can be a good time to take stock of the one that has come to a close. It has been a fantastic year at Replica Analytics, with too many great moments to count. That being said, we have a tradition of producing an annual round-up where we highlight […]

New Publication: Validating a Membership Disclosure Metric for Synthetic Health Data

By Lucy Mosquera | Posted On: October 11, 2022 We have just had a paper published in JAMIA Open, entitled Validating a Membership Disclosure Metric for Synthetic Health Data. The paper validates and demonstrates, using several large datasets, how to apply a membership disclosure metric for synthetic health data, which is important for assessing the […]

Utility Metrics for Evaluating Synthetic Data Generation Methods

By Lucy Mosquera and Xi Fang | Posted On: April 8, 2022 To work with synthetic data, it is important to show that the synthetic dataset maintains the key relationships, trends, and patterns present in the real dataset. Several utility metrics have been proposed and used to evaluate synthetic data; however, none have been validated […]

Blog post from CHEO Research Institute

By Replica Analytics staff | Posted On: March 24th, 2022 This week, the CHEO Research Institute wrote an article about Replica Analytics’ recent acquisition by Aetion. Replica was incubated at the CHEO Research Institute, where Replica’s Khaled El Emam heads up the Electronic Health Information Laboratory, and the University of Ottawa Faculty of Medicine, where he […]

Synthetic Data Generation for Implementing the HIPAA Expert Determination Method

By Khaled El Emam | Posted On: March 1st, 2022 The Expert Determination method is one of two methods described in the HIPAA Privacy Rule for ensuring that a dataset is de-identified. The other method is Safe Harbor (see Figure 1). Once a dataset is de-identified using one of these methods, it can be used and […]

Re-identification–the wrong criterion for synthetic data

By Khaled El Emam | Posted On: February 17th, 2022 Sometimes we’re asked whether individuals can be re-identified in synthetic datasets, so in this blog post we tackle that question. Remember that synthetic data are generated by machine learning models that learn the patterns and statistical properties of real data and then generate synthetic data. There’s […]