Delivering on the Promise of Synthetic Data

Presentations, Reports & White Papers

This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.  

Reports & White Papers


Practical Synthetic Data Generation

Accelerating AI with Synthetic Data

Evaluating the Utility of Synthetic Data

Executive Update on Synthetic Data

Presentations & Webinars


Measuring Re-identification Risk for Synthetic and Anonymized Data

September 22, 2021

Khaled El Emam discusses accurate methods for measuring re-identification risk. This new class of risk estimators helps avoid issues experienced with many of the current risk measurement methods.

Ten Recommendations For Regulating De-identification

September 1, 2021

Drawing on almost two decades of experience, Khaled El Emam presents ten key recommendations on how to define the criteria for when information becomes non-identifiable, and how to regulate the resulting data.

Experiences Implementing Data Synthesis in a Global Life Sciences Company

June 16, 2021

A webinar from Stephen Bamford at Janssen describing the company's experiences implementing synthetic data generation and where it fits among other privacy-enhancing technologies (PETs).

Synthetic Data

September 13, 2021

A presentation by Dean Eurich, Professor, School of Public Health, University of Alberta, given at the 2021 PHUSE/FDA Computational Science Symposium (CSS). Professor Eurich discusses how synthetic data can be used to accelerate access to data for health research.

Implementing the FAIR Data Sharing Principles

June 17, 2021

Presentation of Replica Analytics' joint offering with Aridhia: an integrated and managed FAIR data sharing platform.

What We Learned About Data Synthesis in 2020

May 20, 2020

In this webinar, Khaled El Emam gives his perspective on how data synthesis changed in 2020: the technology, the market, adoption, practical application, governance, and the potential of the technology.

Practical Mechanisms for Generating Anonymous Data 

September 2, 2021

This BIH Digital Medicine online lecture gives an overview of risk-based methods for creating non-identifiable data, highlights some limitations, and discusses synthetic data generation (SDG) methods and their applications in practice.

Data Synthesis: A Tool for Responsible Data Sharing

June 16, 2021

This presentation by Khaled El Emam provides a general overview of synthetic data generation and its applications. 

Empirical Assessment of Privacy Risks in Data

March 25, 2020

Presentation on our experiences trying to identify individuals in datasets that have been de-personalized, especially health data. It gives an overview of Motivated Intruder Attacks.

Ten Things I Have Learned About Health Data Monetization

February 12, 2020

Sharing some key experiences working with organizations to monetize their data for commercial, academic, and public interest purposes. A practical journey with many lessons learned.

Optimal Synthesis of Clinical Trial Data

November 4, 2020

Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.

Ten Things I Have Learned About De-identification

January 15, 2020

An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy-enhancing technologies and applying them in practice globally.

An Introduction to Synthetic Clinical Trial Data

October 4, 2019

How data synthesis provides reliable sharing of clinical trial data for secondary analysis while protecting participant privacy.

Synthetic Clinical Trial Data: Use Cases, Methods, and Experiences

December 9, 2020

Sharing experiences with the synthesis of clinical trial data and how that data has been used by a pharmaceutical company.

Stephen Bamford,
Janssen R&D

Lucy Mosquera,
Replica Analytics

Replica Analytics Summits & Conferences


2021 Summit - Synthetic Data: The Future of Data Sharing

The Alberta Synthetic Data Project

July 7, 2021

Dean Eurich
Professor, School of Public Health, University of Alberta

Lucy Mosquera
Director of Data Science,
Replica Analytics

Uses of Synthetic Data by the Life Sciences Industry

July 7, 2021

Panellists Virginie Giroux, HEOR, Merck; Stephen Bamford, Janssen; Janice Branson, Novartis; moderator Reg Joseph, Health Cities.

How Synthetic Data will Transform Health Research and Innovation

July 7, 2021

Karen Cuenco, Integration & Quantitative Science, Senior Specialist
Bill and Melinda Gates Foundation

2020 Summit - Getting Access to COVID-19 Data

Responsibly Sharing COVID-19 Data

July 15, 2020

Dr. Khaled El Emam

The COVID-19 Research Database

July 15, 2020

Jason LaBonte, Ph.D., Datavant

GIS and COVID-19

July 15, 2020

Alex Miller, President, Esri Canada

The Pandemic Menace and the Future of Humanity Through the Eyes of a Realist

July 15, 2020

Gerry Stegmaier, Reed Smith

Technical Reports


Can Synthetic Data Be A Proxy For Real Clinical Trial Data? A Validation Study

April 2021

A detailed case study demonstrating the high utility of synthetic data generation for a colon cancer clinical trial dataset.

Optimizing the Synthesis of Clinical Trial Data Using Sequential Trees

November 2020

This paper describes the basic method for generating synthetic data from small source datasets, such as those from clinical trials.

Evaluating the Utility of Synthetic COVID-19 Case Data

March 2021

A detailed case study demonstrating the high utility and low privacy risks of synthetic data generation for the Ontario COVID-19 case dataset.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data

November 2020

Outlines a privacy model that measures attribute disclosure conditional on identity disclosure for synthetic data, combining two types of disclosure risk within a single framework.