Presentations, Reports & White Papers

This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.  

Reports & White Papers


Practical Synthetic Data Generation

Accelerating AI with Synthetic Data

Evaluating the Utility of Synthetic Data

Cutter Executive Update on Synthetic Data


Presentations & Webinars


What We Learned About Data Synthesis in 2020

December 9, 2020

In this Year in Review webinar, Khaled El Emam gave his perspective on how data synthesis changed in 2020: the technology, the market, adoption, practical application, governance, and the potential of the technology. He also made some predictions about what 2021 might have in store.

Webinar Video

Optimal Synthesis of Clinical Trial Data

November 4, 2020

Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.

Webinar Video

Synthetic Clinical Trial Data: Use Cases, Methods, and Experiences

May 20, 2020

Sharing experiences with the synthesis of clinical trial data and how that data has been used by a pharmaceutical company.

Stephen Bamford, Janssen R&D

Lucy Mosquera, Replica Analytics

Empirical Assessment of Privacy Risks in Data

March 25, 2020

Presentation on our experiences with trying to identify individuals in datasets that have been de-personalized, especially health data. This is an overview of Motivated Intruder Attacks.

Webinar Slides

Webinar Video

Ten Things I Have Learned About Health Data Monetization

February 12, 2020

Sharing some key experiences working with organizations globally to monetize their data for commercial, academic, and public interest purposes. A practical journey producing many lessons learned.

Webinar Slides

Webinar Video

Ten Things I Have Learned About De-identification

January 15, 2020

An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy enhancing technologies and applying them in practice globally.

Webinar Slides

Webinar Video

An Introduction to Synthetic Clinical Trial Data

October 4, 2019

How data synthesis provides reliable sharing of clinical trial data for secondary analysis while protecting participant privacy.

Webinar Slides

Webinar Video


Technical Reports


Optimizing the Synthesis of Clinical Trial Data Using Sequential Trees

November 2020

This paper describes the basic method for generating synthetic data for small source datasets, such as clinical trials.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation

November 2020

This paper introduces and demonstrates a privacy model that measures attribute disclosure conditional on identity disclosure for synthetic data, combining these two types of disclosure risk within a single framework.

Evaluating the Utility of Synthetic COVID-19 Case Data

March 2021

A detailed case study demonstrating the high utility and low privacy risks of synthetic data generation for the Ontario COVID-19 case dataset.

Can Synthetic Data Be A Proxy For Real Clinical Trial Data? A Validation Study

April 2021

A detailed case study demonstrating the high utility of synthetic data generation for a colon cancer clinical trial dataset.