Knowledgebase

Presentations, Articles, Reports & Media

This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.

Publication

Practical Synthetic Data Generation

Podcast

Deciphering Deep Fakes & Synthetic Data

Publication

Accelerating AI with Synthetic Data

Article

10 Recommendations for Regulating Non-identifiable Data

Article

Real-World Evidence Firm Aetion Grabs Privacy-Protected Synthetic Data Provider Replica Analytics

Article

Replica Analytics Using Synthetic Data to Ease Medical Researchers’ Pain

Article

Could Synthetic Data be the Future of Data Sharing?

Evaluating the Utility of Synthetic Data

Article

Enabling Health Information Sharing with Synthetic Data

Executive Update on Synthetic Data

Podcast

Real or Fake? The Buzz About Synthetic Data

Podcast

Discussion with Dr. Khaled El Emam on synthetic data

Utility Assessments in Synthetic Data

This webinar offers an overview of typical ways of assessing utility in synthetic data, as well as the findings of our recent research project. The project focused on identifying which utility assessments, performed without knowing how a synthetic dataset will be used, are most predictive of analytic value. We present findings showing which utility assessments best rank synthesis methods by their performance on a realistic biostatistics analysis. This work was funded by multiple Canadian government research funding agencies and the Bill and Melinda Gates Foundation.

Managing and Regulating Privacy Risks in Synthetic Data

In this webinar, Khaled El Emam and Lucy Mosquera highlight the main findings of a year-long study conducted with the CHEO Research Institute and the University of Ottawa. The study focuses on the privacy assurance use case for synthetic data. Its conclusions draw on a technical analysis of the different types of privacy risk, a legal analysis of these risks, and perspectives shared by regulators.

Synthetic Data Generation 101

In this webinar, Dr. Khaled El Emam provides a general introduction to synthetic data generation (SDG), explaining what it is, how it works, and which technologies are used. He also gives an overview of the use cases where synthetic data can provide value in the context of real-world data and clinical trial data.

Generating Synthetic Longitudinal Data

Longitudinal data differ from other types of data to which synthesis has been applied, such as tabular data, and therefore require a new set of approaches and technologies. This webinar describes approaches that have been used to generate synthetic longitudinal data and shows examples of how this works.

Synthetic Data Generation for Rare Disease Research

This webinar, presented by Dr. Khaled El Emam and Jason Colquitt, discusses the small data challenges of rare disease research and how synthetic data generation can help address these challenges. 

Measuring Re-identification Risk for Synthetic and Anonymized Data

Khaled El Emam discusses accurate methods for measuring re-identification risk. This new class of risk estimators helps avoid issues experienced with many of the current risk measurement methods.

Synthetic Data

In this presentation at the 2021 PHUSE/FDA Computational Science Symposium (CSS), Dean Eurich, Professor, School of Public Health, University of Alberta, discusses how synthetic data can be used to accelerate access to data for health research.

Practical Mechanisms for Generating Anonymous Data

This BIH Digital Medicine online lecture gives an overview of risk-based methods for creating non-identifiable data, highlights some of their limitations, and discusses SDG methods and their applications in practice.

Ten Recommendations For Regulating De-identification

Based on his almost two decades of experience, Khaled El Emam presents ten key recommendations on how to define the criteria for when information becomes non-identifiable, and how to regulate the resulting data.

Implementing the FAIR Data Sharing Principles

Presentation of Replica Analytics’ joint offering with Aridhia: An integrated and managed FAIR data sharing platform. 

Data Synthesis: A Tool for Responsible Data Sharing

This presentation by Khaled El Emam provides a general overview of synthetic data generation and its applications. 

Experiences Implementing Data Synthesis in a Global Life Sciences Company

A webinar from Stephen Bamford at Janssen describing the company's experiences implementing synthetic data generation and where it fits among other privacy-enhancing technologies (PETs).

What We Learned About Data Synthesis in 2020

In this webinar, Khaled El Emam gives his perspective on how data synthesis changed in 2020: the technology, the market, adoption, practical application, governance, and the potential of this technology.

Empirical Assessment of Privacy Risks in Data

A presentation on our experiences attempting to re-identify individuals in datasets that have been de-personalized, especially health data. It provides an overview of motivated intruder attacks.

Ten Things I Have Learned About Health Data Monetization

Sharing some key experiences working with organizations to monetize their data for commercial, academic, and public interest purposes. A practical journey with many lessons learned.

Ten Things I Have Learned About De-identification

An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy-enhancing technologies and applying them in practice globally.

Synthetic Clinical Trial Data: Use Cases, Methods, and Experiences

Sharing experiences with synthesizing clinical trial data and with how a pharmaceutical company has used the resulting data.

Optimal Synthesis of Clinical Trial Data

Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.

An Introduction to Synthetic Clinical Trial Data


2021 Summit - Synthetic Data: The Future of Data Sharing

The Alberta Synthetic Data Project

Dean Eurich
Professor, School of Public Health, University of Alberta

Lucy Mosquera
Director of Data Science, Replica Analytics

Uses of Synthetic Data by the Life Sciences Industry

Panellists: Virginie Giroux, HEOR, Merck; Stephen Bamford, Janssen; Janice Branson, Novartis. Moderator: Reg Joseph, Health Cities.

How Synthetic Data will Transform Health Research and Innovation

Karen Cuenco, Senior Specialist, Integration & Quantitative Science, Bill and Melinda Gates Foundation

2020 Summit - Getting Access to COVID-19 Data

Responsibly Sharing COVID-19 Data

Dr. Khaled El Emam

GIS and COVID-19

Alex Miller, President, Esri Canada

The Pandemic Menace and the Future of Humanity Through the Eyes of a Realist

Gerry Stegmaier, Reed Smith

The COVID-19 Research Database

Jason LaBonte, Ph.D., Datavant

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study

This study, published in JMIR Medical Informatics, evaluates how well common utility metrics rank synthetic data generation methods according to their performance on a specific analytic workload. The workload of interest is building logistic regression prediction models from synthetic data, a very common workload in health research.

Can Synthetic Data Be A Proxy For Real Clinical Trial Data? A Validation Study

A detailed case study demonstrating the high utility of synthetic data generation for a colon cancer clinical trial dataset.

Evaluating the Utility of Synthetic COVID-19 Case Data

A detailed case study demonstrating the high utility and low privacy risks of synthetic data generation for the Ontario COVID-19 case dataset.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data

Outlines a privacy model that measures attribute disclosure conditional on identity disclosure for synthetic data, combining two types of disclosure risk within a single framework.

Optimizing the Synthesis of Clinical Trial Data Using Sequential Trees

This paper describes the basic method for generating synthetic data from small source datasets, such as those from clinical trials.

Explore the advantages of synthetic data in your healthcare organization. Find out more about Replica Synthesis today.