Replica Analytics - An Aetion Company

Knowledgebase

Presentations, Articles, Reports & Media

This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.

Publication

Practical Synthetic Data Generation

Article

Precaution, Ethics & Risk: Perspectives on Regulating Non-identifiable Data

Podcast

Privacy Tech Talk: Replica Analytics

Podcast

Deciphering Deep Fakes & Synthetic Data

Publication

Accelerating AI with Synthetic Data

Article

10 Recommendations for Regulating Non-identifiable Data

Article

Real-World Evidence Firm Aetion Grabs Privacy-Protected Synthetic Data Provider Replica Analytics

Article

Replica Analytics Using Synthetic Data to Ease Medical Researchers’ Pain

Article

Could Synthetic Data be the Future of Data Sharing?

Article

Seven Ways to Evaluate the Utility of Synthetic Data

Article

Enabling Health Information Sharing with Synthetic Data

Report

Synthetic Data Paradigm for Using and Sharing Data

Podcast

Real or Fake? The Buzz About Synthetic Data

Podcast

Discussion with Dr. Khaled El Emam on synthetic data

Exploring the concept of Identifiability in Canada’s proposed Consumer Privacy Protection Act (Bill C-27)

Khaled El Emam, SVP & GM, Replica Analytics; Adam Kardash, Partner, Osler/AccessPrivacy

Co-hosted by Osler/AccessPrivacy, this webinar focused on the concepts relating to identifiability, the definitions of de-identification and anonymization proposed in Bill C-27, and the implications for generating, using, and disclosing non-identifiable data.

Assessing Re-identification Risk Using Synthetic Data

Lucy Mosquera, Director of Data Science, Replica Analytics

In this webinar, Replica’s Director of Data Science, Lucy Mosquera, presents a novel approach that uses synthetic data to assess re-identification risk in datasets more accurately, helping to manage privacy risks and enable greater data sharing in healthcare and other sectors.

Synthetic Data as a Privacy Enhancing Technology

Lucy Mosquera, Director of Data Science, Replica Analytics

This was a presentation delivered at the State of Play of De-identification Techniques Masterclass, organized by the Future of Privacy Forum, at the 15th International Computers, Privacy & Data Protection Conference in Brussels.

Generating Synthetic Longitudinal Data

Lucy Mosquera, Director of Data Science, Replica Analytics

This webinar, hosted by C-Path, provides an introduction to synthetic data generation – what it means, how it works and what technologies are used – as well as an overview of use cases where synthetic data can provide value in the context of real-world data (RWD) and clinical trial data.

Utility Assessments in Synthetic Data

Lucy Mosquera & Xi Fang

This webinar explores ways to assess utility in synthetic data and shares new research. We identify which utility assessments are most predictive of analytic value and show which can best rank synthesis methods for performance on a realistic biostatistics analysis. The study was funded by multiple Canadian government research funding agencies and the Bill and Melinda Gates Foundation.

Managing and Regulating Privacy Risks in Synthetic Data

Khaled El Emam & Lucy Mosquera

In this webinar, Khaled El Emam and Lucy Mosquera highlight the main findings of a year-long study conducted with the CHEO Research Institute and the University of Ottawa. The study focuses on the privacy assurance use case for synthetic data, and its conclusions draw on a technical analysis of the different types of privacy risks, a legal analysis of these risks, and perspectives shared by regulators.

Synthetic Data Generation 101

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Dr. Khaled El Emam provides a general introduction to synthetic data generation (SDG), explaining what it means, how it works and what technologies are used, as well as an overview of the use cases where synthetic data can provide value in the context of real-world data and clinical trial data.

Generating Synthetic Longitudinal Data

Khaled El Emam & Lucy Mosquera

Longitudinal data differ from other types of data to which synthesis has been applied, such as tabular data, necessitating a novel set of approaches and technologies. This webinar describes approaches that have been used to generate synthetic longitudinal data and shows examples of how this works.

Synthetic Data Generation for Rare Disease Research

Khaled El Emam & Jason Colquitt

This webinar, presented by Dr. Khaled El Emam and Jason Colquitt, discusses the small data challenges of rare disease research and how synthetic data generation can help address these challenges.

Measuring Re-identification Risk for Synthetic and Anonymized Data

Khaled El Emam, SVP & GM, Replica Analytics

Khaled El Emam discusses accurate methods for measuring re-identification risk. This new class of risk estimators helps avoid issues experienced with many of the current risk measurement methods.

Synthetic Data

Dean Eurich, Program Lead, Clinical Epidemiology, and Professor, School of Public Health

In this presentation at the 2021 PHUSE/FDA Computational Science Symposium (CSS), Dean Eurich, Professor, School of Public Health, University of Alberta, discusses how synthetic data can be used to accelerate access to data for health research.

Practical Mechanisms for Generating Anonymous Data

Khaled El Emam, SVP & GM, Replica Analytics

This BIH Digital Medicine online lecture gives an overview of risk-based methods for creating non-identifiable data, highlights some of their limitations, and discusses SDG methods and their applications in practice.

Ten Recommendations For Regulating De-identification

Khaled El Emam, SVP & GM, Replica Analytics

Based on his almost two decades of experience, Khaled El Emam presents ten key recommendations on how to define the criteria for when information becomes non-identifiable, and how to regulate the resulting data.

Implementing the FAIR Data Sharing Principles

Khaled El Emam, SVP & GM, Replica Analytics; David Sibbald, CEO and Co-Founder, Aridhia; Rodrigo Barnes, CTO, Aridhia

Presentation of Replica Analytics’ joint offering with Aridhia: an integrated and managed FAIR data sharing platform.

Data Synthesis: A Tool for Responsible Data Sharing

Khaled El Emam, SVP & GM, Replica Analytics

This presentation by Khaled El Emam provides a general overview of synthetic data generation and its applications.

Experiences Implementing Data Synthesis in a Global Life Sciences Company

Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen

A webinar from Stephen Bamford at Janssen describing their experiences implementing synthetic data generation and where it fits among other privacy enhancing technologies (PETs).

What We Learned About Data Synthesis in 2020

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Khaled El Emam gives his perspective on how data synthesis changed in 2020, covering the technology, the market, adoption, practical application, governance, and the potential of this technology.

Empirical Assessment of Privacy Risks in Data

Khaled El Emam, SVP & GM, Replica Analytics; Janice Branson, Global Head of Advanced Methodology & Data Science, Novartis; Nathan Good, Principal, Good Research

Presentation on our experiences with trying to identify individuals in datasets that have been de-personalized, especially health data. This is an overview of Motivated Intruder Attacks.

Ten Things I Have Learned About Health Data Monetization

Khaled El Emam, SVP & GM, Replica Analytics

Sharing some key experiences working with organizations to monetize their data for commercial, academic, and public interest purposes. A practical journey with many lessons learned.

Ten Things I Have Learned About De-identification

Khaled El Emam, SVP & GM, Replica Analytics

An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy enhancing technologies and applying them in practice globally.

Synthetic Clinical Trial Data: Use Cases, Methods, and Experiences

Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen

Sharing experiences with the synthesis of clinical trial data and how that data has been used by a pharmaceutical company.

Optimal Synthesis of Clinical Trial Data

Khaled El Emam, SVP & GM, Replica Analytics

Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.

An Introduction to Synthetic Clinical Trial Data

Richard Hoptroff, Founder and CTO, Hoptroff; Lucy Mosquera, Director of Data Science, Replica Analytics; Rebecca Li, Executive Director, Vivli; Ben Szekely, Director of Product, Cambridge Semantics

2022 Symposium: Perspectives on Regulating Privacy Enhancing Technologies

A Practical Introduction to Synthetic Data Generation

Khaled El Emam, University of Ottawa and Replica Analytics

De-identification Standardization Efforts by CANON

Adam Kardash, Osler/Access Privacy & CANON

Regulating Non-identifiable Data and Synthetic Data

Vance Lockton, Office of the Information and Privacy Commissioner of Ontario

2021 Summit - Synthetic Data: The Future of Data Sharing

The Alberta Synthetic Data Project

Dean Eurich, Professor, School of Public Health, University of Alberta

Lucy Mosquera, Director of Data Science, Replica Analytics

Uses of Synthetic Data by the Life Sciences Industry

Panellists: Virginie Giroux, HEOR, Merck; Stephen Bamford, Janssen; Janice Branson, Novartis. Moderator: Reg Joseph, Health Cities.

How Synthetic Data will Transform Health Research and Innovation

Karen Cuenco, Senior Specialist, Integration & Quantitative Science, Bill and Melinda Gates Foundation

2020 Summit - Getting Access to COVID-19 Data

Responsibly Sharing COVID-19 Data

Khaled El Emam, University of Ottawa and Replica Analytics

GIS and COVID-19

Alex Miller, President, Esri Canada

The Pandemic Menace and the Future of Humanity Through the Eyes of a Realist

Gerry Stegmaier, Reed Smith

The COVID-19 Research Database

Jason LaBonte, Ph.D., Datavant

Measuring re-identification risk using a synthetic estimator to enable data sharing

One common way to share health data for secondary analysis while meeting increasingly strict privacy regulations is to de-identify it. To demonstrate that the risk of re-identification is acceptably low, re-identification risk metrics are used. There is a dearth of good risk estimators modeling the attack scenario where an adversary selects a record from the microdata sample and attempts to match it with individuals in the population.

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study

Published in JMIR Medical Informatics, this study evaluates the ability of common utility metrics to rank synthetic data generation methods according to their performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is very common in health research.

Can Synthetic Data Be A Proxy For Real Clinical Trial Data? A Validation Study

A detailed case study demonstrating the high utility of synthetic data generation for a colon cancer clinical trial dataset.

Evaluating the Utility of Synthetic COVID-19 Case Data

A detailed case study demonstrating the high utility and low privacy risks of synthetic data generation for the Ontario COVID-19 case dataset.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data

Outlines a privacy model that measures attribute disclosure conditional on identity disclosure for synthetic data, combining two types of disclosure risk within a single framework.

Optimizing the Synthesis of Clinical Trial Data Using Sequential Trees

This paper describes the basic method for generating synthetic data from small source datasets, such as those from clinical trials.