Replica Analytics - An Aetion Company

Knowledgebase

Following the acquisition of Replica Analytics by Aetion, the generative AI technology previously known as Replica Synthesis is now Aetion® Generate and continues to create privacy-enhancing synthetic data.

Presentations, Articles, Reports & Media

This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.

Publication: Replica Analytics

The Cardinal Methodology for Evaluating Identity Disclosure Risk for Non-Public Data

Publication: O'Reilly

Practical Synthetic Data Generation

Interview: Replica Analytics

CityAge Q&A with Khaled El Emam

Article: IAPP Privacy Advisor

Synthetic Data is Key to Privacy by Design Practices in New Canadian Smart City Partnership

Article: Globe and Mail

Synthetic data puts privacy at the heart of AI projects

Article: IAPP Privacy Advisor

Precaution, Ethics & Risk: Perspectives on Regulating Non-identifiable Data

Podcast: That Tech Pod

Deciphering Deep Fakes & Synthetic Data

Publication: O'Reilly

Accelerating AI with Synthetic Data

Podcast: Privacy Tech Talk

Privacy Tech Talk: May 18, 2022

Article: IAPP Privacy Advisor

10 Recommendations for Regulating Non-identifiable Data

Article: Fierce Healthcare

Real-World Evidence Firm Aetion Grabs Privacy-Protected Synthetic Data Provider Replica Analytics

Article: CPO Magazine

Could Synthetic Data be the Future of Data Sharing?

Article: IEEE Computer Society

Seven Ways to Evaluate the Utility of Synthetic Data

Article: Privacy Laws & Business

Enabling Health Information Sharing with Synthetic Data

Article: Ottawa Business Journal

Replica Analytics Using Synthetic Data to Ease Medical Researchers’ Pain

Report: Cutter, an Arthur D. Little Community

Synthetic Data Paradigm for Using and Sharing Data

Podcast: OIPC of Ontario

Real or Fake? The Buzz About Synthetic Data

Podcast: OIPC of Saskatchewan

Discussion with Dr. Khaled El Emam on synthetic data

Ten Things I Have Learned About Synthetic Data Generation

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Khaled El Emam presents key learnings from the last year about this important technology and how to deploy it. He also shares his predictions for advances in, and the adoption of, synthetic health data generation in the near future.

Demystifying HIPAA: Techniques for De-identifying Complex Multi-modal Data

Patricia Thaine, Co-Founder & CEO, Private AI

Lucy Mosquera, Senior Director of Data Science, Replica Analytics

Most health data is multi-modal, with both structured and unstructured (text) components. Learn from Patricia Thaine, Co-Founder & CEO, Private AI, and Lucy Mosquera, Sr. Director of Data Science, Replica Analytics, about tech-enabled methods to de-identify complex datasets in accordance with HIPAA.

Reuse health data safely via tech-enabled re-identification risk management

Braley Crandall, VP Operations, Replica Analytics

Ade Adeoye, Senior Data Scientist, Replica Analytics

In this webinar, Braley Crandall and Ade Adeoye share how Replica Analytics’ methodologies and AI/ML-powered data synthesis technology have helped healthcare data providers mitigate risk and create opportunity in today’s real-world data marketplace.

A review of the new ISO Standard on Data De-identification: ISO/IEC 27559

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Khaled El Emam gives an expert overview of the new ISO Standard on Data De-identification. An international standard that carries weight and reflects good practices, it is an important milestone, providing a practical and consistent process that can be applied globally to operationalize regulatory requirements for de-identification or anonymization.

Privacy Protective Sharing of Health Datasets using De-identification and Synthetic Data Generation

Lucy Mosquera, Senior Director of Data Science, Replica Analytics

In this DIA webinar, Lucy Mosquera describes an end-to-end solution for evaluating and mitigating privacy risks in health datasets, with plenty of relevant use cases. She discusses technologies such as re-identification estimation, risk-based de-identification, and synthetic data generation.

Changing the Conversation on Privacy – Using Data For Good

Khaled El Emam, SVP and GM, Replica Analytics

This webinar, hosted by Innovate Cities, gives an overview of their CityShield Data Trust. Khaled El Emam discusses Replica’s role as a project partner in generating privacy-protective synthetic data for CityShield. Other panelists include Dr. Ann Cavoukian, Innovate Cities’ CPO, and Hugh O’Reilly, their ED, as well as Patricia Thaine, CEO of Private AI.

Generating and Evaluating Synthetic Clinical Trial Data in a Pharmaceutical Company

Mark Baillie & Lucy Mosquera

In this webinar, Mark Baillie from Novartis discusses a project to generate synthetic clinical trial datasets and use cases for synthetic clinical trial data. Replica’s Lucy Mosquera shares how the datasets for the project were synthesized and the results of privacy and utility assessments on the synthetic datasets.

Evaluating Privacy Risks in Synthetic Data Using Membership Disclosure

Lucy Mosquera & Xi Fang

Synthetic data generation can generate datasets with low privacy risk. Quantifying privacy risks in synthetic data is an essential step in building trust in this innovative technology. This webinar offers an introduction to the topic and new findings with practical guidance for membership disclosure risk assessments in synthetic data.

On the Validity of Statistical Analyses with Privacy-Preserving Synthetic Data

Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics

In this ISPOR webinar, we gave an overview of synthetic data, its privacy-preserving properties, and its advantages over traditional de-identification, and then reviewed results from a simulation of the validity of inference on synthetic oncology datasets.

There is no charge to view but registering on the site is required.

Synthetic Data: The Future of Data Sharing?

Khaled El Emam, SVP & GM, Replica Analytics; Elizabeth Denham CBE, Baker McKenzie LLP.

Hosted by Baker McKenzie LLP, this webinar covered how synthetic data are generated, different use cases & opportunities, assessing privacy, utility & risks, and regulatory issues.

Exploring the concept of Identifiability in Canada’s proposed Consumer Privacy Protection Act (Bill C-27)

Khaled El Emam, SVP & GM, Replica Analytics; Adam Kardash, Partner, Osler/AccessPrivacy

Co-hosted by Osler LLP/AccessPrivacy, this webinar focused on the concepts relating to identifiability as well as the definitions for de-identification and anonymization proposed in Bill C-27, and the implications for generating, using, and disclosing non-identifiable data.

Assessing Re-identification Risk Using Synthetic Data

Lucy Mosquera, Director of Data Science, Replica Analytics

In this webinar, Replica’s Director of Data Science, Lucy Mosquera, presents a novel approach to employ synthetic data for a more accurate assessment of re-identification risks in datasets, to better manage privacy risks and enable greater data sharing in healthcare and other sectors.

Synthetic Data as a Privacy Enhancing Technology

Lucy Mosquera, Director of Data Science, Replica Analytics

This was a presentation delivered at the State of Play of De-identification Techniques Masterclass, organized by the Future of Privacy Forum, at the 15th International Computers, Privacy & Data Protection Conference in Brussels.

Generating Synthetic Longitudinal Data

Lucy Mosquera, Director of Data Science, Replica Analytics

This webinar hosted by C-Path provides an introduction to synthetic data generation – what it means, how it works and what technologies are used – as well as an overview of use cases where synthetic data can provide value in the context of RWD and clinical trial data.

Utility Assessments in Synthetic Data

Lucy Mosquera & Xi Fang

This webinar explores ways to assess utility in synthetic data and shares new research. We identify which utility assessments are most predictive of analytic value and show which can best rank synthesis methods for performance on a realistic biostatistics analysis. The study was funded by multiple Canadian government research funding agencies and the Bill and Melinda Gates Foundation.

Managing and Regulating Privacy Risks in Synthetic Data

Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics

In this webinar, Khaled El Emam and Lucy Mosquera highlight the main findings of a year-long study done with the CHEO Research Institute and the University of Ottawa. Its focus is the privacy assurance use case for synthetic data and the underlying conclusions come from a technical analysis of the different types of privacy risks, a legal analysis of these risks, as well as perspectives shared by regulators.

Synthetic Data Generation 101

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Dr. Khaled El Emam provides a general introduction to synthetic data generation (SDG), explaining what it means, how it works and what technologies are used, as well as an overview of the use cases where synthetic data can provide value in the context of real-world data and clinical trial data.

Generating Synthetic Longitudinal Data

Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics

Longitudinal data differ from other types of data to which synthesis has been applied, such as tabular data, necessitating a novel set of approaches and technologies which we illustrate in this webinar.

Synthetic Data Generation for Rare Disease Research

Khaled El Emam & Jason Colquitt

This webinar, presented by Dr. Khaled El Emam and Jason Colquitt, discusses the small data challenges of rare disease research and how synthetic data generation can help address these challenges.

Measuring Re-identification Risk for Synthetic and Anonymized Data

Khaled El Emam, SVP & GM, Replica Analytics

Khaled El Emam discusses accurate methods for measuring re-identification risk. This new class of risk estimators helps avoid issues experienced with many of the current risk measurement methods.

Synthetic Data

Dean Eurich, Program Lead, Clinical Epidemiology, and Professor, School of Public Health

In this presentation at the 2021 PHUSE/FDA Computational Science Symposium (CSS), Dean Eurich, Professor, School of Public Health, University of Alberta, discusses how synthetic data can be used to accelerate access to data for health research.

Practical Mechanisms for Generating Anonymous Data

Khaled El Emam, SVP & GM, Replica Analytics

This BIH Digital Medicine online lecture gives an overview of risk-based methods for creating non-identifiable data, highlights some limitations, and discusses SDG methods and their applications in practice.

Ten Recommendations For Regulating De-identification

Khaled El Emam, SVP & GM, Replica Analytics

Based on his almost two decades of experience, Khaled El Emam presents ten key recommendations on how to define the criteria for when information becomes non-identifiable, and how to regulate the resulting data.

Implementing the FAIR Data Sharing Principles

Khaled El Emam, SVP & GM, Replica Analytics; David Sibbald, CEO and Co-Founder, Aridhia; Rodrigo Barnes, CTO, Aridhia

Presentation of Replica Analytics’ joint offering with Aridhia: An integrated and managed FAIR data sharing platform.

Data Synthesis: A Tool for Responsible Data Sharing

Khaled El Emam, SVP & GM, Replica Analytics

This presentation by Khaled El Emam provides a general overview of synthetic data generation and its applications.

Synthetic Clinical Trial Data: Use Cases, Methods, and Experiences

Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen

Sharing experiences with the synthesis of clinical trial data and how that data has been used by a pharmaceutical company.

Optimal Synthesis of Clinical Trial Data

Khaled El Emam, SVP & GM, Replica Analytics

Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.

Experiences Implementing Data Synthesis in a Global Life Sciences Company

Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen

A webinar from Stephen Bamford at Janssen describing their experiences with the implementation of synthetic data generation and where it fits in among other PETs.

What We Learned About Data Synthesis in 2020

Khaled El Emam, SVP & GM, Replica Analytics

In this webinar, Khaled El Emam provided his perspective on the changes in 2020 in data synthesis technology, the market, the adoption, the practical application, the governance, and the potential of this technology.

Empirical Assessment of Privacy Risks in Data

Khaled El Emam, SVP & GM, Replica Analytics; Janice Branson, Global Head of Advanced Methodology & Data Science, Novartis; Nathan Good, Principal, Good Research

Presentation on our experiences with trying to identify individuals in datasets that have been de-personalized, especially health data. This is an overview of Motivated Intruder Attacks.

Ten Things I Have Learned About Health Data Monetization

Khaled El Emam, SVP & GM, Replica Analytics

Sharing some key experiences working with organizations to monetize their data for commercial, academic, and public interest purposes. A practical journey with many lessons learned.

Ten Things I Have Learned About De-identification

Khaled El Emam, SVP & GM, Replica Analytics

An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy enhancing technologies and applying them in practice globally.

An Introduction to Synthetic Clinical Trial Data

Richard Hoptroff, Founder and CTO, Hoptroff; Lucy Mosquera, Director Data Science, Replica Analytics; Rebecca Li, Executive Director, Vivli; Ben Szekely, Director of Product, Cambridge Semantics

Synthetic Data Summit 2023

Khaled El Emam, GM and SVP, Replica Analytics; Professor, University of Ottawa

2022 Symposium: Perspectives on Regulating Privacy Enhancing Technologies

A Practical Introduction to Synthetic Data Generation

Khaled El Emam, University of Ottawa and Replica Analytics

De-identification Standardization Efforts by CANON

Adam Kardash, Osler/Access Privacy & CANON

Regulating Non-identifiable Data and Synthetic Data

Vance Lockton, Office of the Information and Privacy Commissioner of Ontario

2021 Summit - Synthetic Data: The Future of Data Sharing

The Alberta Synthetic Data Project

Dean Eurich, Professor, School of Public Health, University of Alberta

Lucy Mosquera, Director of Data Science, Replica Analytics

Uses of Synthetic Data by the Life Sciences Industry

Panellists: Virginie Giroux, HEOR, Merck; Stephen Bamford, Janssen; Janice Branson, Novartis. Moderator: Reg Joseph, Health Cities.

How Synthetic Data will Transform Health Research and Innovation

Karen Cuenco, Integration & Quantitative Science, Senior Specialist, Bill and Melinda Gates Foundation

2020 Summit - Getting Access to COVID-19 Data

Responsibly Sharing COVID-19 Data

Khaled El Emam, University of Ottawa and Replica Analytics

GIS and COVID-19

Alex Miller, President, Esri Canada

The Pandemic Menace and the Future of Humanity Through the Eyes of a Realist

Gerry Stegmaier, Reed Smith

The COVID-19 Research Database

Jason LaBonte, Ph.D., Datavant

Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets

This paper published in the Journal of Clinical Oncology: Clinical Cancer Informatics describes a study evaluating synthetic data generation on diverse breast cancer clinical trial datasets. We present a quantitative methodology for evaluating the replicability of analyses using synthetic data. We evaluate two common/defensible privacy metrics: attribution and membership disclosure. We compare performance of three types of generative models. The results from replicating 8 clinical trial analyses show generative models can produce high utility and high privacy datasets. The study was performed with colleagues at the Ottawa Hospital and collaborators across Canada/US.

Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies: Development and Validation

The reporting of machine learning (ML) prognostic and diagnostic modeling studies is often inadequate, making it difficult to understand and replicate such studies. JMIR Medical Informatics published this study which aims to consolidate the ML reporting guidelines and checklists in the literature to provide reporting items for prognostic and diagnostic ML in in-silico and shadow mode studies.

A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health

Our recent article in Scientific Reports provides evidence that synthetic data generation can be a good solution for international collaborations by helping to solve data sharing challenges. This type of data exchange has recently been particularly difficult with some European countries, a problem about which much has been written, and the published methodology can be applied in that context in particular.

A method for generating synthetic longitudinal health data

Getting access to administrative health data for research is difficult and time-consuming due to privacy regulations. This paper assessed the feasibility of generating and sharing synthetic administrative health data using a recurrent deep learning model. The privacy assessment concluded that attribution disclosure risk was substantially less than the typical acceptable risk threshold. Results also show that the synthetic dataset was suitably similar to the real data.

Synthetic data as an enabler for machine learning applications in medicine

There is growing interest in the application of synthetic data across health and life sciences, but to fully realize the benefits, further education, research, and policy innovation are required. This article summarizes the opportunities and challenges of synthetic data generation for health data, and provides directions for how this technology can be leveraged to accelerate data access for secondary purposes.

Validating A Membership Disclosure Metric For Synthetic Health Data

One increasingly accepted way to evaluate the privacy of synthetic data is to measure the risk of membership disclosure. The metric is the F1 accuracy with which an adversary would correctly ascertain that a target individual, drawn from the same population as the real data, is in the dataset used to train the generative model. It is commonly estimated using a data partitioning methodology with a 0.5 partitioning parameter.
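
As a rough illustration of how an F1 score for a membership attack can be computed, the following minimal Python sketch may help. It is not the paper's methodology: the function name `membership_f1`, the exact-match adversary, and the toy records are all hypothetical, and the half-member attack set is only an assumption chosen to keep the arithmetic simple.

```python
from typing import Hashable, Iterable, Sequence


def membership_f1(
    attack_records: Sequence[Hashable],
    is_member: Sequence[bool],
    synthetic_records: Iterable[Hashable],
) -> float:
    """F1 score of a naive exact-match membership adversary (illustrative only)."""
    synthetic = set(synthetic_records)
    # Hypothetical adversary: claim "member" whenever the target record
    # exactly matches some synthetic record.
    claims = [record in synthetic for record in attack_records]

    true_pos = sum(c and m for c, m in zip(claims, is_member))
    false_pos = sum(c and not m for c, m in zip(claims, is_member))
    false_neg = sum(m and not c for c, m in zip(claims, is_member))

    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy attack set in which half of the targets were in the training data
# (an assumption made for illustration).
targets = [("F", 34), ("M", 51), ("F", 67), ("M", 29)]
in_training = [True, True, False, False]
synthetic = [("F", 34), ("M", 29), ("F", 45)]

print(membership_f1(targets, in_training, synthetic))  # prints 0.5
```

In this toy case the adversary makes one correct claim, one false claim, and misses one true member, so precision and recall are both 0.5. A real assessment would use a more realistic matching rule and a properly partitioned dataset.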

Measuring re-identification risk using a synthetic estimator to enable data sharing

One common way to share health data for secondary analysis while meeting increasingly strict privacy regulations is to de-identify it. To demonstrate that the risk of re-identification is acceptably low, re-identification risk metrics are used. There is a dearth of good risk estimators modeling the attack scenario where an adversary selects a record from the microdata sample and attempts to match it with individuals in the population.
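
The sample-to-population attack described above can be made concrete with a small Python sketch. It assumes, unrealistically, that the full population is known, so it is not the estimator proposed in the paper, which addresses the case where population counts must be estimated. The function name `average_match_risk` and the toy quasi-identifiers are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def average_match_risk(
    sample: Sequence[Hashable], population: Sequence[Hashable]
) -> float:
    """Average sample-to-population match risk (illustrative only).

    Each record is represented by its tuple of quasi-identifier values.
    Assumes every sample record also appears in the population.
    """
    # Number of people in the population sharing each quasi-identifier combination.
    population_class_sizes = Counter(population)
    # A sample record matched at random within its population class is
    # correctly re-identified with probability 1 / (class size).
    risks = [1 / population_class_sizes[record] for record in sample]
    return sum(risks) / len(risks)


# Toy quasi-identifiers: (sex, year of birth).
population = [("F", 1980)] * 2 + [("M", 1975)] * 4 + [("F", 1990)]
sample = [("F", 1980), ("M", 1975), ("F", 1990)]

print(round(average_match_risk(sample, population), 3))  # prints 0.583
```

The three sample records have match probabilities of 1/2, 1/4, and 1, which average to about 0.583. The record that is unique in the population dominates the risk.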

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study

JMIR Medical Informatics published this study which evaluates the ability of common utility metrics to rank synthetic data generation methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research.

Can Synthetic Data Be A Proxy For Real Clinical Trial Data? A Validation Study

A detailed case study demonstrating the high utility of synthetic data generation for a colon cancer clinical trial dataset.

Evaluating the Utility of Synthetic COVID-19 Case Data

A detailed case study demonstrating the high utility and low privacy risks of synthetic data generation for the Ontario COVID-19 case dataset.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data

Outlines a privacy model that measures attribute disclosure conditional on identity disclosure for synthetic data, combining two types of disclosure risk within a single framework.

Optimizing the Synthesis of Clinical Trial Data Using Sequential Trees

This paper describes the basic method for generating synthetic data for small source datasets, such as for clinical trials.
