
This page is regularly updated with new videos, presentations, documents, articles, white papers and other information from Replica Analytics. Check back often to get the latest resource downloads.
Khaled El Emam, SVP & GM, Replica Analytics
In this webinar, Khaled El Emam gives an expert overview of the new ISO Standard on Data De-identification. An international standard that carries weight and reflects good practices, it is an important milestone, providing a practical and consistent process that can be applied globally to operationalize regulatory requirements for de-identification or anonymization.
Lucy Mosquera, Senior Director of Data Science, Replica Analytics
In this DIA webinar, Lucy Mosquera describes an end-to-end solution for evaluating and mitigating privacy risks in health datasets, with plenty of relevant use cases. She discusses technologies such as re-identification estimation, risk-based de-identification, and synthetic data generation.
Khaled El Emam, SVP and GM, Replica Analytics
This webinar, hosted by Innovate Cities, gives an overview of their CityShield Data Trust. Khaled El Emam discusses Replica’s role as a project partner in generating privacy-protective synthetic data for CityShield. Other panelists include Dr. Ann Cavoukian, Innovate Cities’ CPO; Hugh O’Reilly, their ED; and Patricia Thaine, CEO of Private AI.
Mark Baillie & Lucy Mosquera
In this webinar, Mark Baillie from Novartis discusses a project to generate synthetic clinical trial datasets and use cases for synthetic clinical trial data. Replica’s Lucy Mosquera shares how the datasets for the project were synthesized and the results of privacy and utility assessments on the synthetic datasets.
Lucy Mosquera & Xi Fang
Synthetic data generation can generate datasets with low privacy risk. Quantifying privacy risks in synthetic data is an essential step in building trust in this innovative technology. This webinar offers an introduction to the topic and new findings with practical guidance for membership disclosure risk assessments in synthetic data.
Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics
In this ISPOR webinar, we gave an overview of synthetic data, its privacy-preserving properties, and its advantages over traditional de-identification, and then reviewed results from a simulation of the validity of inference on synthetic oncology datasets.
There is no charge to view, but registration on the site is required.
Khaled El Emam, SVP & GM, Replica Analytics; Elizabeth Denham CBE, Baker McKenzie LLP
Hosted by Baker McKenzie LLP, this webinar covered how synthetic data are generated, different use cases & opportunities, assessing privacy, utility & risks, and regulatory issues.
Khaled El Emam, SVP & GM, Replica Analytics; Adam Kardash, Partner, Osler/AccessPrivacy
Co-hosted by Osler LLP/AccessPrivacy, this webinar focused on the concepts relating to identifiability as well as the definitions for de-identification and anonymization proposed in Bill C-27, and the implications for generating, using, and disclosing non-identifiable data.
Lucy Mosquera, Director of Data Science, Replica Analytics
In this webinar, Replica’s Director of Data Science, Lucy Mosquera, presents a novel approach that employs synthetic data to assess re-identification risks in datasets more accurately, in order to better manage privacy risks and enable greater data sharing in healthcare and other sectors.
Lucy Mosquera, Director of Data Science, Replica Analytics
This was a presentation delivered at the State of Play of De-identification Techniques Masterclass, organized by the Future of Privacy Forum, at the 15th International Computers, Privacy & Data Protection Conference in Brussels. All Masterclass presentations here.
Lucy Mosquera, Director of Data Science, Replica Analytics
This webinar hosted by C-Path provides an introduction to synthetic data generation – what it means, how it works and what technologies are used – as well as an overview of use cases where synthetic data can provide value in the context of RWD and clinical trial data.
Lucy Mosquera & Xi Fang
This webinar explores ways to assess utility in synthetic data and shares new research. We identify which utility assessments are most predictive of analytic value and show which can best rank synthesis methods for performance on a realistic biostatistics analysis. The study was funded by multiple Canadian government research funding agencies and the Bill and Melinda Gates Foundation.
Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics
In this webinar, Khaled El Emam and Lucy Mosquera highlight the main findings of a year-long study done with the CHEO Research Institute and the University of Ottawa. Its focus is the privacy assurance use case for synthetic data and the underlying conclusions come from a technical analysis of the different types of privacy risks, a legal analysis of these risks, as well as perspectives shared by regulators.
Khaled El Emam, SVP & GM, Replica Analytics
In this webinar, Dr. Khaled El Emam provides a general introduction to synthetic data generation (SDG), explaining what it means, how it works and what technologies are used, as well as an overview of the use cases where synthetic data can provide value in the context of real-world data and clinical trial data.
Khaled El Emam & Jason Colquitt
This webinar, presented by Dr. Khaled El Emam and Jason Colquitt, discusses the small data challenges of rare disease research and how synthetic data generation can help address these challenges.
Khaled El Emam, SVP & GM, Replica Analytics
Khaled El Emam discusses accurate methods for measuring re-identification risk. This new class of risk estimators helps avoid issues experienced with many of the current risk measurement methods.
Khaled El Emam, SVP & GM, Replica Analytics; Lucy Mosquera, Director Data Science, Replica Analytics
Longitudinal data differ from other types of data to which synthesis has been applied, such as tabular data, necessitating a novel set of approaches and technologies which we illustrate in this webinar.
Dean Eurich, Program Lead, Clinical Epidemiology, and Professor, School of Public Health
A presentation by Dean Eurich, Professor, School of Public Health, University of Alberta. Presented at the 2021 PHUSE/FDA Computational Science Symposium (CSS), Professor Eurich discusses how synthetic data can be used to accelerate access to data for health research.
Khaled El Emam, SVP & GM, Replica Analytics
This BIH Digital Medicine online lecture gives an overview of risk-based methods for creating non-identifiable data, highlights some limitations, and discusses SDG methods and their applications in practice.
Khaled El Emam, SVP & GM, Replica Analytics
Based on his almost two decades of experience, Khaled El Emam presents ten key recommendations on how to define the criteria for when information becomes non-identifiable, and how to regulate the resulting data.
Khaled El Emam, SVP & GM, Replica Analytics; David Sibbald, CEO and Co-Founder, Aridhia; Rodrigo Barnes, CTO, Aridhia
Presentation of Replica Analytics’ joint offering with Aridhia: An integrated and managed FAIR data sharing platform.
Khaled El Emam, SVP & GM, Replica Analytics
This presentation by Khaled El Emam provides a general overview of synthetic data generation and its applications.
Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen
A webinar from Stephen Bamford at Janssen describing their experiences with the implementation of synthetic data generation and where it fits in among other PETs.
Khaled El Emam, SVP & GM, Replica Analytics
In this webinar, Khaled El Emam provided his perspective on the changes in 2020 in data synthesis technology, the market, the adoption, the practical application, the governance, and the potential of this technology.
Khaled El Emam, SVP & GM, Replica Analytics
Sharing some key experiences working with organizations to monetize their data for commercial, academic, and public interest purposes. A practical journey with many lessons learned.
Khaled El Emam, SVP & GM, Replica Analytics
An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy enhancing technologies and applying them in practice globally.
Stephen Bamford, Head of Clinical Data Standards & Transparency; IDAR, Global Development, Janssen
Sharing experiences with the synthesis of clinical trial data and how that data has been used by a pharmaceutical company.
Khaled El Emam, SVP & GM, Replica Analytics; Janice Branson, Global Head of Advanced Methodology & Data Science, Novartis; Nathan Good, Principal, Good Research
Presentation on our experiences with trying to identify individuals in datasets that have been de-personalized, especially health data. This is an overview of Motivated Intruder Attacks.
Khaled El Emam, SVP & GM, Replica Analytics
Presentation describing a method for the synthesis of complex clinical trial data and explaining how to tune its hyperparameters, with results presented on multiple oncology clinical trial datasets.
Richard Hoptroff, Founder and CTO, Hoptroff; Lucy Mosquera, Director Data Science, Replica Analytics; Rebecca Li, Executive Director, Vivli; Ben Szekely, Director of Product, Cambridge Semantics
An overview of lessons learned and key observations from more than a decade of developing de-identification and other privacy enhancing technologies and applying them in practice globally.
Khaled El Emam, University of Ottawa and Replica Analytics
Adam Kardash, Osler/Access Privacy & CANON
Vance Lockton, Office of the Information and Privacy Commissioner of Ontario
Dean Eurich, Professor, School of Public Health, University of Alberta
Lucy Mosquera, Director of Data Science, Replica Analytics
Panellists: Virginie Giroux, HEOR, Merck; Stephen Bamford, Janssen; Janice Branson, Novartis. Moderator: Reg Joseph, Health Cities.
Karen Cuenco, Integration & Quantitative Science, Senior Specialist, Bill and Melinda Gates Foundation
Khaled El Emam, University of Ottawa and Replica Analytics
Gerry Stegmaier, Reed Smith
There is growing interest in the application of synthetic data across health and life sciences, but to fully realize the benefits, further education, research, and policy innovation are required. This article summarizes the opportunities and challenges of synthetic data generation for health data, and provides directions for how this technology can be leveraged to accelerate data access for secondary purposes.
One of the increasingly accepted methods to evaluate the privacy of synthetic data is to measure the risk of membership disclosure. This measures how accurately, expressed as an F1 score, an adversary could ascertain that a target individual from the same population as the real data is in the dataset used to train the generative model. It is commonly estimated using a data partitioning methodology with a 0.5 partitioning parameter.
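As a rough illustration of the idea, the sketch below computes an F1 score for a simple membership guess. It is not the estimator described in the article. It assumes the partitioning parameter is the fraction of real records used to train the generator, and it uses a hypothetical adversary who claims membership whenever a synthetic record lies within a Hamming-distance threshold of the target. The names `partition`, `claims_member`, and `membership_f1` and all of the toy records are invented for this example.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def partition(records, train_fraction=0.5, seed=0):
    """Shuffle records and split them into (training, holdout) partitions.

    Assumption: train_fraction plays the role of the partitioning parameter.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def claims_member(target, synthetic, threshold):
    """Hypothetical adversary: guess 'member' if any synthetic record is close."""
    return any(hamming(target, s) <= threshold for s in synthetic)


def membership_f1(members, non_members, synthetic, threshold=0):
    """F1 score of the adversary's guesses over a labelled attack set."""
    tp = sum(claims_member(r, synthetic, threshold) for r in members)
    fp = sum(claims_member(r, synthetic, threshold) for r in non_members)
    fn = len(members) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Splitting ten record ids with a 0.5 parameter gives two partitions of five.
train_ids, holdout_ids = partition(range(10), train_fraction=0.5)

# Toy attack set: members stand in for records drawn from the training
# partition, non_members for records drawn from the holdout partition.
members = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
non_members = [(1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
synthetic = [(1, 0, 1), (0, 1, 1), (1, 1, 1)]

f1 = membership_f1(members, non_members, synthetic, threshold=0)
print(round(f1, 3))  # 0.571 (precision 2/3, recall 1/2)
```

In a real assessment the synthetic records would come from a generative model trained only on the training partition, and the resulting F1 would typically be compared against a naive baseline before drawing conclusions about risk.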