Author: Dr. Khaled El Emam
Gartner, the global research and advisory company, has released a new study whose number-one prediction is that synthetic data will lead to better privacy. These latest findings are consistent with what we’ve been experiencing as a data synthesis company. This is exciting news for organizations looking to apply data synthesis in their work, and it suggests that the technology is indeed the future of data sharing.
According to the recent Gartner report, “by 2025, synthetic data will reduce personal customer data collection, avoiding 70% of privacy violation sanctions.” Importantly, the study also suggests synthetic data will reduce the risks of data breaches involving personal information.
Synthetic data is generated from machine learning models that capture the patterns and statistical properties of real data, but without any one-to-one mapping back to an individual. As we told CPO Magazine in August, with advances in AI and the ever-growing need for data, the adoption of synthetic data generation is really picking up steam. Meanwhile, traditional methods of de-identification are facing headwinds, including re-identification attacks, concerns from regulators and the public, and increasingly challenging economics. Synthetic data has been getting attention as a practical privacy-enhancing technology that can address these concerns and provide high-utility data.
Gartner’s new study builds on predictions it made about synthetic data earlier this year, when it stated that:
- by 2024, 60% of the data used for the development of AI and analytics solutions will be synthetically generated;
- by 2024, use of synthetic data and transfer learning will halve the volume of real data needed for machine learning; and
- by 2025, 10% of governments will use a synthetic population with realistic behavior patterns to train AI while avoiding privacy and security concerns.
We’re beginning to see some positive signals from various data protection authorities toward the adoption of this kind of technology. With privacy laws and regulations evolving worldwide, lawmakers and regulators will need to consider how non-identifiable data, such as synthetic data, should be regulated going forward.
The International Association of Privacy Professionals recently published my top ten recommendations for regulating non-identifiable data, which I co-authored with lawyer Mike Hintze. A more in-depth report on these recommendations is also available.
A reasonable approach to regulation that supports both privacy and innovation interests includes confirming widely accepted thresholds for what is and is not considered personal information; ensuring that privacy and other risks are reduced to an acceptable level while preserving data utility; and limiting the regulation of statistical models that do not contain personal information, since such models have been shared for decades.
Such an approach, together with growing awareness and adoption of synthetic data generation and an accumulating body of evidence and use cases, will help ensure that Gartner’s predictions are fully realized.