As the pace of technological innovation accelerates, new risks to consumer and patient privacy are appearing faster than regulators may be able to keep up with.
Looking ahead to the next decade, this pace promises only to increase as technologies advance. The development and advancement of privacy enhancing technologies (PETs) is key to managing the privacy risks that new technologies pose.
A 2020 report[i] from the Future of Privacy Forum (FPF) looks at upcoming technologies that present privacy risks as well as promising developments in PETs to watch over the next ten years. Among those PETs, synthetic data is listed as a promising way to reduce the privacy risks associated with sharing data and processing it for secondary purposes. The report also flags another advance in AI and machine learning: generative adversarial networks (GANs). GANs are AI systems that create realistic simulations of real data and can be used to generate synthetic data sets, among other applications.
Growing Privacy Risks
The report outlines ten technologies that pose privacy risks as well as ten PETs to look for in the next decade. The technologies posing potential privacy risks are grouped into three categories:
- Innovations in tech linked to human bodies, health and social interaction
- Innovations in infrastructure
- Innovations in computing
Under the first category, Innovations in tech linked to human bodies, health and social interaction, the authors identify four main technologies that pose risks: biometric scanning, Real World Evidence (RWE), social credit and reputation scoring systems, and Internet of Bodies and brain-machine interfaces. Many of these technologies present challenges in terms of how to apply existing regulations and privacy frameworks. For example, RWE uses patient data generated by many different sources to evaluate the safety and efficacy of medications and medical devices. Because some sources of data for RWE, such as mobile devices and fitness wearables, may fall outside the scope of existing health privacy laws (at least in the US), legal protections and privacy assurances are needed to ensure appropriate use and protection.
Innovations in tech based upon AI and machine learning (ML) algorithms can present challenges relating to fairness, algorithmic transparency and accountability. AI and ML systems can be very complex and opaque, making it difficult to understand the bases upon which outputs are produced. Also, biases can be an issue in AI and ML systems as historical data used to train the models can be biased or may lack representation of some groups.
Technologies linked to human bodies pose additional risks to individual health and safety. Hacking and misuse of these systems present very serious safety and privacy risks to users that need to be addressed in both the design of these systems and their regulation.
Within the second category, Innovations in infrastructure, three technologies are discussed: automation and (collaborative) robotics, location services and proximity tracking, and smart communities. Increasingly complex AI systems and robotics carry with them new issues, such as how to deal with data that is co-created by humans and machines. Security can be a concern with such systems, as can transparency and accountability, as previously mentioned. As increasingly detailed location data becomes available from sources such as 5G networks, GPS systems and Bluetooth, concerns about how to apply existing data protection and privacy requirements also increase. Location data can reveal sensitive information about individuals based on the locations that they visit (e.g., medical clinics or entertainment venues), as well as travel and behavioral patterns, which could pose a risk to both individual privacy and safety. Smart, or “wired”, communities involve the collection of large amounts of data through, for example, internet-of-things sensors. These technologies pose risks in that they could be used to track individuals through facial recognition, license plate tracking, or mobility data, to name some examples. Giving individuals control over which data is collected and how it may be used will be a challenge. As with AI and ML systems, transparency and accountability are key to ensuring that individual privacy rights are respected.
In the last category, Innovations in computing, the authors include quantum computing, spatial computing (augmented/virtual reality), and distributed ledger technology (“blockchain”). Because of its speed and computational potential, quantum computing could be used to crack forms of encryption that are currently relied on for data protection and security. However, quantum computing may also make it possible to develop new, more robust forms of encryption in the future. Spatial computing, such as augmented and virtual reality, poses a risk in that these technologies collect a steady stream of user data. How data is stored and how third-party apps are controlled could have a huge impact on the privacy of these systems. Distributed ledger technology (DLT), such as blockchain, has been proposed as a method to protect data; however, it also poses risks in that certain data subject rights may be impossible to honor (e.g., the rights to restriction of processing, rectification, and erasure). Privacy by design is key for all upcoming technologies, to ensure that privacy is built in at the design stage and that risks are identified and mitigated throughout the development process.
PETs to Look for in the 2020s
The report explores three categories of PETs that can help to mitigate current and emerging privacy risks: Advances in cryptography, localization of processing, and advances in AI and ML.
The category that we are most interested in is Advances in AI and Machine Learning, as the technologies included in this category all relate to data synthesis. The three technologies outlined in this category are “small data”, synthetic data, and GANs. Small data here refers to AI and ML systems that use only small amounts of real data or no real data at all. For these systems to be trained using small amounts of real data, other data types must be employed, such as synthetic data. Synthetic data replicates the patterns and distributions of real data sets while creating data that has no 1:1 mapping with actual individuals. Completely synthetic data sets can be generated for training, or synthetic data can be used to augment existing data sets to increase the number of training examples. Alternatively, what preexisting models have already learned can be transferred to new models (transfer learning) to reduce or eliminate the need for additional training. Reducing the amount of real data needed to operate AI and ML systems also reduces the associated privacy risks. In addition, the use of alternative data sources (augmented and/or synthetic data) provides opportunities to re-balance data sets, allowing for reduction of biases and better representation of underrepresented groups.
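To make the idea of synthetic data concrete, here is a minimal Python sketch. It is not taken from the FPF report and is not how any particular commercial tool works. It assumes a two-column data set that is reasonably described by a bivariate Gaussian, fits the means, standard deviations and correlation of some made-up "real" records, and then samples brand-new records from the fitted model. The record values and the function names `fit` and `synthesize` are hypothetical, and such a simple model offers no formal privacy guarantee on its own.

```python
import math
import random
import statistics

# Hypothetical "real" records: (age in years, systolic blood pressure in mmHg).
# These values are made up purely for illustration.
real = [(34, 118), (45, 125), (52, 131), (61, 140), (29, 112),
        (48, 128), (57, 137), (66, 145), (38, 121), (71, 150)]


def fit(records):
    """Estimate means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in records) / (len(records) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def synthesize(params, n, seed=0):
    """Draw n new records from a bivariate Gaussian with the fitted parameters."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 and z2 so that x and y keep the fitted correlation.
        y = my + sy * (rho * z1 + math.sqrt(max(0.0, 1 - rho ** 2)) * z2)
        out.append((x, y))
    return out


params = fit(real)
synthetic = synthesize(params, n=5000)
synthetic_params = fit(synthetic)  # should be close to params
```

Each synthetic record is drawn from the fitted distribution rather than copied or perturbed from a specific row, which is the sense in which there is no 1:1 mapping to an individual. Real-world data synthesis typically relies on much richer models and includes explicit checks of both utility and privacy.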
GANs are AI systems that use two networks, a generator and a discriminator, to create new, realistic content based on real examples. The generator produces candidate samples, while the discriminator tries to tell them apart from real ones; each improves by competing against the other. GANs have been used extensively to generate images such as handwriting samples and faces, and can also be used to generate synthetic data sets that closely replicate the characteristics and distributions of real data.
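The generator/discriminator interplay can be sketched in a few lines of Python. This is a deliberately tiny toy under stated assumptions, not a practical GAN: the "networks" are a one-dimensional linear generator G(z) = a*z + b and a logistic discriminator D(x) = sigmoid(w*x + c), the gradients are written out by hand, and the training data are made-up numbers. Because the discriminator is linear, at best it can push the generator toward where the real data sit, not toward their full shape, and convergence is not guaranteed. All names here are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_samples, steps=3000, lr=0.02, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.choice(real_samples)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(G(z)) (non-saturating loss).
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w  # derivative of log D(x) with respect to x
        a += lr * grad_x * z
        b += lr * grad_x
    return a, b, w, c


# Made-up "real" data: 200 draws centred on 4.0.
data_rng = random.Random(1)
real_samples = [data_rng.gauss(4.0, 1.0) for _ in range(200)]

a, b, w, c = train_toy_gan(real_samples)

# Generate a few new samples from the trained generator.
sample_rng = random.Random(2)
samples = [a * sample_rng.gauss(0.0, 1.0) + b for _ in range(5)]
```

Practical GANs replace both linear functions with neural networks and train on mini-batches, but the alternating update pattern is the same.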
Data synthesis can make more data available for training, research and analytics purposes while greatly reducing privacy risks. As a result, more organizations are turning to data synthesis to share data sets, such as COVID-19 and clinical trial data, so that privacy risks are better mitigated while the data remains available for critical research and analysis.[ii]
The other categories of PETs discussed in the report, cryptography and localization of processing, also hold promise in offering data protection. The advances in cryptography outlined (zero-knowledge proofs, homomorphic encryption, secure multi-party computation, and differential privacy) allow for the use of data without disclosing personal information to other parties. Homomorphic encryption, for example, allows computations to be performed on encrypted data without the data needing to be decrypted. However, these tools have their downsides: they can be computationally expensive and may not currently scale to larger data sets.
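As an illustration of computing on encrypted data, the Python sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. The report does not name a specific scheme, so the choice of Paillier is an assumption made here for illustration. The primes are far too small to be secure, the generator is fixed to g = n + 1 for simplicity, and the function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n as c = (1 + n)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m from c using the private values (lam, mu)."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, private_key, c_total)  # 42, computed without decrypting the inputs
```

The party that calls `add_encrypted` never sees 20 or 22, only ciphertexts. The modular exponentiations on numbers as large as n squared also hint at why such schemes become computationally expensive at realistic key sizes.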
Tools that allow for the localization of processing include edge computing and local processing, device-level machine learning, and identity management technologies. These tools allow data to be processed on local devices rather than requiring data to be transferred to a service provider or other location (e.g., the cloud). Although these technologies promise to help organizations meet data minimization and privacy by design/default obligations and may, in some cases, afford individuals greater control over their data, they are still emerging and require more development before they can be deployed effectively in practice.
On the other hand, of all the PETs listed, data synthesis has reached the point at which it is being deployed efficiently and effectively in practice with commercially available tools such as those offered by Replica Analytics. Data synthesis tools facilitate the sharing of data by organizations in a responsible manner that protects privacy and upholds data subject rights.[iii]
This FPF report provides interesting insights into upcoming technologies that pose new risks to privacy as well as technologies that may be used to address new and existing privacy threats. The next ten years are likely to bring great technological innovation with the potential to improve our lives in many different ways, such as bettering human health, increasing our knowledge about the world around us, and advancing society. However, such innovation entails the collection and use of data, which could lead to an increase in privacy risks. The continued advancement of PETs, such as data synthesis and related AI and machine learning tools, is key to ensuring that we can reap the benefits of new technologies without having to compromise on privacy.
For more discussion of how PETs are being developed and deployed, see the July 2020 special issue of the Journal of Data Protection and Privacy, co-edited by Editor-in-Chief Ardi Kolah and guest editor Khaled El Emam, co-founder of Replica Analytics. This issue provides a global perspective on existing and emerging PETs, with contributions from leading privacy and technology practitioners in the UK, France, Belgium, the US, Canada, and Hong Kong.