Benefits and Challenges of Machine Learning in Drug Development
The adoption of Artificial Intelligence (AI) technologies in healthcare could lead to great improvements in efficiency, patient care and medical research, accelerating the discoveries that lead to new cures. Some of these applications are already in use, helping doctors identify optimal treatment plans from volumes of information that only a computer could process.
For example, learning health systems have been implemented at many leading-edge health care institutions, such as Kaiser Permanente and Duke University Medical Centre. These systems analyze patient data held in an electronic medical record system to provide a prognosis and to inform treatment plans based on what was effective or ineffective for similar patients.
AI is also being applied in public health to provide information on public health crises, allowing authorities to intervene faster and more effectively. For example, AI and machine learning were used to develop a dashboard that gives authorities in Indiana near real-time data on opioid usage trends.
Recent Report from GAO and NAM
A recent report from the U.S. Government Accountability Office (GAO) and the National Academy of Medicine (NAM), entitled “Artificial Intelligence in Health Care: Benefits and Challenges of Machine Learning in Drug Development,” explores the impact of, and barriers to, the adoption of AI technologies in health care.
The first part of the report is a NAM survey of current knowledge about AI in health care. This portion has also been published independently as “Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril.”
Part two of the report is an AI and machine learning (ML) technology assessment from GAO. It is the first in a planned series of technology assessments on the use of AI technologies in health care that Congress has requested from GAO.
We will not present a complete overview of the report here. Rather, we will discuss the few items we have identified as most relevant to us, our customers and other stakeholders.
Identified Barriers to AI/ML Adoption
The report explores some of the challenges currently impeding the adoption of AI/ML in health care in general and in drug development in particular. Data is one of the key factors determining the success of an AI/ML implementation in health care: “The generation, collection, access to, and use of data are important aspects of both health care and machine learning research and applications.” Both NAM and GAO identify a paucity of quality data as a barrier to AI/ML adoption. NAM highlights the impact of poor-quality data in the first part of the report: “Bad data quality adversely impacts patient care and outcomes.” NAM suggests that data quality could be improved through “the use of multicenter datasets, incorporation of time varying data, assessment of missing data as well as informative censoring, and development of metrics of clinical utility.”
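As a rough illustration of where an “assessment of missing data” might begin, the sketch below (Python, standard library only) computes the fraction of missing values per field and, for a single field, per site. The record layout, the field names (`age`, `hba1c`, `site`) and the toy records are all hypothetical. A gap concentrated at one site, as in the toy data, is the kind of pattern that suggests values are not missing at random and deserve a closer look; it is not proof of bias on its own.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record layout: field name -> value, with None marking a missing value.
Record = Dict[str, Optional[Any]]


def missing_fraction(records: List[Record]) -> Dict[str, float]:
    """Fraction of records with a missing value for each field (absent keys count as missing)."""
    if not records:
        return {}
    fields = sorted({field for r in records for field in r})
    n = len(records)
    # Summing booleans counts the records where the field is None or absent.
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def missing_fraction_by_group(records: List[Record], field: str,
                              group_field: str) -> Dict[Any, float]:
    """Fraction of missing values in `field`, broken down by the value of `group_field`."""
    totals: Dict[Any, int] = {}
    missing: Dict[Any, int] = {}
    for r in records:
        g = r.get(group_field)
        totals[g] = totals.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r.get(field) is None)
    return {g: missing[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # Hypothetical toy records, not real patient data.
    toy = [
        {"age": 54, "hba1c": 7.1, "site": "A"},
        {"age": 61, "hba1c": None, "site": "B"},
        {"age": None, "hba1c": None, "site": "B"},
        {"age": 47, "hba1c": 6.4, "site": "A"},
    ]
    print(missing_fraction(toy))  # {'age': 0.25, 'hba1c': 0.5, 'site': 0.0}
    print(missing_fraction_by_group(toy, "hba1c", "site"))  # {'A': 0.0, 'B': 1.0}
```

Breaking missingness down by source is a cheap first check before combining multicenter datasets, since an overall rate can hide a site that never records a given measurement.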
High-Quality Data to Ensure Positive Outcomes
GAO also points to the need for high-quality data to ensure positive outcomes. Training machine learning algorithms requires a large amount of data, and for an algorithm to perform well, that data must be of high quality: “accurate and representative.” High-quality data may be difficult to obtain because data may not be in formats suitable for machine learning models, and because standardization across organizations is lacking. To avoid bias, the data must also be representative of the population, which may require drawing on many different sources so that all groups are properly represented. “Curating these data is a resource-intensive process, according to stakeholders and the literature. One representative from a drug company told [GAO] that 80 percent of their effort goes into accessing and curating data to make it usable for their machine learning applications.” Such an onerous process can act as a barrier to AI adoption and progress in health care.
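One simple way to check representativeness is to compare each group's share of a training sample with its share of a reference population. The sketch below assumes group labels and reference population shares are already available; the function names, the 0.8 cut-off and the example numbers are hypothetical, and a real assessment would examine many attributes and their combinations rather than a single one.

```python
from collections import Counter
from typing import Dict, List


def representation_ratios(sample_groups: List[str],
                          population_shares: Dict[str, float]) -> Dict[str, float]:
    """Share of each group in the sample divided by its share in the reference population.

    1.0 means proportional representation; values below 1.0 flag under-representation.
    Groups that appear only in the sample (not in population_shares) are ignored.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    n = len(sample_groups)
    counts = Counter(sample_groups)  # missing groups count as 0
    return {
        group: (counts[group] / n) / share
        for group, share in population_shares.items()
        if share > 0  # skip groups with no reference share to avoid division by zero
    }


def underrepresented(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose ratio falls below a (hypothetical) threshold, sorted by name."""
    return sorted(g for g, r in ratios.items() if r < threshold)


if __name__ == "__main__":
    # Hypothetical toy sample: 30 records from group "F", 70 from group "M".
    sample = ["F"] * 30 + ["M"] * 70
    ratios = representation_ratios(sample, {"F": 0.5, "M": 0.5})
    print(ratios)  # {'F': 0.6, 'M': 1.4}
    print(underrepresented(ratios))  # ['F']
```

A group flagged this way points to where additional data sources may be needed, which is why representative data can require drawing on many sources.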
Closely related to the shortage of high-quality data are the barriers to accessing data sources. As noted above, ML algorithms require large amounts of representative data for training and testing, and barriers to accessing and sharing data, such as cost and legal issues, inhibit the development and application of ML in health care. Cost alone can be an obstacle for some types of data: “According to one industry representative, collecting data from the early drug discovery phase can be cost prohibitive. This representative said that certain health-related data may cost tens of thousands of dollars, as compared to just cents for other consumer related data that many technology companies use.”
There are also legal barriers to accessing data. HIPAA, which regulates the use and disclosure of protected health information in the U.S., restricts the use and sharing of identifiable health data outside of direct patient care to narrow circumstances and specific conditions, such as with the consent of data subjects or when the information has been rendered non-identifiable through traditional de-identification techniques or data synthesis. Other state laws, as well as international regulations such as the General Data Protection Regulation (GDPR) in Europe, can be even more restrictive about how data can be used and shared.
Privacy laws often hold up patient consent as the gold standard for authorizing data sharing, but obtaining consent is not always practical and can introduce bias. “[E]xperts told us that rules and processes for patient consent to data sharing are complicated and may make it difficult for individuals to give such consent. In addition, one expert cautioned that consent can cause selection biases and could also limit the amount of usable data.” Other legal pathways are available but may be challenging to navigate, particularly for organizations that are not motivated to share data. “One expert… said that the privacy laws and regulations may not be the issue but rather their interpretation by organizations that may be hesitant to share data.” There may also be a lack of incentive, or even a disincentive, for pharmaceutical companies to share their data. “According to a drug company representative and an academic researcher, drug companies consider their data to be valuable, proprietary, and a competitive edge.” As a result, some companies are averse to sharing data with outside entities, which they view as an economic disadvantage.
Reducing Barriers to Adoption
GAO presents two main policy suggestions for managing these data-related issues: establishing standards for data and for the algorithms that use it, and creating mechanisms to share high-quality data while protecting patient privacy.
Establishing data and algorithmic standards refers not only to standard formats that allow for interoperability, but also to standards for the representativeness of data and the minimization of bias. Standards for algorithms would aim to increase the transparency and explainability of how outputs are produced, assuring patients and the public that algorithms work in a fair and ethical manner.
The mechanisms GAO suggests for sharing high-quality data include research consortia and repositories for drug trial data. Privacy enhancing technologies (PETs), such as anonymization, synthetic data generation and secure computation, can facilitate the responsible sharing of data while minimizing the risk to patient privacy.
For a more detailed discussion of these technologies, see our January webinar, “Ten things I have learned about de-identification.”
In the current regulatory climate, PETs are gaining traction. Wider use of PETs in health care can help facilitate data sharing, but not all PETs produce high-quality data. Some techniques in use today, such as the HIPAA Safe Harbor method, greatly reduce the utility of data without necessarily ensuring that patient privacy is protected. PETs should aim to optimize both privacy protection and data utility so that the data they produce remains of high quality. For example, some risk-based anonymization techniques can produce data with high utility and a very low risk of re-identification. Synthetic data generation, in which new data is generated from a model of the features of a real patient data set, is another effective option: it produces high-quality data that poses low risk to patient privacy and offers significant economic advantages over other PETs.
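To make “risk-based” more concrete, one simplified measure groups records into equivalence classes that share the same quasi-identifier values (for example, age and ZIP code) and estimates each record's re-identification risk as one divided by the size of its class. The sketch below is an illustration under stated assumptions, not the method of any particular product: it measures risk within the data set only, ignoring the wider population, and the `generalize` helper (10-year age bands, 3-digit ZIP prefixes) and the toy records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def reidentification_risk(records: List[Record],
                          quasi_identifiers: Sequence[str]) -> Tuple[float, float]:
    """Return (maximum, average) record-level risk, estimated as 1 / equivalence-class size.

    An equivalence class is the set of records sharing the same quasi-identifier values.
    """
    if not records:
        raise ValueError("records must not be empty")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    risks = [1 / sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


def generalize(record: Record) -> Record:
    """Hypothetical generalization: 10-year age bands and 3-digit ZIP prefixes."""
    out = dict(record)  # copy, so other fields (e.g. the diagnosis) are kept as-is
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    out["zip"] = str(record["zip"])[:3]
    return out


if __name__ == "__main__":
    # Hypothetical toy records; "dx" is not treated as a quasi-identifier.
    records = [
        {"age": 34, "zip": "02139", "dx": "asthma"},
        {"age": 37, "zip": "02134", "dx": "diabetes"},
        {"age": 52, "zip": "02446", "dx": "asthma"},
        {"age": 58, "zip": "02445", "dx": "copd"},
    ]
    qis = ["age", "zip"]
    print(reidentification_risk(records, qis))  # (1.0, 1.0): every record is unique
    print(reidentification_risk([generalize(r) for r in records], qis))  # (0.5, 0.5)
```

Generalizing only the quasi-identifiers lowers the measured risk while leaving the clinically useful field untouched, which is the balance between privacy and utility that risk-based approaches aim for. What counts as an acceptably low maximum risk is a policy decision made outside the code.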
Reducing barriers to the adoption of AI and ML in health care, such as limited access to data, can open up a whole new world of possibilities, leading to more efficient, responsive and effective health care systems. Insofar as these barriers relate to patient privacy, PETs can minimize many privacy risks to patients while allowing much-needed access to and sharing of data in a responsible and compliant manner.
- T. Persons et al., “Artificial Intelligence in Health Care: Benefits and Challenges of Machine Learning in Drug Development,” U.S. Government Accountability Office and National Academy of Medicine, United States, Dec. 2019.
- M. G. Zauderer et al., “Piloting IBM Watson Oncology within Memorial Sloan Kettering’s regional network,” JCO, vol. 32, no. 15_suppl, pp. e17653–e17653, May 2014, doi: 10.1200/jco.2014.32.15_suppl.e17653.
- Shilling, Dearing, Staley, and Fahey, “How Kaiser Permanente Became a Continuous Learning Organization.” Kaiser Permanente, 2011.
- McGlynn et al., “Developing a data infrastructure for a learning health system: the PORTAL network,” JAMIA, vol. 21, pp. 596–601, 2014.
- R. A. Rosati, J. F. McNeer, C. F. Starmer, B. S. Mittler, J. J. Morris, and A. G. Wallace, “A new information system for medical practice,” Archives of Internal Medicine, vol. 135, no. 8, pp. 1017–1024, 1975.
- L. Key, “School of Medicine Establishes a ‘Learning Health System’ to Leverage Data Science Tools to Improve Research and Patient Care,” Duke School of Medicine, 02-Apr-2019. [Online]. Available: https://medschool.duke.edu/blog/school-medicine-establishes-learning-health-system-leverage-data-science-tools-improve. [Accessed: 18-Feb-2020].
- B. Bostic, “Using artificial intelligence to solve public health problems,” Becker's Hospital Review, 16-Feb-2018. [Online]. Available: https://www.beckershospitalreview.com/healthcare-information-technology/using-artificial-intelligence-to-solve-public-health-problems.html. [Accessed: 17-Feb-2020].