Thursday, December 5, 2019

Risks of AI pose complex challenges to deployment in healthcare

Professor Nicholson Price, University of Michigan. Credit: Petrie-Flom Center, Harvard University
Nicholson Price is a professor of law teaching and writing in the areas of intellectual property, health law, and regulation, particularly focusing on the law surrounding innovation in the life sciences. He previously was an assistant professor of law at the University of New Hampshire School of Law, an academic fellow at the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School, and a visiting scholar at the University of California, Hastings College of the Law. Previously, he clerked for the Hon. Carlos T. Bea of the U.S. Court of Appeals for the Ninth Circuit. He received a JD and a PhD in biological sciences from Columbia University and an AB in biological sciences from Harvard College. PARCA eNews spoke to Professor Price following an article he wrote for Brookings about the risks and remedies for AI in healthcare to get his perspective on those risks with regard to radiology and PACS administrators.

Q. In an article you wrote for Brookings, you raised a number of risks and challenges around AI in healthcare that haven’t been discussed very much. One of the most important risks to be considered is the risk to patients. In your view, what are some of the most significant AI-related mistakes in image reading that you might expect to see in a courtroom in the future?

A. Yes, so the biggest set of issues is AI that is trained on one set of circumstances and ends up not providing accurate predictions or reads in another set of circumstances.

In particular, what can happen, if you’re not careful, is that it’s pretty easy for the AI to learn based on traits that aren’t what you actually want to measure. The classic example of this is an AI that was looking at chest x-rays, I think for pneumonia, whose prediction was based not on whether the disease was visible in the image, but on whether the image was taken with a portable chest x-ray or a fixed x-ray machine. Obviously portable machines are used with sicker patients, and that usage was a significant part of its prediction. The challenge with that is, once you change your practices, all of a sudden the AI isn’t going to be performing as well.
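The dynamic Price describes can be illustrated with a toy simulation (a hypothetical sketch, not any real system): a “model” that learned to key on the portable-machine confounder looks accurate while practice patterns match its training data, then collapses once those patterns change.

```python
import random

random.seed(0)

def make_cases(n, p_portable_if_sick, p_portable_if_healthy):
    """Simulate x-ray cases as (is_sick, taken_on_portable) pairs."""
    cases = []
    for _ in range(n):
        sick = random.random() < 0.5
        p = p_portable_if_sick if sick else p_portable_if_healthy
        cases.append((sick, random.random() < p))
    return cases

def shortcut_model(case):
    """A 'model' that learned the confounder: predict sick iff portable machine."""
    _, portable = case
    return portable

def accuracy(cases):
    return sum(shortcut_model(c) == c[0] for c in cases) / len(cases)

# Training-like practice: sicker patients mostly imaged on portable machines.
train_like = make_cases(10_000, p_portable_if_sick=0.9, p_portable_if_healthy=0.1)
# After a practice change: portable machines used equally for everyone.
shifted = make_cases(10_000, p_portable_if_sick=0.5, p_portable_if_healthy=0.5)

print(f"accuracy under training-like practice: {accuracy(train_like):.2f}")  # close to 0.90
print(f"accuracy after practice change:        {accuracy(shifted):.2f}")     # close to 0.50
```

The model never looked at disease at all, yet appears to perform well until the portable/fixed usage pattern shifts, exactly the failure mode described above.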

Similarly, if you train AI in one context where you have a hospital with one particular set of patients the AI may learn particular characteristics that are focused on that set of patients, and once you shift over to a different care environment with different patients then the things that the AI has learned to spot or the algorithm has learned to spot are no longer as predictive, so you can get problematic reads and inaccuracy.

The challenge there, and one response to it is, well, we’ll always have a radiologist who is looking at this again, and maybe they’ll just catch these errors, so we don’t worry about it so much.

Q. But isn’t part of the point of AI to focus the radiologist on certain anomalies in the imaging? So isn’t there a chance that they might not see other things they might otherwise have seen if they were looking at the image under normal circumstances?

A. Yes, that’s essentially the concern right now: AI will provide a quick first screen, which is great, but the reality is that people will tend to actually trust the machine. In one sense that’s good, that’s what we want; we want to increase performance and decrease the amount of time spent, to focus radiologist attention on the things that matter. On the other hand, particularly in the early phases of the technology, radiologists may learn to focus only on the part of an image where the AI has flagged something of interest, and that flagging may end up being problematic or inaccurate. Then we run the risk that no one will actually catch additional errors, because no one is looking at the places the AI hasn’t already suggested they focus on.

Q. You also touched on this notion that an AI system trained on a certain data set, or number of data sets, could lead to different clinical decisions than, say, another AI system trained on different data sets.

A. Yes, this is an area that I think is a little bit under-discussed, though I think it applies slightly less in medical imaging. It’s not just that AI diagnoses may differ based on the clinical systems or populations on which the AI is trained; when we have AI making treatment recommendations, suggesting what the right course of action is, those courses of action may themselves differ between different contexts.

So if you have a place that is a particularly fancy, high-resource hospital with tons of support staff and lots of doctors on call, then a riskier intervention that has a higher likelihood of absolute success, but also a higher likelihood of problematic side effects, may be a tolerable recommendation. On the other hand, if you’re in a lower-resource context where you have much less support and much less ability to intervene when there are adverse events, that might not be the right course of action.

So the fact that AI trained in one context may end up recommending care that’s not so great, or doesn’t match so well, in the context where it’s eventually deployed is an issue that needs to be addressed. One thing that strikes me as particularly interesting in the radiology context is a tool, I’m not sure I can remember the name right now, I’m not sure if you’re aware of it. This is a tool that was released to let radiologists who aren’t trained as AI or machine learning experts basically train their own machine learning tools in their own health systems, on their own data, allowing them to do in-house AI training.

Q. That was going to be one of my questions. If there are hospital systems that are developing their own AI, and there are vendors who are developing AI as part of whatever modality or systems they’re selling, how will those systems be synchronized?

A. That is super fascinating. There are two different versions of that. One of them is, you know, the fancy high-resource hospitals that have their own machine learning folks who are going to be developing their own AI. As an example, (University of) Michigan has some of its own predictive algorithms, and the idea here is that, you know, they’re really focused on their own particular environment.

The algorithms are developed on these data, on these patients, and hopefully those are the kinds of patients they will keep seeing, so the hope is that the AI will be particularly appropriate. That’s as opposed to off-the-shelf vendors, where, you know, they are developing it on some data, and we’re not always sure exactly what data. Sometimes they’re retraining it on an individual health system’s data, but sometimes they’re just taking it off the shelf.

The hope is that it will be well validated, but given the importance of context it is tough to know how well off-the-shelf, vendor-provided systems are going to work in any particular context. Now this is just one side of things, which is to say context matters, and we’d like the AI to be focused on what our hospital is like, what our EHR is like, what our practices are like, and all those sorts of things.

The flip side of this is validation: making sure that the AI is actually working well and performs to acceptable quality standards is harder to do with smaller datasets, and harder for smaller entities to engage in. So there’s this trade-off. I want the AI to be really contextual, and to the extent that you have every place developing its own system, part of me says, “Yay, we’ve dealt with the context issue.” Another part of me says this is terrifying, because the validation and quality-control issues just got way more complicated, and it’s also going to shut smaller players out of the system. There is not an answer to this; it is just another complicated set of issues.

Q. Do you think there’s going to come a time when there will be some kind of standards committee, or standardization of some kind, that determines what data needs to go into a particular AI system or modality, x-rays or MRIs or whatever medical device we’re talking about?

A. I think so and I hope so. I know there are a couple of efforts under way to develop what some are calling a “nutrition label” for AI: what you need to know about a particular AI system before you deploy it. I think it’s going to be a task for some sort of committee to say, “Look, here’s what we really need to know, here’s what you need to tell us about the quality standards you need to meet, to assure us that you’ve developed this in a reasonable way, so that we can be reasonably sure it’s going to work in our context and reasonably sure these are the tests we need to do to make sure it will.” I see this as a potential role for learned societies like the colleges or professional associations.

I’m not sure how much we’re going to see FDA doing this. I think it will probably be involved, but to the extent we have these kinds of in-house artisanal systems, a lot of those are not going to go through FDA. The vendor-supplied systems will, but not all of them. So this is a place where the professions are going to be pretty involved in figuring out what the right standards are, and frontline professionals are probably going to have to be involved in knowing how to evaluate the AI tools they are thinking about developing and implementing, and how to make sure those tools are really working at a high quality level.

Q. That was another question I had: how does the doctor know what system to trust, and what systems may they need to look at a little more skeptically?

A. It is tough, very tough, for the doctor to know what system to trust. Realistically, given the way workflow and medical practice work, a lot of this task is going to have to be offloaded onto the folks in health systems in charge of implementing and deploying AI systems today.

They will need to say, here’s a set of information we can tell you about how confident we are in this prediction, about how sure we are that this has been developed in the right way. Some of this is just going to be straight-up gating: you know, we’re only going to deploy things that have gone through this set of quality metrics. But some of it is going to be, “Hey, front-line physicians, this is outlined in green because we think this is a really solid tool and it works really well in this situation, versus, here’s a prediction that’s outlined in orange; it’s something you might want to apply, but we’re not so sure about it.”
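The gating-plus-color-coding idea Price describes could be sketched something like this. The thresholds, flag names, and function are all invented for illustration; real quality bars would come from a site’s own validation work, not from this example.

```python
def flag_prediction(confidence, validated_in_context,
                    deploy_floor=0.70, high_trust=0.90):
    """Map a model's self-reported confidence to a reviewer-facing flag.

    All threshold values are illustrative placeholders, not clinical guidance.
    """
    if confidence < deploy_floor or not validated_in_context:
        return "withhold"   # gated out: never surfaced to the physician
    if confidence >= high_trust:
        return "green"      # well validated and confident: outline in green
    return "orange"         # usable but uncertain: outline in orange

print(flag_prediction(0.95, validated_in_context=True))   # green
print(flag_prediction(0.80, validated_in_context=True))   # orange
print(flag_prediction(0.95, validated_in_context=False))  # withhold
```

The design point is the two-layer structure from the interview: a hard gate (some predictions never reach the physician at all) in front of a soft signal (color coding that communicates residual uncertainty rather than hiding it).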

The problem is physicians aren’t necessarily good at dealing with this sort of uncertainty, especially in the hustle and bustle of a rapid-paced practice, but I think that's one of the places we’re going to need to see substantial effort.

Q. Before we ever get to having a quality status for systems, I imagine there are going to be some legal challenges. What are some of the legal risks you see coming down the road as we develop these systems?

A. The biggest set of legal issues, or at least a big set, is liability when things go wrong, when patients get injured. How is that going to shake out? I think we just don’t know the answer yet. I’ve got a piece out in JAMA from a couple of weeks ago that talks about tort liability for physicians using AI systems, but physicians aren’t the only people potentially facing liability. Hospitals that deploy systems, and the developers of AI systems, if they are sold to vendors and then distributed to hospitals, all could potentially face liability when things go wrong. The law is not there yet; we haven’t had these cases arise. We can make some guesses about what that is going to look like, but it is unclear exactly what courts are going to do.

An upstream risk that I think is a little bit different is privacy risk, and that shows up because this is an area where people need tons of data to do interesting things, and anytime you have issues of big data and data sharing you run into privacy concerns, potential HIPAA violations, and things like that. This is not wildly different from other big-data concerns, but the concerns are there because developers are trying to do a lot, and trying to do it pretty quickly, and cutting privacy corners can be a problem.

And then there are the regulatory issues around whether those systems are going to end up actually being approved as medical devices. There are going to be questions of exactly what FDA is going to want for that approval, and FDA is still working out its approach to approving software as a medical device generally and AI-powered medical devices in particular.

Q. You talked in your article a little bit about how AI is going to cause professional realignment for medical specialties. Do you see any similar kinds of realignments for support services, specifically PACS administrators? I’m thinking in terms of how, currently, PACS administrators are responsible for maintaining documentation of maintenance, upgrades, and tests of modalities and viewing stations. Do you think PACS administrators are going to have some additional responsibilities in terms of monitoring version updates and all of the different things that happen to an AI system as it develops over time?

A. Oh, yeah, absolutely. So, you know, I think one of the key things to recognize when folks are thinking about plugging in an AI system is that this is not a purchase of a thing. This is a relationship that’s going to happen, and evolve, over time as the AI system develops. Some systems are going to be “set it and forget it,” where nothing ever changes, but honestly those will tend to degrade over time as practices change and patient populations change, so that’s problematic. The systems that are going to perform best are the ones that keep learning as they take in new data about patients and ongoing practices, as they continue to learn over time.

Managing that learning process, making sure it maintains high quality standards, and ensuring it comes with the kind of continuing feedback necessary for it to work, that’s going to be a task for somebody, and PACS managers very well may take on part of that role.
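One way that ongoing quality-management task might be operationalized is a rolling comparison of the AI’s outputs against later-confirmed findings, alerting when agreement dips. This is a hypothetical sketch; the class, window size, and alert threshold are made up for illustration.

```python
from collections import deque

class PerformanceMonitor:
    """Rolling check of a deployed model's agreement with confirmed outcomes.

    Window size and alert threshold are illustration values; real quality
    targets would come from a site's own validation work.
    """
    def __init__(self, window=500, alert_below=0.85):
        self.recent = deque(maxlen=window)  # True/False agreement flags
        self.alert_below = alert_below

    def record(self, model_prediction, confirmed_finding):
        self.recent.append(model_prediction == confirmed_finding)

    def rolling_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def needs_review(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.alert_below

# Toy usage: two agreements followed by two disagreements.
monitor = PerformanceMonitor(window=4, alert_below=0.85)
for pred, truth in [(1, 1), (0, 0), (1, 0), (1, 0)]:
    monitor.record(pred, truth)
print(monitor.rolling_accuracy())  # 0.5
print(monitor.needs_review())      # True
```

The fixed-size window is the point: it tracks recent behavior rather than lifetime averages, so the slow degradation Price warns about (practice drift, population drift) surfaces as an alert instead of being diluted by years of earlier good performance.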

This is not just about procurement. This is about a relationship with technology that’s going to keep changing, and I think the tools necessary to manage that relationship will also develop over time.
