LAS VEGAS — Generative artificial intelligence products took the healthcare industry by storm two years ago. The technology, which can understand and create new text and images, was viewed as the key to reducing administrative burden for physicians and medical workers by summarizing patient records, writing requests for prior authorization, communicating with patients and more.
More recently, the models underpinning generative AI are being woven into “AI agents,” a buzzword that’s quickly surpassed its predecessor as the darling of technologists in the healthcare space.
Built on large language models, agentic AI can make complex decisions and carry out multistep tasks with little human oversight. As such, AI agents have the potential to significantly reduce clinician burnout and save providers money by taking over mundane tasks, proponents say.
AI agents bubbled up in numerous conversations at the massive HIMSS health IT conference in Las Vegas this week, as startups and tech behemoths peddling the tools pitched them as an inexpensive digital workforce.
Among the crop is Notable, a company founded nearly a decade ago to automate healthcare tasks. Notable’s AI is now live at more than 10,000 care sites across the U.S., including at systems like CommonSpirit Health and Intermountain, according to the company.
But agents run up against many of the same issues as other AI tools, including a lack of independent information about their efficacy and accuracy and unclear liability if things go wrong. As research shows AI continues to make errors in healthcare, such speed bumps could get in the way of adoption — no matter how desperately providers need fixes for the hefty documentation burden on their workforce.
Healthcare Dive caught up with Aaron Neinstein, Notable’s chief medical officer, on-site at HIMSS, where Neinstein shared why tech giants like Google and Salesforce are a step behind in agentic AI, how the industry is holding AI to an unfair standard and why he refuses to share the accuracy rates of Notable’s AI tools with potential clients.
This interview has been edited for clarity and brevity.
HEALTHCARE DIVE: What’s Notable’s elevator pitch?
AARON NEINSTEIN: The company is about doubling workforce productivity at a fraction of the cost.
If you look at how a patient gets care today, a very teeny part of that is in the visit interaction. Think about a cancer patient: They have to get a PET scan, they have to get a biopsy, they have to come in and see their oncologist, they have to get a chemotherapy infusion. Each of those steps requires an order; the order has to be entered into the EHR; staff have to take the prior authorization and submit it into the payer portal, schedule that appointment, handle pre-appointment preparation and so on. The majority of healthcare is the execution of the care plan. And Notable is focused on all of those steps, with AI agents that perform those tasks on behalf of health systems.
Do you pitch all these services to providers as one package, or tailor agents based on clients’ needs?
More of the latter. I was in health IT at UC San Francisco for 15 years. I was pitched every product there, and I bought many of them. And that’s a pain for IT teams, managing lots and lots of vendors. The dream is having one vendor who can do a lot of different things for you. So we typically see people starting with one use case but expanding to others.
Does Notable integrate with providers’ electronic health records?
Yes. That’s one advantage of having done this for 10 years. You walk around [the HIMSS conference] and you see AI agents everywhere. It’s very validating for us that there are a lot of new companies trying to enter the space. But it takes a long time to develop relationships with EHR vendors and integrate into care teams’ workflows.
With large language models, it’s pretty easy for anybody to make a demo look cool. But that doesn’t mean you can actually deploy it into healthcare operations, because the integrations will kill it quickly.
Are there any AI agents Notable doesn’t currently offer that it would like to?
Yes. One is the contact center. Contact centers are expensive, have high turnover rates, and we have a lot of customers asking us about using agents there. We’re going live with some of these over the next few months.
And then prior authorizations and more nurse or care coordinator-type things, like pre- and post-procedure communication — those are two areas where we’re just starting to work with some existing customers that I think will become much more commonplace over the next year or two.
There’s a lot of competition for AI agents right now, from smaller companies like Lumeris to tech giants like Microsoft, Google and Salesforce. The market feels so new but already very saturated. How do you view competition in the space? What differentiates Notable here?
I am very happy that Microsoft and Salesforce are spending so much money teaching everybody about AI agents on our behalf. Their marketing budgets are more than we could ever dream of spending.
But realistically, if you look at those companies, what makes or breaks things in healthcare is getting into the nooks and crannies of the workflow. I’m not worried about those companies doing that at a deep level.
Go ask health systems. How many of them are using Google, Salesforce or Microsoft? Well, maybe not Microsoft, because it has Nuance. I used [Nuance’s ambient documentation tool] 15 years ago. I still have that microphone in a drawer somewhere. So Microsoft is probably best positioned because of the Nuance acquisition, but I have yet to see Google or Salesforce really get into care delivery.
AI adoption in healthcare is in a gray area, oversight-wise, after President Donald Trump overturned a Biden-era directive that agencies create regulatory standards. Instead, the president appears to be taking a deregulatory approach, with the goal of freeing U.S. developers to innovate. Is this hands-off approach a good thing or a bad thing for the healthcare sector?
There’s no easy answer to that one. The way I think about it is, it matters more than ever to work with partners you can trust, because there’s no easy answer coming from the government.
It’s like what the FDA does for pharmaceuticals. If this regulatory body says it’s safe, then I can trust it. And people were hoping that might happen for AI, that there would be this blessing from some third-party body that would then allow health systems to feel safe. That was probably a pipe dream anyway, but it seems much less likely now.
But either way — having a third party validate an algorithm is useless. You have to start with the workflow and build AI into it to know how it’ll perform.
That’s an interesting point, because we just saw one health AI standards group launch a registry to essentially do what you just said was useless.
We get asked by customers all the time — what’s the accuracy of your AI? And we won’t answer it. Because it’s a process. As you’re deploying an algorithm, you use the local dataset to take the base algorithm and train it to perform better for their environment, their workflow and their data.
My warning to any organization that’s looking for AI tools: If a company says its AI is 99% accurate, it’s probably not true. And it doesn’t actually mean it’s going to help with outcomes. Nutrition labels for AI are a nice-sounding idea, but that’s not how I would make decisions about deploying AI.
I’m sure you can understand, though, why potential clients may want an accuracy rate, especially given the gap in holistic oversight here. There are also really no large-scale studies of these tools — we only know what the companies creating them tell us. And generally companies want to make themselves look good.
The question I usually ask organizations when they ask for that number is, ‘What’s your human performance today?’
Do you know how many people know the answer to that? Very few. Most people are comparing AI performance to a theoretical imagined target versus comparing it to their current performance.
So one thing we do with organizations as we get started is we actually measure their human performance. Because in some places, surprise, surprise, the human performance is 50%, 60%, 70% accuracy. So then that becomes the target, rather than this imagined target of 99% accuracy. Let’s see if we can do better than how humans are doing today.
So I think it’s just constantly reminding people and reframing that the benchmark to beat is not perfection. It’s better than what we have today, which in most cases is pretty bad.
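To make that benchmark concrete: below is a minimal sketch of comparing an AI agent against a measured human baseline, assuming a hand-audited sample of task outcomes. The task, data and numbers are hypothetical, not Notable’s methodology.

```python
# Minimal sketch of "beat the measured human baseline, not 99%."
# The audit data and task here are hypothetical, not Notable's process.

def accuracy(outcomes: list[bool]) -> float:
    """Fraction of sampled tasks done correctly."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Step 1: audit historical human work on the same task,
# e.g. whether each prior-auth form was completed correctly.
human_sample = [True, False, True, True, False, True, False, True, True, False]

# Step 2: score the AI agent's output on a comparable sample.
ai_sample = [True, True, True, False, True, True, True, True, False, True]

human_baseline = accuracy(human_sample)  # 0.60
ai_score = accuracy(ai_sample)           # 0.80

# The benchmark to beat is today's measured performance, not perfection.
print(f"Human baseline: {human_baseline:.0%}")
print(f"AI agent:       {ai_score:.0%}")
print("Benchmark beaten" if ai_score > human_baseline else "Keep iterating")
```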
How do you build trust, given many stakeholders feel the healthcare industry is moving too quickly to implement AI?
When we deploy, we’ll start with a human in the loop. So we have a human check the algorithm’s output. That builds trust. It reduces fear, because often people are afraid that the AI is coming for their job. It also helps us improve model performance. If you have a human in the loop early, you eventually get to a point where people say it’s no longer needed, usually after a few weeks or months.
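As an illustration of that pattern, a human-in-the-loop deployment might route every agent output through a review queue until the team signs off on autonomy. The routing logic, names and threshold below are hypothetical, not Notable’s implementation.

```python
# Hypothetical human-in-the-loop gate for agent output; illustrative only.
from dataclasses import dataclass

@dataclass
class AgentOutput:
    task: str          # e.g. "draft prior-auth request"
    draft: str         # the agent's proposed output
    confidence: float  # model-reported confidence, 0.0-1.0

def route(output: AgentOutput, autonomy_on: bool, threshold: float = 0.9) -> str:
    """Auto-submit only after humans have signed off on autonomy."""
    if autonomy_on and output.confidence >= threshold:
        return "auto-submit"
    # Early in a deployment everything lands here: reviewers catch errors,
    # which both builds trust and produces labels to improve the model.
    return "human-review-queue"

out = AgentOutput("draft prior-auth request", "Dear payer, ...", 0.95)
print(route(out, autonomy_on=False))  # human-review-queue during rollout
print(route(out, autonomy_on=True))   # auto-submit once trust is earned
```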
So rather than setting it up as this war, like ‘is the AI good enough?’ and have it be a binary decision — because I think that’s the trap people fall into — let’s make this a process of earning trust and improving the performance over time.
Everybody’s bringing a sense of skepticism and hesitancy, but AI is going to become more and more normal in daily life. And we’re going to become more comfortable. And inherently, people’s guards will come down.