ChatGPT Is Replacing Dr. Google

By Andrew Leonard, KFF Health News

As a fourth-year ophthalmology resident at Emory University School of Medicine, Riley Lyons’ biggest responsibilities include triage: When a patient comes in with an eye-related complaint, Lyons must make an immediate assessment of its urgency.

He often finds patients have already turned to “Dr. Google.” Online, Lyons said, they are likely to find that “any number of terrible things could be going on based on the symptoms that they’re experiencing.”

So, when two of Lyons’ fellow ophthalmologists at Emory came to him and suggested evaluating the accuracy of the AI chatbot ChatGPT in diagnosing eye-related complaints, he jumped at the chance.

In June, Lyons and his colleagues reported in medRxiv, an online publisher of health science preprints, that ChatGPT compared quite well to human doctors who reviewed the same symptoms — and performed vastly better than the symptom checker on the popular health website WebMD.

And despite the much-publicized “hallucination” problem known to afflict ChatGPT — its habit of occasionally making outright false statements — the Emory study reported that the most recent version of ChatGPT made zero “grossly inaccurate” statements when presented with a standard set of eye complaints.

The relative proficiency of ChatGPT, which debuted in November 2022, was a surprise to Lyons and his co-authors. The artificial intelligence engine “is definitely an improvement over just putting something into a Google search bar and seeing what you find,” said co-author Nieraj Jain, an assistant professor at the Emory Eye Center who specializes in vitreoretinal surgery and disease.

But the findings underscore a challenge facing the health care industry as it assesses the promise and pitfalls of generative AI, the type of artificial intelligence used by ChatGPT: Chatbot-delivered medical information may be more accurate than Dr. Google's, but many questions remain about how to integrate the technology into health care systems with the same safeguards historically applied to new drugs and medical devices.

The smooth syntax, authoritative tone, and dexterity of generative AI have drawn extraordinary attention from all sectors of society, with some comparing its future impact to that of the internet itself. In health care, companies are working feverishly to implement generative AI in areas such as radiology and medical records.

When it comes to consumer chatbots, though, there is still caution, even though the technology is already widely available and better than many alternatives. Many doctors believe AI-based medical tools should undergo an approval process similar to the FDA’s regime for drugs, but such approval would likely be years away. It’s also unclear how a regime designed for drugs and devices might apply to general-purpose AIs like ChatGPT.

“There’s no question we have issues with access to care, and whether or not it is a good idea to deploy ChatGPT to cover the holes or fill the gaps in access, it’s going to happen and it’s happening already,” said Jain. “People have already discovered its utility. So, we need to understand the potential advantages and the pitfalls.”

The Emory study is not alone in ratifying the relative accuracy of the new generation of AI chatbots. A report published in Nature in early July by a group led by Google computer scientists said answers generated by Med-PaLM, an AI chatbot the company built specifically for medical use, “compare favorably with answers given by clinicians.”

AI may also have better bedside manner. Another study, published in April by researchers from the University of California-San Diego and other institutions, even noted that health care professionals rated ChatGPT answers as more empathetic than responses from human doctors.

Indeed, a number of companies are exploring how chatbots could be used for mental health therapy, and some investors in the companies are betting that healthy people might also enjoy chatting and even bonding with an AI “friend.” The company behind Replika, one of the most advanced of that genre, markets its chatbot as “The AI companion who cares. Always here to listen and talk. Always on your side.”

“We need physicians to start realizing that these new tools are here to stay and they’re offering new capabilities both to physicians and patients,” said James Benoit, an AI consultant. While a postdoctoral fellow in nursing at the University of Alberta in Canada, he published a study in February reporting that ChatGPT significantly outperformed online symptom checkers in evaluating a set of medical scenarios. “They are accurate enough at this point to start meriting some consideration,” he said.

A ‘Band-Aid’ Solution

Still, even the researchers who have demonstrated ChatGPT’s relative reliability are cautious about recommending that patients put their full trust in the current state of AI. For many medical professionals, AI chatbots are an invitation to trouble: They cite a host of issues relating to privacy, safety, bias, liability, transparency, and the current absence of regulatory oversight.

The proposition that AI should be embraced because it represents a marginal improvement over Dr. Google is unconvincing, these critics say.

“That’s a little bit of a disappointing bar to set, isn’t it?” said Mason Marks, a professor and MD who specializes in health law at Florida State University. He recently wrote an opinion piece on AI chatbots and privacy in the Journal of the American Medical Association.

“I don’t know how helpful it is to say, ‘Well, let’s just throw this conversational AI on as a band-aid to make up for these deeper systemic issues,’” he said to KFF Health News.

The biggest danger, in his view, is the likelihood that market incentives will result in AI interfaces designed to steer patients to particular drugs or medical services. “Companies might want to push a particular product over another,” said Marks. “The potential for exploitation of people and the commercialization of data is unprecedented.”

OpenAI, the company that developed ChatGPT, also urged caution.

“OpenAI’s models are not fine-tuned to provide medical information,” a company spokesperson said. “You should never use our models to provide diagnostic or treatment services for serious medical conditions.”

John Ayers, a computational epidemiologist who was the lead author of the UCSD study, said that as with other medical interventions, the focus should be on patient outcomes.

“If regulators came out and said that if you want to provide patient services using a chatbot, you have to demonstrate that chatbots improve patient outcomes, then randomized controlled trials would be registered tomorrow for a host of outcomes,” Ayers said.

He would like to see a more urgent stance from regulators.

“One hundred million people have ChatGPT on their phone,” said Ayers, “and are asking questions right now. People are going to use chatbots with or without us.”

At present, though, there are few signs that rigorous testing of AIs for safety and effectiveness is imminent. In May, Robert Califf, the commissioner of the FDA, described “the regulation of large language models as critical to our future,” but aside from recommending that regulators be “nimble” in their approach, he offered few details.

In the meantime, the race is on. In July, The Wall Street Journal reported that the Mayo Clinic was partnering with Google to integrate the Med-PaLM 2 chatbot into its system. In June, WebMD announced it was partnering with a Pasadena, California-based startup, HIA Technologies Inc., to provide interactive “digital health assistants.” And the ongoing integration of AI into both Microsoft’s Bing and Google Search suggests that Dr. Google is already well on its way to being replaced by Dr. Chatbot.

KFF Health News is a national newsroom that produces in-depth journalism about health issues.

Chatbot ‘Doctor’ Preferred Over Real Ones

By Crystal Lindell, PNN Columnist

Would you rather get your medical advice from a human physician or a chatbot powered by artificial intelligence?

According to an unusual study recently published in JAMA Internal Medicine, you’d be better off listening to the chatbot, which gave advice that was rated significantly better in both quality and empathy.

A team of researchers submitted 195 patient questions, originally posted in a Reddit forum called “Ask a Doctor,” to the AI chatbot ChatGPT. A panel of healthcare providers then compared the chatbot’s responses to those posted by actual physicians in the same forum.

The responses from ChatGPT were the clear winners. The evaluators preferred the chatbot responses over the physicians’ answers about 79% of the time. The average rating for the chatbot was "better than good," while the physicians overall were given a lackluster grade of "acceptable response."

The evaluators also gave ChatGPT high marks for empathy, while the physicians’ responses were rated as only “slightly empathetic.”
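For readers curious how such a comparison might be scripted, below is a minimal sketch in Python using OpenAI’s current SDK. To be clear, the study itself collected answers through the ChatGPT web interface in late 2022; the model name, prompt, and data handling here are illustrative assumptions, not the study’s actual protocol.

```python
# Minimal sketch of the question-pairing step, assuming the `openai` package
# is installed and OPENAI_API_KEY is set in the environment. The model name
# is a placeholder; the study used the ChatGPT interface, not the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chatbot_reply(patient_question: str) -> str:
    """Generate a chatbot answer to one patient question."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative placeholder
        messages=[{"role": "user", "content": patient_question}],
    )
    return response.choices[0].message.content

# In the study, blinded evaluators rated each physician/chatbot pair for
# quality and empathy; that human step is only a placeholder dict here.
question = "I swallowed a toothpick. Am I at risk of dying?"
pair = {
    "question": question,
    "physician_answer": "...",  # the reply actual physicians posted in the forum
    "chatbot_answer": chatbot_reply(question),
}
print(pair["chatbot_answer"])
```

The essential feature of the design is that both answers were stripped of identifying context before evaluators saw them, so ratings reflected the text alone.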

One of the questions involved a patient who accidentally swallowed a toothpick and wanted to know if they were at risk of dying. Here’s how a physician and the chatbot responded:

Physician Response:

If you've surpassed 2-6 (hours), chances are they've passed into your intestines. Which means it can't be retrieved easily.

Many people can swallow razorblades and toothpicks without issues. In case you develop stomach ache, then don't hesitate seeking out an emergency room, and remember to point out the swallowed tooth pick. For now, they won't be doing much.

Chatbot Response:

It's natural to be concerned if you have ingested a foreign object, but in this case, it is highly unlikely that the toothpick you swallowed will cause you any serious harm.

Toothpicks are made of wood, which is not toxic and will not cause poisoning. However, it is possible for a toothpick to cause injury if it becomes stuck in the throat or if it punctures the digestive tract. However, this is unlikely to happen with a dull, cooked toothpick that is only 2 cm long.

If you are experiencing any discomfort or symptoms such as abdominal pain, difficulty swallowing, or vomiting, it is important to contact a healthcare provider for further evaluation. However, if you are not experiencing any symptoms, it is safe to assume that the toothpick has passed through your digestive system, and you do not need to be concerned.

It's always a good idea to be cautious when consuming food and to remove any foreign objects before eating. It's understandable that you may be feeling paranoid, but try not to worry too much. It is highly unlikely that the toothpick will cause you any serious harm.

Which answer would you rather get as a patient? The evaluators preferred the chatbot’s response to the toothpick question, as they did for other questions about a head injury, a lump possibly being cancerous, and if getting bleach in your eye might cause blindness. The chatbot response to those questions also got high scores for empathy.

As a chronic pain patient, I have to admit I felt a little smug when I first saw the headlines about this research. My initial thought was, of course, the chatbot was more empathetic than the doctors! Doctors are often mean, rude and dismissive. I even wrote a column about using a weird trick to manage their egos.

Is this the beginning of the end for human doctors? The short answer is: No. This was a very narrow study with a number of limitations. The authors said they were only trying to look into the possibility of physicians using artificial intelligence (AI) to essentially ghostwrite responses to patients.  

“Chatbots could assist clinicians when messaging with patients, by drafting a message based on a patient’s query for physicians or support staff to edit,” they said. “Such an AI-assisted approach could unlock untapped productivity so that clinical staff can use the time-savings for more complex tasks, resulting in more consistent responses and helping staff improve their overall communication skills by reviewing and modifying AI-written drafts.”
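As a rough illustration of that draft-and-review idea, and not anything the study’s authors built, the workflow might look like the following sketch. The function names, system prompt, and model are hypothetical; the essential design choice is that nothing reaches the patient until a clinician has edited or approved the draft.

```python
# Hypothetical sketch of an AI-assisted messaging workflow: the model drafts,
# a clinician reviews, and only the human-approved text is sent. The model
# name and prompts are assumptions, not part of the published study.
from typing import Callable
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Draft a clear, empathetic reply to this patient message for a clinician "
    "to review. Do not diagnose; flag anything that sounds urgent."
)

def draft_reply(patient_message: str) -> str:
    """Produce a draft for human review; never sent to the patient directly."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative placeholder
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
    )
    return response.choices[0].message.content

def reply_after_review(patient_message: str,
                       clinician_edit: Callable[[str], str]) -> str:
    """The clinician's edited version, not the raw draft, is what goes out."""
    draft = draft_reply(patient_message)
    return clinician_edit(draft)  # human-in-the-loop approval step
```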

The possibilities for using AI in medical settings are only just emerging, and it could radically change how we think of healthcare and how doctors think of us.

One thing that might improve is how doctors and patients communicate. I know doctors are capable of giving more empathetic answers on their own. The problem is, in the United States at least, they usually just don’t have the time to do so. If AI can help, we should let it.  

However, it should be noted that several authors of the study, which was led by researchers at the University of California San Diego, disclosed ties to the artificial intelligence industry, which means they may financially benefit from any attempts to sell AI to medical professionals.

Also, as the researchers acknowledge, the study doesn’t prove that chatbots are better doctors than actual doctors, just that their answers were better. Physicians would likely respond to medical questions differently in an office setting, as opposed to an online post.

Researchers also did not assess the chatbot responses for accuracy — which is a pretty big deal when it comes to medical care. It’s one thing to write an empathetic reply, it’s another to correctly diagnose and decide on a treatment.

That said, when comparing the various responses, the chatbot did seem to give advice similar to the doctors’, which makes me think its answers were mostly accurate. That may not sound great, but consider how often doctors are wrong. Medical errors are one of the leading causes of death in the U.S.

AI technology is rapidly improving, and it’s impossible to predict what it will be capable of in the coming years. But I do believe it’s going to radically change many aspects of our lives, including healthcare. Hopefully, it makes it better. But at this point, I’ll settle for not making it worse.  

Crystal Lindell is a freelance writer who lives in Illinois. After five years of unexplained rib pain, Crystal was finally diagnosed with hypermobile Ehlers-Danlos syndrome. She and her fiancé have 3 cats: Princess Dee, Basil, and Goose. She enjoys the Marvel Cinematic Universe, Taylor Swift Easter eggs, and playing the daily word game Semantle.