‘Dr. Google’ Meets Its Match: Dr. ChatGPT

As a fourth-year ophthalmology resident at Emory University School of Medicine, Riley Lyons counts triage among his biggest responsibilities: When a patient comes in with an eye-related complaint, he must make an immediate assessment of its urgency.
So, when two of Lyons’ fellow ophthalmologists at Emory came to him and suggested evaluating the accuracy of the AI chatbot ChatGPT in diagnosing eye-related complaints, he jumped at the chance.
The relative proficiency of ChatGPT, which debuted in November 2022, was a surprise to Lyons and his co-authors. The artificial intelligence engine “is definitely an improvement over just putting something into a Google search bar and seeing what you find,” said co-author Nieraj Jain, an assistant professor at the Emory Eye Center who specializes in vitreoretinal surgery and disease.
But the findings underscore a challenge facing the health care industry as it assesses the promise and pitfalls of generative AI, the type of artificial intelligence used by ChatGPT: The accuracy of chatbot-delivered medical information may represent an improvement over Dr. Google, but there are still many questions about how to integrate this new technology into health care systems with the same safeguards historically applied to the introduction of new drugs or medical devices.
The smooth syntax, authoritative tone, and dexterity of generative AI have drawn extraordinary attention from all sectors of society, with some comparing its future impact to that of the internet itself. In health care, companies are working feverishly to implement generative AI in areas such as radiology and medical records.
The Emory study is not alone in confirming the relative accuracy of the new generation of AI chatbots. A report published in Nature in early July by a group led by Google computer scientists said answers generated by Med-PaLM, an AI chatbot the company built specifically for medical use, “compare favorably with answers given by clinicians.”
AI may also have better bedside manner. Another study, published in April by researchers from the University of California-San Diego and other institutions, even noted that health care professionals rated ChatGPT answers as more empathetic than responses from human doctors.
Indeed, a number of companies are exploring how chatbots could be used for mental health therapy, and some investors in those companies are betting that healthy people might also enjoy chatting and even bonding with an AI “friend.” The company behind Replika, one of the most advanced of that genre, markets its chatbot as “The AI companion who cares. Always here to listen and talk. Always on your side.”
“We need physicians to start realizing that these new tools are here to stay and they’re offering new capabilities both to physicians and patients,” said James Benoit, an AI consultant. While a postdoctoral fellow in nursing at the University of Alberta in Canada, he published a study in February reporting that ChatGPT significantly outperformed online symptom checkers in evaluating a set of medical scenarios. “They are accurate enough at this point to start meriting some consideration,” he said.
Still, even the researchers who have demonstrated ChatGPT’s relative reliability are cautious about recommending that patients put their full trust in the current state of AI. For many medical professionals, AI chatbots are an invitation to trouble: They cite a host of issues relating to privacy, safety, bias, liability, transparency, and the current absence of regulatory oversight.
The proposition that AI should be embraced because it represents a marginal improvement over Dr. Google is unconvincing, these critics say.
The biggest danger, in Marks’ view, is the likelihood that market incentives will result in AI interfaces designed to steer patients to particular drugs or medical services. “Companies might want to push a particular product over another,” he said. “The potential for exploitation of people and the commercialization of data is unprecedented.”
OpenAI, the company that developed ChatGPT, also urged caution.
“OpenAI’s models are not fine-tuned to provide medical information,” a company spokesperson said. “You should never use our models to provide diagnostic or treatment services for serious medical conditions.”
John Ayers, a computational epidemiologist who was the lead author of the UCSD study, said that as with other medical interventions, the focus should be on patient outcomes.
“If regulators came out and said that if you want to provide patient services using a chatbot, you have to demonstrate that chatbots improve patient outcomes, then randomized controlled trials would be registered tomorrow for a host of outcomes,” Ayers said.
He would like to see a more urgent stance from regulators.
“One hundred million people have ChatGPT on their phone,” said Ayers, “and are asking questions right now. People are going to use chatbots with or without us.”
At present, though, there are few signs that rigorous testing of AIs for safety and effectiveness is imminent. In May, Robert Califf, the commissioner of the FDA, described “the regulation of large language models as critical to our future,” but aside from recommending that regulators be “nimble” in their approach, he offered few details.
This article was produced by KFF Health News, which publishes California Healthline, an editorially independent service of the California Health Care Foundation.