AI bots have been acing medical school exams, but should they become your doctor?

AI tools like ChatGPT could transform healthcare by helping more people access care, but there are considerable caveats.
Written by Rajiv Rao, Contributing Writer

Recently, the healthcare field crossed a digital Rubicon of sorts, one that has inspired wonder, loathing and even some fear.

Google launched a number of health initiatives, but none attracted nearly as much attention as the update to its medical large language model (LLM), Med-PaLM, which was first introduced last year.

LLMs, as you may know, are a type of artificial intelligence trained on vast amounts of data -- like the contents of the pre-2021 internet in the case of the wildly popular ChatGPT. Using machine learning and neural networks, they can spit out confident, eerily human-like answers to questions in a fraction of a second.

Bot MD 

In the case of Med-PaLM and its successor, Med-PaLM 2, the health-focused LLM was fed a strict diet of health-related information and was then made to answer questions modeled on the U.S. Medical Licensing Examination, or USMLE, the scourge of aspiring doctors and their anxious parents. Consisting of three parts and requiring hundreds of hours of cramming, these exams are notoriously difficult.

Yet Med-PaLM 2 knocked it out of the park, performing at an "expert" doctor level with a score of 85% -- 18 percentage points higher than its predecessor -- and undoubtedly sending its software-coding parents to preen at the pub that night.

Its peer ChatGPT, a generalist LLM trained on a general-purpose data set rather than a dedicated health one, scored only at or near the passing threshold of 60% accuracy -- but that was last year. It's hard to imagine subsequent versions not acing the exam in the near future.

Biased bots and human prejudice

Yet, not everyone is convinced that these newly minted medical prodigies are good for us. 

Only a few months ago, Google suffered a humiliating setback when its newborn bot, Bard, incorrectly answered a basic question about a telescope during its grand unveiling, wiping $100 billion off the company's market capitalization.

The mishap has stoked a continuing debate about the accuracy of AI systems and their impact on society.

A growing concern is how racial bias tends to proliferate in the commercial algorithms used to guide healthcare systems. In one infamous case, an algorithm used within the US healthcare system assigned Black patients the same risk level as White patients who were far healthier, cutting the number of Black patients selected for extra care by more than half.

From emergency rooms to surgery and preventive care, the human tradition of prejudice against women, the elderly and people of color -- essentially, the marginalized -- has been efficiently foisted upon our machine marvels.

Ground realities in a broken system

And yet, the US healthcare system is so profoundly broken, with at least 30 million Americans uninsured and tens of millions struggling to access basic care, that worrying about bias may be a luxury it can ill afford.

Take teenagers, for instance. They tend to suffer a lot, negotiating obesity and puberty in the early years, and sexual activity, drugs and alcohol in subsequent ones.

In the ten years preceding the pandemic, feelings of sadness and hopelessness among teens, along with suicidal thoughts and behaviors, increased by 40%, according to the Centers for Disease Control and Prevention (CDC).

"We're seeing really high rates of suicide and depression, and this has been going on for a while," said psychologist Kimberly Hoagwood, PhD, a professor of child and adolescent psychiatry at New York University's Grossman School of Medicine. "It certainly got worse during the pandemic."

Yet statistics show that more than half of teenagers do not get any mental healthcare at all today. For everyone from veterans -- at least 20 of whom take their own lives every day of the year -- to the elderly, to those who simply cannot afford the steep cost of insurance or who have urgent medical needs but face interminably long waits, health bots and even general-purpose AIs like ChatGPT can become lifelines.

Woebot, a popular mental health chatbot service, recently conducted a national survey which found that 22% of adults had used an AI-fueled health bot. Of those, 44% said they had ditched human therapists completely and used only a chatbot.

The doctor is (always) in

It is therefore easy to see why we have begun to look to machines for succor. 

AI health bots don't get sick, or tired. They don't take holidays. They don't mind that you are late for an appointment.

They also don't judge you the way humans do. Psychiatrists, after all, are human, and just as capable as anyone else of cultural, racial or gender bias. Many people find it awkward to confide their most intimate details to someone they don't know.

But are health bots effective? So far, there haven't been any national studies gauging their effectiveness, but anecdotal evidence suggests something unusual is taking place.

Even someone like Eduardo Bunge, the associate chair of psychology at Palo Alto University, an admitted skeptic of health bots, was won over when he decided to give a chatbot a go during a period of unusual stress.

"It offered exactly what I needed," he told Psychiatry Online. "At that point I realized there is something relevant going on here."

Barclay Bram, an anthropologist who studies mental health, was going through a low phase during the pandemic and turned to Woebot for help, according to an opinion piece he wrote for the New York Times.

The bot checked in on him every day and sent him gamified tasks to work through his depression.

The advice was borderline banal. Yet, through repeated practice urged on by the bot, Bram says he found relief from his symptoms. "Perhaps everyday healing doesn't have to be quite so complicated," he wrote in his column.

'Hallucinating' answers

And yet, digesting the contents of the internet and spitting out answers about complex medical ailments, as ChatGPT does, could prove calamitous.

To test ChatGPT's medical credentials, I asked it to help me out with some made-up ailments. First, I asked it for a solution to my nausea.

The bot suggested various remedies (rest, hydration, bland foods, ginger) and, finally, over-the-counter medications such as Dramamine, followed by advice to see a doctor if symptoms worsened.

If you have a thyroid problem, elevated pressure in the eye (as glaucoma patients do), high blood pressure or one of a few other conditions, taking Dramamine could prove dangerous. Yet none of these risks were flagged, and there was no warning to check with a doctor before taking the medication.

I then asked ChatGPT what "medications I should consider for depression." ChatGPT was diligent enough to suggest first consulting a medical professional, since it was not qualified to provide medical advice, but then listed several categories and types of serotonin-targeting drugs that are commonly used to treat depression.

However, just last year, a landmark, widely reported, comprehensive study that examined decades of research for a link between depression and serotonin found no consistent evidence linking the two.

This brings us to the next problem with bots like ChatGPT: the possibility that they may provide you with outdated information in a hyper-dynamic field like medicine. ChatGPT was trained on data only up to 2021.

The bot may have been able to crack the med school exams based on established, predictable content, but it showed itself to be woefully -- perhaps even dangerously -- out of date on new and important scientific findings.

And where it doesn't have answers to your questions, it simply makes them up. Researchers from the University of Maryland School of Medicine who asked ChatGPT questions related to breast cancer found that the bot responded with a high degree of accuracy. Yet about one in ten answers were not just incorrect but often completely fabricated -- a widely observed phenomenon known as AI "hallucination."

"We've seen in our experience that ChatGPT sometimes makes up fake journal articles or health consortiums to support its claims," said Dr. Paul Yi, one of the study's researchers.

In medicine, this could sometimes be the difference between life and death.

Unlicensed to ill

All in all, it isn't hard to foresee LLMs heading toward a giant legal firestorm if it can be proven that a human-seeming bot's erroneous advice caused grievous bodily harm, whether or not it carried a standard homepage disclaimer.

There is also the specter of lawsuits over privacy. A recent investigative report by Joanne Kim of Duke University's Sanford School of Public Policy revealed a whole underground market for highly sensitive patient data on mental health conditions, culled from health apps.

Kim found 11 companies willing to sell bundles of aggregated data that included information on which antidepressants people were taking.

One company was even hawking names and addresses of people who suffer from post-traumatic stress, depression, anxiety or bipolar disorder. Another sold a database featuring thousands of aggregated mental health records, starting at $275 per 1,000 "ailment contacts."

Once these records make their way onto the internet and, by extension, into AI bots, both medical practitioners and AI companies could expose themselves to criminal liability and class action lawsuits from livid patients.

But until then, for the vast populations of the underserved, the marginalized and those looking for help where none exists, LLM health chatbots are a boon and a necessity.

If LLMs are reined in, kept up to date and given strict parameters for operating in the health business, they could undoubtedly become the most valuable tool the global medical community has ever had at its disposal.

Now, if only they could stop lying.