ChatGPT and rare disease insights: the role of synthetic respondents in bridging gaps in research

By Gemma McConnell, Research Director & Oncology Expert at Day One

Would you be able to identify a synthetic respondent from a real-life respondent?

Artificial Intelligence (AI) has gained significant attention recently, with its applications stretching across many aspects of market research. One application that attracted our attention was the potential value offered by synthetic respondents in rare disease patient research. Why rare disease patient research? Because of the unique challenges associated with conducting rare disease research: the difficulty of finding patients (especially within specified time frames and budgets) and the consequently smaller sample sizes, which sometimes call into question the robustness of the insights gathered.

What is a synthetic respondent?

A synthetic respondent is an artificial intelligence that generates responses to research questions. Each computer-generated response draws upon multiple data points and complex predictive algorithms to produce an answer that, in theory, mirrors a ‘real’ one.

We asked the question: what value, if any, can synthetic respondents add to rare disease patient research? Given that my interest and specialism lie in oncology, and that many cancers are classed as rare diseases, we used the Acute Myeloid Leukemia (AML) patient journey as a case study, comparing the responses from two real AML patient interviews with those generated by ChatGPT (a freely available AI tool) when asked ‘to simulate an AML patient’.
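
For readers curious about how this kind of prompt might be set up programmatically rather than through the chat window, here is a minimal sketch using the OpenAI Python client. It is illustrative only, not the workflow behind our case study: the model name, the role-play instruction and the interview question are all assumptions, and the original exercise simply used the free ChatGPT interface.

```python
# Minimal sketch of a "synthetic respondent" prompt via the OpenAI Python client.
# Assumptions: the model choice, prompt wording and interview question are all
# illustrative; the case study described in this article used the ChatGPT web UI.
from openai import OpenAI

client = OpenAI()  # expects an OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": (
                "Simulate an adult patient living with Acute Myeloid Leukemia (AML). "
                "Answer interview questions in the first person, as that patient would."
            ),
        },
        {
            "role": "user",
            "content": (
                "Can you walk me through how you were first diagnosed, "
                "and how you felt at that moment?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```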

The findings, which I showcased to a room full of pharma-focused insight professionals at the European Pharmaceutical Market Research Association (EphMRA) Meeting in Basel on 21st September, were both intriguing and unexpected.

During the presentation, I shared the response from one of the real AML patient interviews alongside the synthetic response generated by ChatGPT when asked the same question. Both responses were voiced by avatars to keep the audience guessing. I then asked the audience to identify which was the real patient – and the room was completely divided.

From the transcripts below, can you identify the real respondent?

Audible shock and surprise reverberated throughout the room when I revealed the real patient…patient number 1. Minds were blown. The response generated by ChatGPT had such a high level of detail and such comparable content to that of the real AML patient, with similar ‘emotions’ and struggles identified, that it was difficult to tell the artificial from the real.

I also shared two more responses to another question, again one from an AML patient and the other generated by ChatGPT. A key difference this time was that the real patient was a less ‘typical’ AML patient, having been diagnosed very young. In this instance, the synthetic response was far easier to identify; it lacked context and felt much more generic. But when ChatGPT was provided with more context, i.e., in this case, ‘now respond as a 26-year-old female diagnosed with AML at age 11…’, the response was more contextually appropriate and harder to distinguish from the real patient.
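
As a follow-on illustration, the sketch below shows how that extra persona context might be layered into a programmatic prompt as a separate instruction. Again, the model name, the exact persona wording and the question are assumptions rather than a record of the prompts we actually used.

```python
# Sketch of adding persona context to a synthetic-respondent prompt.
# The persona line mirrors the extra context quoted above; everything else
# (model, system instruction, question) is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # expects an OPENAI_API_KEY environment variable

persona = "Now respond as a 26-year-old female who was diagnosed with AML at age 11."

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "Simulate an AML patient answering market research interview questions in the first person.",
        },
        {"role": "system", "content": persona},  # the added context that made the output less generic
        {
            "role": "user",
            "content": "How has being diagnosed with AML at such a young age shaped your life since?",
        },
    ],
)

print(response.choices[0].message.content)
```

Keeping the persona as its own message, separate from the general role-play instruction, makes it easy to swap in different patient profiles without rewriting the rest of the prompt.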

Are we saying that we can rely solely on synthetic respondents?

Absolutely not. We know there are clear limitations to using ChatGPT as a source of synthetic data – unfortunately, ChatGPT can fabricate information and, of course, every patient is unique, is in a unique situation, and has a unique story to tell. While we have found that ChatGPT can contextualise its responses more appropriately when prompted to do so, it is difficult to know which factors, amongst the myriad making up each individual’s context, are important to feed into ChatGPT to deliver more accurate outputs. In addition, compared to an in-person or video interview, synthetic responses do not capture body language, depth of emotion or tone of voice. Each patient offers a unique perspective and narrative which is integral to the research process.

Considering the challenges surrounding rare disease patient research, can ChatGPT offer any value?

Through our self-funded rare disease patient research, using AML as a case study, we have found that ChatGPT, acting as a synthetic respondent, can deliver fairly accurate insights into the AML patient journey – the fine line between real and artificial responses left the EphMRA audience astonished.

We are not suggesting relying solely on ChatGPT, but how about using synthetic data to help prepare for research and get the most out of every patient interaction? Could ChatGPT serve as a time-efficient first step in building hypotheses to take forward and probe in primary market research?

It is not a question of using synthetic respondents as a replacement for real respondents, but rather of whether synthetic responses can be a valuable tool in our research arsenal, enhancing the depth of our primary market research.

As we continue to explore the role of AI in market research, it is exciting to see how we take AI forward to augment our abilities and, ultimately, enhance the quality and impact of our research efforts!
