
A Reader's Response to:
A Data Analyst's Guide to Using ChatGPT

 

Author: Jessica M. Keralis, PhD, MPH

A Note from the EpiMonitor Staff: Last month we featured a reprint of an article titled “A Data Analyst’s Guide to Using ChatGPT,” which generated some discussion. By way of background, the majority of EpiMonitor readers are epidemiology academics; as such, our working assumption has been that most people reading this newsletter are familiar with LLMs and understand that the way they work (i.e., by predicting language/text) is fundamentally incompatible with the requirements of principled data analysis. Based on this assumption, we found the article helpful in outlining possible uses of LLMs in data analysis, primarily as a tool for syntax correction. We took the general intent of the piece to be that ChatGPT and LLMs can be a tool in an analyst's toolbox, not a replacement for the analyst.

We received a letter to the editor indicating that this may not have been clear enough; it discusses in detail some of the pitfalls and misconceptions of LLMs that those in our field need to consider. We have chosen to print the letter unedited this month, as it presents another perspective that will likely resonate with this audience.


To the editor:

As a research epidemiologist with 15 years of data analysis, management, and visualization experience, I was profoundly disappointed to see the article “A Data Analyst’s Guide to Using ChatGPT” in the August issue of The Epidemiology Monitor. So much of what we as epidemiologists do is informed and guided by data analysis. The idea that it is possible to “leverage” AI to “improve efficiency and productivity” of what should be a careful and purpose-driven activity is alarming, to say the least.

In order to properly frame this issue, it is important to understand how large language models (LLMs) like ChatGPT work. It is a common misconception that LLMs function as sophisticated search engines that “search” for the answer to a user’s question within their inputs and summarize it in plain language. Their actual mechanism is nothing like a search engine. These models cannot think, reason, interpret, or even search. They digest and generate language (either spoken language or programming code) using a probability distribution that they develop, unsupervised, from the inputs they are fed.

When LLMs are trained, they tokenize their input text to convert it to a machine-readable format and then use those tokens to build a probability distribution that tells them which tokens are likely to follow which others. Then, when provided with a prompt, the model samples from that same probability distribution to generate text. OpenAI, the organization that developed ChatGPT, explains the process as follows:

OpenAI's large language models…process text using tokens, which are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.

For natural (spoken) language, that generated text “sounds” human because its probability distribution was developed by digesting massive amounts (terabytes) of human-written text. However, the model (in this case, ChatGPT) does not “understand” the questions it is asked or “search” the internet for answers. It tokenizes the question, and then it samples from a probability distribution to generate an answer that statistically “looks like” it would follow that question. There is no insight or understanding involved.
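To make this concrete, consider a toy sketch in Python. The word-pair probabilities below are invented for illustration and bear no resemblance to the scale or sophistication of an actual LLM, but the core operation, sampling the next token from a learned probability distribution, is the same:

    import random

    # Invented "learned" probabilities: which token tends to follow which.
    # A real LLM learns billions of such relationships from terabytes of text.
    next_token_probs = {
        "the": {"patient": 0.4, "study": 0.35, "model": 0.25},
        "patient": {"was": 0.6, "reported": 0.4},
        "study": {"found": 0.7, "was": 0.3},
    }

    def generate(prompt_tokens, n_new_tokens=3):
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            dist = next_token_probs.get(tokens[-1])
            if dist is None:  # nothing "learned" for this context
                break
            candidates, weights = zip(*dist.items())
            tokens.append(random.choices(candidates, weights=weights)[0])
        return " ".join(tokens)

    print(generate(["the"]))  # e.g., "the study found": plausible, not understood

The output is a plausible-sounding continuation and nothing more; at no point does the program consult a source, weigh evidence, or know what a patient or a study is.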

The process is similar when ChatGPT “analyzes” data. When a user provides a dataset to ChatGPT and asks the model to analyze it, ChatGPT generates Python code to import the data and generate the output requested by the user. However, the model writes the Python code using the same process it uses to produce a written response in natural language: it generates coding syntax using a probability distribution it developed from tokenizing many millions of lines of Python code fed to it from all over the internet. ChatGPT has a reputation for being useful for coding tasks because programming syntax has more consistent rules and fewer terms than spoken language. This makes the structure of the syntax easier to replicate.

ChatGPT can be useful for identifying syntax errors in code that will not run properly. It can also suggest code for complex data management tasks, such as loops to import data from multiple identically formatted data tables (think of a series of Excel workbooks; a sketch of such a loop appears below). In such cases, it is straightforward for an analyst to confirm that the suggested correction makes their code run properly, or that the data import was done correctly. However, the model has been shown to be unreliable for guidance on software programming challenges. An analysis conducted by researchers at Purdue University found that ChatGPT produces wrong answers to software programming questions more than half the time. More importantly, the model is incapable of truly analyzing data. “Analyze” is defined as “examine methodically and in detail the constitution or structure of (something, especially information), typically for purposes of explanation and interpretation.” ChatGPT generates code to do things with the user’s data based on a probability distribution. It cannot examine, understand, explain, or interpret that data.
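For illustration, the kind of easily verified task I mean might look something like the sketch below (the folder, file pattern, and added column are hypothetical):

    import glob
    import pandas as pd

    # Read a series of identically formatted Excel workbooks and stack them
    # into one table; the analyst can readily check that the import worked.
    frames = []
    for path in sorted(glob.glob("data/site_reports_*.xlsx")):
        df = pd.read_excel(path)      # every workbook shares the same layout
        df["source_file"] = path      # keep provenance for later checks
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)
    print(combined.shape)             # quick sanity check on rows and columns

Whether a loop like this is suggested by ChatGPT or by a colleague, its correctness can be confirmed directly against the source files, which is precisely what makes it a reasonable use case.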

Data analysis, particularly for epidemiologic or other public health applications, should always be done with a goal in mind, or a question to answer. For research, the choice of statistical test or modeling approach should be based on a theoretical framework and informed by the existing literature and subject matter expertise. For public health practice applications, the analytic approach should be developed based on the needs of the end user. Interpreting results requires background knowledge and statistical training. A probability engine cannot do any of those things. To trust one to try is irresponsible at best. I found the suggestion that one can use ChatGPT “like you would a colleague or mentor” very troubling, particularly given the tendency of these models to “hallucinate” and give demonstrably false information. I would never advise a junior scientist to treat a statistically driven LLM like a senior expert who has decades of career wisdom and insight from lived experience.

I understand that high-quality data analysis is difficult to do, and that the learning curve can be steep and frustrating. Even now, I remember that frustration well. I was a junior analyst once, too. However, the best way for an analyst to improve their efficiency and productivity is to get better at it through practice. Build your expertise by seeking guidance from other humans and from human-written (and verified) resources. Using LLMs as an aid, one with a high probability of leading you astray, only delays the development of those skills. The work we dedicate our careers to as epidemiologists is complex, nuanced, and incredibly high-stakes. We outsource it to machines not only at our own peril, but also at the peril of those whom our careers are meant to serve: the people.

     
 

Dear Dr. Keralis,

I admit I was a bit surprised that my article was interpreted as advocating for the replacement of human data analysts with ChatGPT. Such was not my intention, nor do I dispute your detailed explanation of how LLMs work, their strengths and weaknesses, and the biases they can potentially perpetuate.

However, I would like to offer some perspective. I am a former American literature professor. I went to graduate school when the Internet was still relatively new, and there was much handwringing over students using Wikipedia and Google search to help write their papers. Yet today, I cannot imagine an educator asking their students not to use search engines for research.

By the same token, I do not think we have anything to fear from junior epidemiologists turning to ChatGPT for help debugging code or for suggestions about why their model isn’t behaving as expected. And to your last point, let’s not forget that the public health workforce was decimated by the pandemic. I wrote this post with the goal of helping my early-career peers, most of whom do not have access to a supportive senior epidemiologist to mentor them but are nonetheless expected to do the important work of public health.

Is ChatGPT a perfect tool? Of course not. Will it replace human analysts? Not anytime soon. But as my own mentor is fond of saying, don’t let the perfect become the enemy of the good.

Respectfully,

Heather Duncan, MPH, PhD

 
     

 
