A Reader's Response to: “A Data Analyst’s Guide to Using ChatGPT”

Author: Jessica M. Keralis, PhD, MPH

To the editor:
As a research
epidemiologist with 15 years of data analysis, management, and
visualization experience, I was profoundly disappointed to see the
article “A Data Analyst’s Guide to Using ChatGPT” in the August issue
of The Epidemiology Monitor. So much of what we as
epidemiologists do is informed and guided by data analysis. The idea
that it is possible to “leverage” AI to “improve efficiency and
productivity” of what should be a careful and purpose-driven activity
is alarming, to say the least.

When LLMs are given input, they tokenize the text to convert it to a machine-readable format and then use those tokens to form a probability distribution that tells them which words are likely to come before, and after, other words. Then, when provided with a prompt, the model samples from that same probability distribution to generate text. OpenAI, the organization that developed ChatGPT, explains the process as follows:

“OpenAI's large language models…process text using tokens, which are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.”

For natural (spoken) language, that generated text “sounds” human because its probability distribution was developed by digesting massive amounts (terabytes) of human-written text. However, the model (in this case, ChatGPT) does not “understand” the questions that are asked of it or “search” the internet for answers. It tokenizes the question, and then it samples from a probability distribution to generate the answer that statistically “looks like” it would follow that question. There is no insight or understanding involved.

The process is similar when ChatGPT “analyzes” data. When a user provides a dataset to ChatGPT and asks the model to analyze it, ChatGPT generates Python code to import the data and produce the output requested by the user. However, the model writes the Python code using the same process it uses to produce a written response in natural language: it generates coding syntax using a probability distribution developed from tokenizing many millions of lines of Python code fed to it from all over the internet. ChatGPT has a reputation for being useful for coding tasks because programming syntax has more consistent rules and fewer terms than spoken language. This makes the structure of the syntax easier to replicate.
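The tokenize-estimate-sample loop described above can be sketched in miniature. The toy bigram model below is a deliberate, hypothetical simplification (real LLMs use subword tokenizers and deep neural networks, not word counts), but the core mechanic is the same in spirit: the model never understands the text; it only samples the next token in proportion to how often tokens have followed one another in its training data.

```python
# Toy bigram "language model": tokenize a corpus, count which token
# follows which, then generate by sampling from those counts.
# Illustrative only -- not how ChatGPT is actually implemented.
import random
from collections import Counter, defaultdict

def tokenize(text):
    # Real models use subword tokenizers; whitespace splitting stands in here.
    return text.lower().split()

def train_bigram(corpus):
    # Count how often each token follows each preceding token.
    counts = defaultdict(Counter)
    tokens = tokenize(corpus)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def sample_next(counts, prev, rng):
    # Sample the next token in proportion to how often it followed `prev`.
    followers = counts[prev]
    choices = list(followers)
    weights = [followers[t] for t in choices]
    return rng.choices(choices, weights=weights, k=1)[0]

corpus = "the model samples the next token from the distribution"
counts = train_bigram(corpus)
rng = random.Random(0)
# After "the", the model can only emit a word that followed "the" in training.
print(sample_next(counts, "the", rng))
```

Note that the generator has no notion of meaning: ask it what follows “next” and it will emit “token” not because it knows anything about tokens, but because that is the only continuation it has ever counted.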
ChatGPT can be useful for identifying syntax errors in code that will not run properly. It can also suggest code for complex data management tasks, such as loops to import data from multiple identically formatted data tables (think of a series of Excel workbooks). In such cases, it is straightforward for an analyst to confirm that the suggested correction results in their code running properly, or that the data import was done correctly. However, the model has been shown to be unreliable for guidance on software programming challenges. An analysis conducted by researchers at Purdue University found that ChatGPT produces wrong answers to software programming questions more than half the time.

More importantly, the model is incapable of truly analyzing data. “Analyze” is defined as “examine methodically and in detail the constitution or structure of (something, especially information), typically for purposes of explanation and interpretation.” ChatGPT generates code to do things with the user’s data based on a probability distribution. It cannot examine, understand, explain, or interpret those data.

Data analysis, particularly for epidemiologic or other public health applications, should always be done with a goal in mind, or a question to answer. For research, the choice of statistical test or modeling approach should be based on a theoretical framework and informed by the existing literature and subject matter expertise. For public health practice applications, the analytic approach should be developed based on the needs of the end user. Interpreting results requires background knowledge and statistical training. A probability engine cannot do any of those things. To trust one to try is irresponsible at best. I found the suggestion that one can use ChatGPT “like you would a colleague or mentor” very troubling, particularly given the tendency of these models to “hallucinate” and give demonstrably false information.
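The kind of verifiable data-management task described above, looping over a series of identically formatted tables and stacking them into one dataset, can be sketched as follows. For simplicity this hypothetical example uses in-memory CSV data and the standard library rather than the Excel workbooks mentioned in the letter (with pandas, one would call `pd.read_excel` inside the same loop). The point is that the analyst can confirm the result directly, for example by checking the combined row count against the inputs.

```python
# Sketch of a multi-table import loop: combine identically formatted
# tables into one dataset, tagging each row with its source for checking.
import csv
import io

def import_tables(files):
    """Combine identically formatted tables (name, file-like object pairs)
    into one list of row dicts, recording each row's source file."""
    combined = []
    for name, handle in files:
        for row in csv.DictReader(handle):
            row["source_file"] = name  # keep provenance for verification
            combined.append(row)
    return combined

# Two identically formatted "workbooks", simulated in memory.
files = [
    ("site_a.csv", io.StringIO("id,cases\n1,10\n2,12\n")),
    ("site_b.csv", io.StringIO("id,cases\n1,7\n2,9\n")),
]
rows = import_tables(files)
print(len(rows))  # 4 rows: 2 per file, easy to confirm against the inputs
```

This is exactly the class of task where an LLM suggestion is checkable: whether the import succeeded, and whether the row counts match the source files, can be verified mechanically without trusting the model.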
I would never advise a junior scientist to treat a statistically driven LLM like a senior expert who has decades of career wisdom and insight from lived experience. I understand that high-quality data analysis is difficult to do, and that the learning curve can be steep and frustrating. Even now, I remember that frustration well. I was a junior analyst once, too.

However, the best way for an analyst to improve their efficiency and productivity is to get better at the work through practice. Build your expertise by seeking guidance from humans and from human-written (and verified) resources. Using an LLM as an aid, one with a high probability of leading you astray, only delays the development of those skills. The work we dedicate our careers to as epidemiologists is complex, nuanced, and incredibly high-stakes. We outsource it to machines at not only our peril, but also the peril of those whom our careers are meant to serve – the people. ■