
 

A Data Analyst’s Guide to Using ChatGPT

 

Author: Rebecca Duncan, PhD, MPH

Regardless of your profession, artificial intelligence will almost certainly impact the work you do in some way. While AI has provoked some well-deserved anxiety, AI tools can also be leveraged to improve efficiency and productivity. This is especially true if you regularly analyze data. 

There are many types of publicly available AI tools; however, not all are user-friendly, and many require expensive subscriptions. ChatGPT has the distinct advantage of being free and very easy to use (note: the paid version of ChatGPT now offers advanced data analytics tools; however, this article will cover the free version only). Anyone can use ChatGPT without prior training; you simply type your question or request into the box and press enter. It’s a lot like using Google, but the responses it provides can be tailored to your specific needs. 

ChatGPT has earned notoriety for its increasing role in ethical breaches, particularly in higher education. However, there are many ways to use ChatGPT that do not constitute plagiarism or other forms of ‘cheating.’ ChatGPT can be especially helpful with statistical analysis, particularly if you are new to the field or branching out into new methodologies.

1. Choosing the most appropriate statistical tests/models for your data

Even seasoned data scientists sometimes neglect to consider the basic assumptions of the statistical methods they have chosen. And while Google and other search engines are excellent resources, it is often challenging to find answers to data-specific questions. Unlike a search engine, ChatGPT can be given the parameters and goal of the analysis and return a list of suggestions. If you aren’t familiar with a specific method or don’t understand part of ChatGPT’s response, you can ask follow-up questions, request that the AI explain it in a different way, or ask for more detail. You can even ask it to provide specific examples to better illustrate concepts.
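As a rough illustration of what giving ChatGPT the parameters and goal of an analysis might look like, the Python sketch below assembles a structured question from a few study details. The function name, the field names, and the example study are all hypothetical and are not part of ChatGPT or any library; the result is simply plain text that you would paste into the chat box yourself.

```python
def build_test_selection_prompt(goal, outcome, predictors, n, design_notes):
    """Assemble a plain-text question about an analysis (hypothetical helper)."""
    lines = [
        "I need help choosing a statistical test or model.",
        f"Goal: {goal}",
        f"Outcome variable: {outcome}",
        f"Predictor variables: {', '.join(predictors)}",
        f"Sample size: {n}",
        f"Design notes: {design_notes}",
        "Please list suitable methods, the assumptions each one makes, "
        "and how I can check those assumptions in my data.",
    ]
    return "\n".join(lines)


# Made-up study details, used only to show the shape of the question.
prompt = build_test_selection_prompt(
    goal="compare mean systolic blood pressure across three clinics",
    outcome="systolic blood pressure (continuous, mmHg)",
    predictors=["clinic (3 categories)"],
    n=90,
    design_notes="independent groups, one measurement per patient",
)
print(prompt)
```

Spelling out the outcome type, the grouping structure, and the sample size up front tends to make the follow-up conversation about assumptions more specific than a one-line question would.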

2. Debugging code

Some statistical analysis software packages include a built-in debugger, but others do not. If you are using a program with limited interactive debugging support, such as SAS, finding coding errors can be immensely time-consuming and frustrating. The good news is that ChatGPT is not only conversant in over 80 natural languages; it can also read and write code in most programming languages.

While asking ChatGPT to code for you is not advisable and arguably crosses an ethical line, ChatGPT is excellent at examining code for errors. It is as helpful for finding typos (which, as any coder knows, can completely derail a program) as it is for finding more complex errors, such as issues with syntax. If its first suggestion does not fix the problem, you can continue the conversation and ask for other potential solutions. You can also ask what impact any changes to your code might have on the overall program.

3. Interpreting results

When working with large or complex datasets, the results of your analysis may sometimes throw you for a loop. If a test is not performing as expected, you can input the results to ChatGPT and ask it for an interpretation. This is especially useful if you are working on a project by yourself and need someone to bounce ideas off of. ChatGPT is also very good at explaining how statistical tests arrive at their results, so if you are trying a new method that you are not yet familiar with, asking ChatGPT for help is a bit like having your own private tutor.

4. Coping with various challenges

Anyone who analyzes data knows that the process rarely goes completely as planned. This is especially true when you are in the exploratory phase of analysis. You may have specific questions about your dataset that are difficult to generalize for a Google search. Issues with data extraction, missingness, and data cleaning are very common, and the appropriate solutions are not always intuitive. Again, using ChatGPT as you would a colleague or mentor can save hours of frustration.
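When the challenge is missingness, it can help to describe the pattern rather than paste the data itself. The following is a minimal sketch, assuming a small CSV extract and a particular set of missing-value codes; the toy data, the `MISSING_TOKENS` set, and the `missingness_summary` helper are all invented for illustration.

```python
import csv
import io

# Hypothetical toy extract; in practice you would read your own file.
RAW = """id,age,bmi,smoker
1,34,22.1,no
2,,27.5,yes
3,51,NA,
4,47,30.2,no
"""

MISSING_TOKENS = {"", "NA", "N/A", "."}  # assumed codes for missing values


def missingness_summary(rows, missing_tokens=MISSING_TOKENS):
    """Return {column: (missing_count, percent_missing)} for a list of dict rows."""
    if not rows:
        return {}
    total = len(rows)
    summary = {}
    for column in rows[0].keys():
        missing = sum(
            1 for row in rows if (row[column] or "").strip() in missing_tokens
        )
        summary[column] = (missing, round(100 * missing / total, 1))
    return summary


rows = list(csv.DictReader(io.StringIO(RAW)))
summary = missingness_summary(rows)
for column, (count, pct) in summary.items():
    print(f"{column}: {count} of {len(rows)} missing ({pct}%)")
```

A per-column summary like this can be shared in a question about handling missing data without exposing any individual record.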

While ChatGPT can be an extremely effective tool for the data analyst, several words of caution are in order. First, always double-check any response that seems off or not quite right. While some tasks are better suited to an AI than a human being, the truth is that no AI can yet fully replace human expertise. ChatGPT should always be used as a supplement to other, more reliable sources of information. Second, do not ask ChatGPT to generate original content. While ChatGPT has many excellent qualities, one disadvantage is that it will sometimes output nonsensical responses, particularly if you ask it to generate code. It is likely that this ability will improve over time as the AI is trained on ever larger datasets, but for now, it is strongly recommended that you not use ChatGPT for this purpose. If you do request lines of code, proceed with caution.
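One cautious way to handle code you did not write yourself is to test it against an answer you can verify independently. The sketch below assumes a chatbot has suggested a function for the sample standard deviation; `suggested_sample_sd` is a hypothetical stand-in for that pasted code, and Python's standard `statistics.stdev` serves as the trusted reference.

```python
import math
import statistics


def suggested_sample_sd(values):
    """Stand-in for a function pasted from a chatbot (hypothetical)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


# Small dataset that is easy to check by hand:
# the mean is 5 and the squared deviations sum to 32, so the
# sample variance is 32 / 7 and the sample SD is its square root.
data = [2, 4, 4, 4, 5, 5, 7, 9]

expected = statistics.stdev(data)  # reference from the standard library
actual = suggested_sample_sd(data)

assert math.isclose(actual, expected), f"mismatch: {actual} vs {expected}"
print("suggested function agrees with statistics.stdev")
```

A single passing check does not prove the code is correct, but a failing one is a quick signal that the suggestion should not be trusted as written.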

Third, if you are working with sensitive or proprietary data, as many of us in the health sciences do, make sure that your use of ChatGPT does not in any way violate your data use agreement. Never input sensitive data directly into any AI tool unless you have received explicit permission to do so. Information entered into ChatGPT may be stored by the provider and, depending on your account settings, used to train future models, so always be cautious with what you share.
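If you have permission to share an error message or log excerpt, a quick scrub for obvious identifiers is a sensible extra step. The following is a minimal sketch only: the three patterns and the `redact` helper are assumptions made for illustration, they will not catch names or many other identifiers, and they are no substitute for the terms of your data use agreement.

```python
import re

# Assumed patterns for illustration; real projects need their own rules.
PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[DATE]": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def redact(text):
    """Replace each matched pattern with its placeholder (hypothetical helper)."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text


# Made-up log line containing fictional identifiers.
log_excerpt = (
    "ERROR: invalid value for patient jane.doe@example.org "
    "(SSN 123-45-6789) admitted 2023-04-17 in line 42."
)
print(redact(log_excerpt))
```

Always read the scrubbed text yourself before pasting it anywhere, since pattern matching of this kind misses anything it was not written to find.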


 

 
