Data Visualization with ChatGPT

Last week I posted an introduction to using ChatGPT to generate SQL statements. One reader asked me about data visualization, and, yes, that is indeed possible with ChatGPT, too. Or rather, NOW it is possible by using the new ChatGPT Code Interpreter.

As I write this, Code Interpreter is available to ChatGPT Plus users only. Once logged in to ChatGPT, click on GPT-4, and select Code Interpreter Beta to use this module (as seen in this image).

If you don’t see Code Interpreter listed:

  • Click on the … next to your profile on the bottom left.
  • Go to Settings.
  • Select the Beta Features screen.
  • Turn on Code Interpreter.
  • You should now see Code Interpreter as a model option.

Now you’re good to go.

Code Interpreter is incredible; truly, its capabilities are mind-boggling. It can work with uploaded files, manipulate data and images, and much more. It does this by generating Python code on the fly, executing it, and displaying the results. And we can take advantage of this functionality to perform data visualization.

The examples here use Fruit and Vegetable Prices data made available by the US Department of Agriculture (pro tip: there's a wealth of data you can freely play with at data.gov). If you want to follow along with the steps below, download the ALL FRUITS CSV file (or, be brave, go rogue and use the ALL VEGETABLES CSV file, or whatever data you want).

Once you’ve downloaded the data it can be uploaded to ChatGPT. When using the Code Interpreter you’ll see a + on the left of the prompt input box. Click the + and upload the data file. Here’s what ChatGPT displayed when I uploaded the ALL FRUITS data:

Great! You've uploaded a file named 'Fruit Prices 2020.csv'. Let's load this file and see what it contains. We'll use the pandas library, which is a powerful tool for data manipulation and analysis in Python.
The dataset contains the following columns:

Fruit: The name of the fruit.
Form: The form in which the fruit is sold (e.g., Fresh, Canned, Juice).
RetailPrice: The retail price of the fruit.
RetailPriceUnit: The unit of the retail price (e.g., per pound, per pint).
Yield: The yield of the fruit.
CupEquivalentSize: The size equivalent to a cup.
CupEquivalentUnit: The unit of the cup equivalent size (e.g., pounds, fluid ounces).
CupEquivalentPrice: The price equivalent to a cup.

How can I assist you further with this data?

ChatGPT parsed and processed the file and displayed a summary. If you are interested in seeing how it did this, click Show work next to Finished working and you'll see:

import pandas as pd

# Load the data
file_path = "/mnt/data/Fruit Prices 2020.csv"
data = pd.read_csv(file_path)

# Display the first few rows of the data
data.head()
RESULT
                        Fruit    Form  RetailPrice RetailPriceUnit  Yield  \
0                      Apples   Fresh       1.5193       per pound   0.90   
1          Apples, applesauce  Canned       1.0660       per pound   1.00   
2      Apples, ready-to-drink   Juice       0.7804        per pint   1.00   
3  Apples, frozen concentrate   Juice       0.5853        per pint   1.00   
4                    Apricots   Fresh       2.9665       per pound   0.93   

   CupEquivalentSize CupEquivalentUnit  CupEquivalentPrice  
0             0.2425            pounds              0.4094  
1             0.5401            pounds              0.5758  
2             8.0000      fluid ounces              0.3902  
3             8.0000      fluid ounces              0.2926  
4             0.3638            pounds              1.1603  

ChatGPT generated Python to parse and process the data and is now ready to perform analysis and visualizations. Let’s start with something simple:

display descriptive statistics

ChatGPT gets to work (and, again, you can click to expand and see the Python it generated to do so) and displays:

Here are the descriptive statistics of the numerical columns:

RetailPrice:

The average retail price is approximately 2.62 (unit depends on RetailPriceUnit).
The minimum retail price is 0.36 and the maximum is 10.55.
The median retail price (50th percentile) is 1.87.

Yield:

The average yield is approximately 0.88.
The minimum yield is 0.46 and the maximum is 1.00.
The median yield (50th percentile) is 0.98.

CupEquivalentSize:

The average size equivalent to a cup is approximately 1.70 (unit depends on CupEquivalentUnit).
The minimum size equivalent to a cup is 0.12 and the maximum is 8.00.
The median size equivalent to a cup (50th percentile) is 0.36.

CupEquivalentPrice:

The average price equivalent to a cup is approximately 0.92.
The minimum price equivalent to a cup is 0.23 and the maximum is 3.07.
The median price equivalent to a cup (50th percentile) is 0.89.

These numbers provide a general understanding of the numerical data in your dataset. If you need more specific insights or have other types of analysis in mind, feel free to ask.
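
If you'd rather reproduce statistics like these yourself, here is a minimal sketch. It is not the code ChatGPT generated (I haven't shown that here); it assumes pandas' describe() is the kind of call behind the summary, and it uses only the five rows from the preview above rather than the full file, so the numbers it prints will differ from those quoted.

```python
import pandas as pd

# The five preview rows shown earlier, typed in by hand (not the full CSV).
columns = ["Fruit", "Form", "RetailPrice", "RetailPriceUnit", "Yield",
           "CupEquivalentSize", "CupEquivalentUnit", "CupEquivalentPrice"]
rows = [
    ("Apples", "Fresh", 1.5193, "per pound", 0.90, 0.2425, "pounds", 0.4094),
    ("Apples, applesauce", "Canned", 1.0660, "per pound", 1.00, 0.5401, "pounds", 0.5758),
    ("Apples, ready-to-drink", "Juice", 0.7804, "per pint", 1.00, 8.0000, "fluid ounces", 0.3902),
    ("Apples, frozen concentrate", "Juice", 0.5853, "per pint", 1.00, 8.0000, "fluid ounces", 0.2926),
    ("Apricots", "Fresh", 2.9665, "per pound", 0.93, 0.3638, "pounds", 1.1603),
]
data = pd.DataFrame(rows, columns=columns)

# describe() summarizes the numeric columns only: count, mean, std, min,
# quartiles (the "50%" row is the median), and max.
stats = data.describe()

# Keep just the figures that appear in the written summary.
print(stats.loc[["mean", "min", "50%", "max"]].round(2))
```

To run it against the real file, replace the hand-typed rows with pd.read_csv() on your downloaded copy.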

Nice. But can we visualize data using charts? I asked Code Interpreter to:

visualize the prices

Not the most useful or descriptive prompt, I know. And so ChatGPT responds with:

There are several ways we could visualize the prices:

1: Histogram: This can show us the distribution of prices.
2: Boxplot: This can give us an idea about the spread and skewness of prices.
3: Bar chart: If we want to compare the prices of different fruits, we could use a bar chart.

Which type of visualization would you prefer?

I respond with just:

3

ChatGPT needs a bit more info from me:

Alright, let's create a bar chart to compare the retail prices of different fruits. However, please note that some fruits may have multiple forms (like fresh, canned, juice, etc.).

Would you like to compare the prices for a specific form, or should I average the prices for each type of fruit regardless of form?

Obviously, I could have shortened this back-and-forth by being more descriptive with my prompts in the first place. That said, I replied with:

average

ChatGPT gets to work and then displays this:

Here's a horizontal bar chart showing the average retail prices for different fruits. It appears that "Raspberries" have the highest average retail price, while "Watermelon" has the lowest. Please note that these averages are calculated across all forms of each fruit (Fresh, Canned, Juice, etc.) included in the dataset.

If you need further analysis or visualization, feel free to ask!

That worked beautifully. Not only did ChatGPT chart the data as requested, but it also provided a summary of key findings. Again, to see exactly what ChatGPT did, you can choose to Show work. (You can save this Python code for use with your own applications if needed.)
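
For a rough idea of what such code can look like, here is a sketch of my own, not ChatGPT's output. It assumes a pandas groupby plus a matplotlib horizontal bar chart, groups on the Fruit column exactly as it appears (so "Apples" and "Apples, applesauce" stay separate), and uses only the five preview rows. The output file name is made up. It also prints the highest and lowest averages straight from the data, which is a handy way to double-check the written summary against the chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the chart interactively
import matplotlib.pyplot as plt
import pandas as pd

# Three columns from the five preview rows shown earlier (not the full CSV).
data = pd.DataFrame({
    "Fruit": ["Apples", "Apples, applesauce", "Apples, ready-to-drink",
              "Apples, frozen concentrate", "Apricots"],
    "Form": ["Fresh", "Canned", "Juice", "Juice", "Fresh"],
    "RetailPrice": [1.5193, 1.0660, 0.7804, 0.5853, 2.9665],
})

# Average price per fruit, sorted so the bars appear in price order.
avg_price = data.groupby("Fruit")["RetailPrice"].mean().sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
avg_price.plot.barh(ax=ax)
ax.set_xlabel("Average retail price")
ax.set_title("Average retail price by fruit")
fig.tight_layout()
fig.savefig("avg_price_by_fruit.png")  # hypothetical output file name

# Read the extremes from the data rather than trusting a prose summary.
highest = avg_price.idxmax()
lowest = avg_price.idxmin()
print(f"Highest: {highest}, lowest: {lowest}")
```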

Let’s try another example, this time using a more fully formed prompt:

visualize average price differences based on form

Here’s the response:

Here's a horizontal bar chart showing the average retail prices for different forms of fruits. It appears that the 'Fresh' form has the highest average retail price, while the 'Canned' form has the lowest. These averages are calculated across all types of fruits included in the dataset for each respective form.

If you need further analysis or have other questions, feel free to ask!

I didn’t specify a chart type, so ChatGPT picked what it thought would work, in this case a horizontal bar chart.
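
The aggregation behind a chart like this is probably a simple group-by on Form. The sketch below is my assumption of how that could be done, again on just the five preview rows. One thing the preview makes visible: RetailPrice mixes per-pound and per-pint units, so I also average CupEquivalentPrice, which puts every form on the same per-cup basis.

```python
import pandas as pd

# Columns from the five preview rows shown earlier (not the full CSV).
data = pd.DataFrame({
    "Fruit": ["Apples", "Apples, applesauce", "Apples, ready-to-drink",
              "Apples, frozen concentrate", "Apricots"],
    "Form": ["Fresh", "Canned", "Juice", "Juice", "Fresh"],
    "RetailPrice": [1.5193, 1.0660, 0.7804, 0.5853, 2.9665],
    "CupEquivalentPrice": [0.4094, 0.5758, 0.3902, 0.2926, 1.1603],
})

# Mean of both price columns per form, most expensive retail price first.
by_form = (
    data.groupby("Form")[["RetailPrice", "CupEquivalentPrice"]]
    .mean()
    .sort_values("RetailPrice", ascending=False)
)
print(by_form.round(4))
```

With only five rows the ranking here says nothing about the full dataset; the point is the shape of the calculation.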

One more example, this time a more complicated chart:

visualize difference in cost for all fruit that is in multiple forms

The chart displays the average retail price of fruits that are available in multiple forms. Each bar represents a fruit, and the different colors within each bar represent different forms of the fruit (Fresh, Canned, Juice).

As we can see, for most fruits, the fresh form tends to be more expensive than the canned or juice forms. There are a few exceptions, such as oranges, where the canned form seems to be more expensive.

Please let me know if you need further analysis or have other questions!
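
This one takes a little more work, because the Fruit column carries form-specific names such as "Apples, applesauce". The sketch below shows one way it could be done, and it rests on assumptions of mine: that the text before the first comma identifies the fruit (the BaseFruit column is my own invention), and that grouped bars are an acceptable stand-in for whatever layout ChatGPT chose. It uses only the five preview rows, and the output file name is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the chart interactively
import pandas as pd

# Three columns from the five preview rows shown earlier (not the full CSV).
data = pd.DataFrame({
    "Fruit": ["Apples", "Apples, applesauce", "Apples, ready-to-drink",
              "Apples, frozen concentrate", "Apricots"],
    "Form": ["Fresh", "Canned", "Juice", "Juice", "Fresh"],
    "RetailPrice": [1.5193, 1.0660, 0.7804, 0.5853, 2.9665],
})

# Assumption: everything before the first comma names the fruit itself.
data["BaseFruit"] = data["Fruit"].str.split(",").str[0].str.strip()

# Keep only fruits that appear in more than one form.
form_counts = data.groupby("BaseFruit")["Form"].nunique()
multi_form = form_counts[form_counts > 1].index

# One row per fruit, one column per form, averaging duplicate forms.
pivot = data[data["BaseFruit"].isin(multi_form)].pivot_table(
    index="BaseFruit", columns="Form", values="RetailPrice", aggfunc="mean"
)
print(pivot.round(4))

# Grouped bars: one cluster per fruit, one colored bar per form.
ax = pivot.plot.bar(figsize=(8, 4))
ax.set_ylabel("Average retail price")
ax.set_title("Average retail price by fruit and form")
ax.figure.tight_layout()
ax.figure.savefig("price_by_fruit_and_form.png")  # hypothetical output file name
```

In the preview rows only apples come in more than one form, so apricots drop out of the result.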

This is just the start of data visualization with ChatGPT, and hopefully this is enough to get you started. I’ll dig into more sophisticated data visualization in future posts.

Oh, and the opening image of the robot charting fruit is also AI generated; this one is courtesy of Adobe Firefly.

One response to “Data Visualization with ChatGPT”

  1. cfjedimaster

    Query – why did it say raspberries has the highest average price when the chart shows it as figs?
