AdobeStock_455007340

How To Generate SQL Statements with ChatGPT

ChatGPT is all the rage right now, and understandably so. Feeding it prompts and watching it spit out quality articles, recipes, travel itineraries, code examples, and more is fun and addictive. Yep, it’s a rabbit hole, and one we’re all getting a kick out of.

One of ChatGPT’s many neat tricks is its ability to generate SQL statements; you simply tell ChatGPT about your tables, verify that it understands the structure properly, and then ask for what you’d like returned, all in simple English. It works, and really well. And so, in this post I’ll demonstrate how to take advantage of this magical capability using tables (well, a subset of the tables) and examples from my Sams Teach Yourself SQL in 10 Minutes.

Ok, here goes. Obviously, you’ll need to log in to ChatGPT (creating an account if you’ve not yet done so).

The first thing you need to do is tell ChatGPT about your tables. Here’s the prompt I used:

Customers table contains columns cust_id, cust_name. Orders table contains columns order_num, order_date, cust_id. Vendors table contains columns vend_id, vend_name. Products table contains columns prod_id, vend_id, prod_name, prod_price. OrderItems table contains columns order_num, prod_id, quantity, item_price.

As you can see, I simply told ChatGPT the table names and the columns in each. (There are actually more columns in my tables; for brevity’s sake I opted to just list the ones needed here). So, what did ChatGPT do with this info? This is how it responded:

It appears you are providing the structure of multiple database tables in a relational database, possibly SQL. These tables are Customers, Orders, Vendors, Products, and OrderItems.
* Customers table: It has customer information, each identified by a unique cust_id and associated with a cust_name.
* Orders table: It records individual orders, each associated with a unique order_num, order_date, and linked to a customer through cust_id.
* Vendors table: It holds vendor information, each identified by a unique vend_id and a vend_name.
* Products table: It has product information. Each product has a unique prod_id, a prod_name, and a prod_price. Each product is linked to a vendor through vend_id.
* OrderItems table: This table details the items in each order, identified by order_num and prod_id. It also has the quantity of each product and the item_price in the order.
Please let me know what specific query or operation you would like to perform on this database structure.

Ok. Take a deep breath, and … read that response. Seriously, that is mind-boggling! Not only did ChatGPT understand the tables I enumerated, but it also figured out all the table relationships. It worked out how Orders relate to Customers, and Products to OrderItems, and so on. I didn’t define any of them, it worked it which columns in which tables were related to others. It helps that the column name conventions I used were consistent, but still, wow!

Now to see how well ChatGPT generates SQL. This was my next prompt:

Return a list of customers with a count of orders for each.

ChatGPT worked for a few seconds and returned this:

To return a list of customers along with a count of their orders, you could perform a SQL query that JOINs the Customers and Orders tables on the cust_id field, and then groups by the customer name or ID while counting the number of orders. Here's an example of how you might do this:
SELECT C.cust_id, C.cust_name, COUNT(O.order_num) AS order_count
FROM Customers C
LEFT JOIN Orders O ON C.cust_id = O.cust_id
GROUP BY C.cust_id, C.cust_name
ORDER BY order_count DESC;
This query works by joining the Customers and Orders tables together where the cust_id fields match. For each customer, it counts the number of associated orders (order_num entries). The GROUP BY clause groups the results by both cust_id and cust_name to provide a count for each customer. The ORDER BY clause sorts the result by the count of orders in descending order so customers with the most orders come first.
Please note that if there are customers who haven't made any orders, they will also appear in the list with an order count of 0, because of the LEFT JOIN operation. If you only want customers who have made at least one order, you could use an INNER JOIN instead.

Well, yeah, that worked, ChatGPT returned a SELECT statement that correctly joins the two needed tables so as to return customers and an order count for each. I love that it also added a note explaining that it has used a LEFT JOIN in order to include customers with 0 orders, and to change that to an INNER JOIN if that is not what is desired.

Impressive, right? But what about DBMS specific SQL? The best way to try that is to use dates (date manipulation functions tend to be notoriously DBMS specific). Here’s the prompt I used:

Select customers who have made an order in the past 90 days with total order amount.

This seems simple, right? Nope! There is actually no way to generate this SQL for every major DBMS, the functions needed to calculate 90 days from a specific date really are very DBMS specific. Here’s what ChatGPT returned:

Sure, you can create a query that joins the Customers, Orders, and OrderItems tables to calculate the total order amount for each customer in the past 90 days. Here is an example SQL query:
SELECT 
    C.cust_id, 
    C.cust_name, 
    SUM(OI.item_price * OI.quantity) AS total_order_amount
FROM 
    Customers C
INNER JOIN 
    Orders O ON C.cust_id = O.cust_id
INNER JOIN 
    OrderItems OI ON O.order_num = OI.order_num
WHERE 
    O.order_date >= CURDATE() - INTERVAL 90 DAY
GROUP BY 
    C.cust_id, 
    C.cust_name
HAVING 
    total_order_amount > 0
ORDER BY 
    total_order_amount DESC;
This query starts by joining the Customers, Orders, and OrderItems tables together. It uses the WHERE clause to filter out any orders that weren't placed in the last 90 days. It then calculates the total order amount for each customer by multiplying the item price by the quantity (OI.item_price * OI.quantity) and summing these values. Finally, it uses the HAVING clause to include only customers who have a total order amount greater than 0, and orders the results by the total order amount in descending order.
Please note that the date comparison in the WHERE clause might slightly vary depending on the specific SQL dialect used. The query assumes the order_date column is of a date type that can be directly compared with CURDATE() - INTERVAL 90 DAY.

ChatGPT returns a valid response, and notes that the WHERE clause might slightly vary depending on the specific SQL dialect used. Yeah, no kidding! So, what DBMS will this work with? Let’s ask ChatGPT:

What DBMS can I use with this SQL statement?

ChatGPT responds with:

The SQL statement before the SQL Server-specific one is written in a fairly standard SQL dialect and should work on several different database management systems (DBMS), with potentially minor modifications due to slight differences in SQL syntax and function naming. Some of these DBMS include:
* MySQL: It's probably the most compatible with this statement, as the original statement appears to be written in a MySQL-friendly syntax.
* PostgreSQL: This DBMS also uses standard SQL syntax similar to that used in the statement. However, the CURDATE() function should be replaced with CURRENT_DATE.
* MariaDB: As a fork of MySQL, MariaDB maintains a high level of compatibility with MySQL's SQL syntax.
* SQLite: While SQLite uses a variant of SQL, the query might require minor modifications. For instance, SQLite does not support the INTERVAL keyword, so the date subtraction part would need to be adjusted.
Please note that Oracle, SQL Server, and some other DBMS might require more significant modifications due to differences in function naming and SQL syntax.

Turns out ChatGPT generated syntax that would work with MySQL, a reasonable place to start, and also noted the tweaks needed use these statements with PostgreSQL and SQLite. And, yes, it also noted that Oracle, SQL Server, and some other DBMS might require more significant modifications. Ok, so let’s ask ChatGPT to do just that:

Give me a version of that SELECT statement that will work with Microsoft SQL Server

And the response is:

Microsoft SQL Server has some differences in syntax and function naming compared to other SQL dialects. Here's how you might write the same query in SQL Server:
SELECT 
    C.cust_id, 
    C.cust_name, 
    SUM(OI.item_price * OI.quantity) AS total_order_amount
FROM 
    Customers C
INNER JOIN 
    Orders O ON C.cust_id = O.cust_id
INNER JOIN 
    OrderItems OI ON O.order_num = OI.order_num
WHERE 
    O.order_date >= DATEADD(day, -90, GETDATE())
GROUP BY 
    C.cust_id, 
    C.cust_name
HAVING 
    SUM(OI.item_price * OI.quantity) > 0
ORDER BY 
    total_order_amount DESC;
The GETDATE() function in SQL Server returns the current date and time, and DATEADD(day, -90, GETDATE()) subtracts 90 days from the current date. This replaced CURDATE() - INTERVAL 90 DAY from the original query.
Another change is in the HAVING clause. SQL Server requires the condition in the HAVING clause to contain an aggregate function, it does not allow aliases to be used as they are not yet defined at the time when the clause is processed. Thus, I replaced the total_order_amount alias with the expression it stands for: SUM(OI.item_price * OI.quantity).

You get the idea. ChatGPT does a remarkable job generating SQL so long as:

  • Start by defining your tables. Use plain English, the less fussy the better.
  • Verify that ChatGPT has correctly understood your table structure. I it hasn’t (this can happen if column names are less consistent) then add a prompt explaining the required table relationships.
  • Then prompt for the SQL statements by clearly defining the desired output (in English, not SQL).
  • It’s best to tell ChatGPT what DBMS you are using.

And there you have it, SQL generation powered by ChatGPT!

Oh, BTW, that image at the top left, the robot database thingy? That’s also generated by an AI, specifically Adobe Firefly. Seemed fitting for this post.

12 responses to “How To Generate SQL Statements with ChatGPT”

  1. Aviva Avatar
    Aviva

    Awesome! ‘News I can use’. I will be putting this to work for me immediately on a current project. Thank you thank you Ben!

  2. Ben Forta – ChatGPTでSQLステートメントを生成する方法 – 世界の話題を日本語でザックリ素早く確認!

    […] この記事はHackerNewsに掲載された下記の記事およびそれに対するHackerNews上のコメントを元に作成されています。Ben Forta – How to Generate SQL Statements with ChatGPT […]

  3. naquad Avatar
    naquad

    I’ve had some success with even shorter and more pseudocode syntax prompts. Sort of:

    There are the following table in PostgreSQL 13 DB:

    accounts(id, name)
    orders(id, account_id)
    product(id, sku, price, description, stock int)
    orders_products(order_id, product_id, count)

    generate a query selecting all orders that have items where stock = 0, group them by account_id and calculate the top accounts making orders for the low stock products

    1. BenForta Avatar
      BenForta

      Nice, thanks for sharing!

  4. Futureseek Daily Link Review; 13 July 2023 | Futureseek Link Digest

    […] How To Generate SQL Statements with ChatGPT >> * AI tools are designing entirely new proteins that could transform medicine >> * ChatGPT is […]

  5. 如何使用ChatGPT生成SQL语句 – 偏执的码农

    […] 详情参考 […]

  6. Ben Forta – How to Generate SQL Statements with ChatGPT – Veritas Reporters

    […] Read More […]

  7. Jeff Moden Avatar
    Jeff Moden

    Nice article but look at what you typed in your “So long as” and understand why I’m saying what I’m saying below…

    While I’m impressed with the technology, it appears to have saved no time and, in fact, may have cost you time because of the rhetorical nature of the responses.

  8. Data Visualization with ChatGPT – Ben Forta

    […] week I posted an introduction to using ChatGPT to generate SQL statements. One reader asked me about data visualization, and, yes, that is indeed possible with ChatGPT, too. […]

  9. 实用!SD人物表情提示词合辑;秒变大神的Python速查表;开源LLM微调训练指南;LLM开发必修课 | ShowMeAI日报 – AI 資訊

    […] 最后!一定一定要注意!ChatGPT自动生成的 SQL 语句并不能保证 100% 准确,需要经过验证 ⋙ 阅读原文 […]

  10. Emery Berger Avatar
    Emery Berger

    Nice post. Thought you might be interested – we developed an open source extension called SQLwrite that builds natural language to SQL functionality into SQLite – been out for four months now. https://github.com/plasma-UMass/sqlwrite.

  11. Paul Kogan Avatar
    Paul Kogan

    Thanks Ben, super helpful. Question: is any one modela (gpt3.5 turbo etc) is better than others for code generation.

Leave a Reply

Discover more from Ben Forta

Subscribe now to keep reading and get access to the full archive.

Continue reading