
ChatGPT is all the rage right now, and understandably so. Feeding it prompts and watching it spit out quality articles, recipes, travel itineraries, code examples, and more is fun and addictive. Yep, it’s a rabbit hole, and one we’re all getting a kick out of.
One of ChatGPT’s many neat tricks is its ability to generate SQL statements; you simply tell ChatGPT about your tables, verify that it understands the structure properly, and then ask for what you’d like returned, all in simple English. It works, and really well. And so, in this post I’ll demonstrate how to take advantage of this magical capability using tables (well, a subset of the tables) and examples from my Sams Teach Yourself SQL in 10 Minutes.
Ok, here goes. Obviously, you’ll need to log in to ChatGPT (creating an account if you’ve not yet done so).
The first thing you need to do is tell ChatGPT about your tables. Here’s the prompt I used:
Customers table contains columns cust_id, cust_name. Orders table contains columns order_num, order_date, cust_id. Vendors table contains columns vend_id, vend_name. Products table contains columns prod_id, vend_id, prod_name, prod_price. OrderItems table contains columns order_num, prod_id, quantity, item_price.
As you can see, I simply told ChatGPT the table names and the columns in each. (There are actually more columns in my tables; for brevity’s sake I opted to just list the ones needed here). So, what did ChatGPT do with this info? This is how it responded:
It appears you are providing the structure of multiple database tables in a relational database, possibly SQL. These tables are Customers, Orders, Vendors, Products, and OrderItems.* Customers
table: It has customer information, each identified by a uniquecust_id
and associated with acust_name
.* Orders
table: It records individual orders, each associated with a uniqueorder_num
,order_date
, and linked to a customer throughcust_id
.* Vendors
table: It holds vendor information, each identified by a uniquevend_id
and avend_name
.* Products
table: It has product information. Each product has a uniqueprod_id
, aprod_name
, and aprod_price
. Each product is linked to a vendor throughvend_id
.
*OrderItems
table: This table details the items in each order, identified byorder_num
andprod_id
. It also has thequantity
of each product and theitem_price
in the order.
Please let me know what specific query or operation you would like to perform on this database structure.
Ok. Take a deep breath, and … read that response. Seriously, that is mind-boggling! Not only did ChatGPT understand the tables I enumerated, but it also figured out all the table relationships. It worked out how Orders relate to Customers, and Products to OrderItems, and so on. I didn’t define any of them, it worked it which columns in which tables were related to others. It helps that the column name conventions I used were consistent, but still, wow!
Now to see how well ChatGPT generates SQL. This was my next prompt:
Return a list of customers with a count of orders for each.
ChatGPT worked for a few seconds and returned this:
To return a list of customers along with a count of their orders, you could perform a SQL query that JOINs the Customers and Orders tables on the cust_id
field, and then groups by the customer name or ID while counting the number of orders. Here's an example of how you might do this:
SELECT C.cust_id, C.cust_name, COUNT(O.order_num) AS order_count FROM Customers C LEFT JOIN Orders O ON C.cust_id = O.cust_id GROUP BY C.cust_id, C.cust_name ORDER BY order_count DESC;
This query works by joining the Customers and Orders tables together where thecust_id
fields match. For each customer, it counts the number of associated orders (order_num
entries). TheGROUP BY
clause groups the results by bothcust_id
andcust_name
to provide a count for each customer. TheORDER BY
clause sorts the result by the count of orders in descending order so customers with the most orders come first. Please note that if there are customers who haven't made any orders, they will also appear in the list with an order count of 0, because of theLEFT JOIN
operation. If you only want customers who have made at least one order, you could use anINNER JOIN
instead.
Well, yeah, that worked, ChatGPT returned a SELECT statement that correctly joins the two needed tables so as to return customers and an order count for each. I love that it also added a note explaining that it has used a LEFT JOIN in order to include customers with 0 orders, and to change that to an INNER JOIN if that is not what is desired.
Impressive, right? But what about DBMS specific SQL? The best way to try that is to use dates (date manipulation functions tend to be notoriously DBMS specific). Here’s the prompt I used:
Select customers who have made an order in the past 90 days with total order amount.
This seems simple, right? Nope! There is actually no way to generate this SQL for every major DBMS, the functions needed to calculate 90 days from a specific date really are very DBMS specific. Here’s what ChatGPT returned:
Sure, you can create a query that joins theCustomers
,Orders
, andOrderItems
tables to calculate the total order amount for each customer in the past 90 days. Here is an example SQL query:
SELECT C.cust_id, C.cust_name, SUM(OI.item_price * OI.quantity) AS total_order_amount FROM Customers C INNER JOIN Orders O ON C.cust_id = O.cust_id INNER JOIN OrderItems OI ON O.order_num = OI.order_num WHERE O.order_date >= CURDATE() - INTERVAL 90 DAY GROUP BY C.cust_id, C.cust_name HAVING total_order_amount > 0 ORDER BY total_order_amount DESC;
This query starts by joining theCustomers
,Orders
, andOrderItems
tables together. It uses theWHERE
clause to filter out any orders that weren't placed in the last 90 days. It then calculates the total order amount for each customer by multiplying the item price by the quantity (OI.item_price * OI.quantity
) and summing these values. Finally, it uses theHAVING
clause to include only customers who have a total order amount greater than 0, and orders the results by the total order amount in descending order. Please note that the date comparison in theWHERE
clause might slightly vary depending on the specific SQL dialect used. The query assumes theorder_date
column is of a date type that can be directly compared withCURDATE() - INTERVAL 90 DAY
.
ChatGPT returns a valid response, and notes that the WHERE clause might slightly vary depending on the specific SQL dialect used. Yeah, no kidding! So, what DBMS will this work with? Let’s ask ChatGPT:
What DBMS can I use with this SQL statement?
ChatGPT responds with:
The SQL statement before the SQL Server-specific one is written in a fairly standard SQL dialect and should work on several different database management systems (DBMS), with potentially minor modifications due to slight differences in SQL syntax and function naming. Some of these DBMS include: * MySQL: It's probably the most compatible with this statement, as the original statement appears to be written in a MySQL-friendly syntax. * PostgreSQL: This DBMS also uses standard SQL syntax similar to that used in the statement. However, the CURDATE() function should be replaced with CURRENT_DATE. * MariaDB: As a fork of MySQL, MariaDB maintains a high level of compatibility with MySQL's SQL syntax. * SQLite: While SQLite uses a variant of SQL, the query might require minor modifications. For instance, SQLite does not support the INTERVAL keyword, so the date subtraction part would need to be adjusted. Please note that Oracle, SQL Server, and some other DBMS might require more significant modifications due to differences in function naming and SQL syntax.
Turns out ChatGPT generated syntax that would work with MySQL, a reasonable place to start, and also noted the tweaks needed use these statements with PostgreSQL and SQLite. And, yes, it also noted that Oracle, SQL Server, and some other DBMS might require more significant modifications. Ok, so let’s ask ChatGPT to do just that:
Give me a version of that SELECT statement that will work with Microsoft SQL Server
And the response is:
Microsoft SQL Server has some differences in syntax and function naming compared to other SQL dialects. Here's how you might write the same query in SQL Server:
SELECT C.cust_id, C.cust_name, SUM(OI.item_price * OI.quantity) AS total_order_amount FROM Customers C INNER JOIN Orders O ON C.cust_id = O.cust_id INNER JOIN OrderItems OI ON O.order_num = OI.order_num WHERE O.order_date >= DATEADD(day, -90, GETDATE()) GROUP BY C.cust_id, C.cust_name HAVING SUM(OI.item_price * OI.quantity) > 0 ORDER BY total_order_amount DESC;
TheGETDATE()
function in SQL Server returns the current date and time, andDATEADD(day, -90, GETDATE())
subtracts 90 days from the current date. This replacedCURDATE() - INTERVAL 90 DAY
from the original query.
Another change is in theHAVING
clause. SQL Server requires the condition in theHAVING
clause to contain an aggregate function, it does not allow aliases to be used as they are not yet defined at the time when the clause is processed. Thus, I replaced thetotal_order_amount
alias with the expression it stands for:SUM(OI.item_price * OI.quantity)
.
You get the idea. ChatGPT does a remarkable job generating SQL so long as:
- Start by defining your tables. Use plain English, the less fussy the better.
- Verify that ChatGPT has correctly understood your table structure. I it hasn’t (this can happen if column names are less consistent) then add a prompt explaining the required table relationships.
- Then prompt for the SQL statements by clearly defining the desired output (in English, not SQL).
- It’s best to tell ChatGPT what DBMS you are using.
And there you have it, SQL generation powered by ChatGPT!
Oh, BTW, that image at the top left, the robot database thingy? That’s also generated by an AI, specifically Adobe Firefly. Seemed fitting for this post.
Leave a Reply