Introduction to Sql 04
contents
Up until now we've kept most of our queries relatively simple with a few being a little on the spicy side. However, what do we do when things start to become more complicated? Insted of writing a bunch of spaghetti, we'll learn about Subqueries, CTEs, and Temp tables, as well as how to determine which to use.
Subqueries
Subqueries are exactly what they sound like; queries within queries. Instead of simply querying a table, we're running a query to generate a set of results and then querying those results. Lets take a look at a basic example:
-- WHERE subquery
SELECT
order_id,
customer_id,
order_price_aud,
FROM orders
WHERE order_price_aud > (SELECT AVG(order_price_aud) FROM orders);
While still a perfectly valid use case, the above example doesn't demonstrate the real power of subqueries, so we'll try something a little harder. Don't worry if you don't fully understand what's going on, just make sure you understand the overall purpose of subqueries.
Imagine we wanted to find the orders for each fruit where the order price was greater than the average order price for that particular fruit. A mouth full, I know. For example, if the average order price for pineapples was $5000, then we want only the orders for pineapple with an order price greater than $5000, and likewise for the other fruits. To achieve this we can use the following subquery:
SELECT
orders.fruit_id,
fruits.fruit_name,
orders.order_id,
orders.order_price_aud,
ROUND(avg_prices.avg_order_per_fruit, 2)
FROM orders
JOIN (
SELECT
fruit_id,
AVG(order_price_aud) AS avg_order_per_fruit
FROM orders
GROUP BY fruit_id
) AS avg_prices
ON orders.fruit_id = avg_prices.fruit_id
JOIN fruits
ON avg_prices.fruit_id = fruits.fruit_id
WHERE orders.order_price_aud > avg_prices.avg_order_per_fruit;
Lets break down the above query piece by piece to understand what's happening:
SELECT
fruit_id,
AVG(order_price_aud) AS avg_order_per_fruit -- Average order price
FROM orders
GROUP BY fruit_id -- Grouping the average price by each fruit
The subquery component calculates the average order price for each fruit, and then groups over them. In the full query, we've aliased this table as avg_prices. When you produce a table using a subquery it's referred to as a derived table, however this table doesn't exist outside of the query.
Since we know our subquery creates a 'table' called avg_prices, we'll rewrite the original query as though this table actually exists.
SELECT
orders.fruit_id,
orders.order_id,
orders.order_price_aud,
ROUND(avg_prices.avg_order_per_fruit, 2)
FROM orders
JOIN [avg_prices]
ON orders.fruit_id = avg_prices.fruit_id
WHERE orders.order_price_aud > avg_prices.avg_order_per_fruit;
Now it's beginning to look strangely similar to the joins we learnt last lesson. In this part we're simply performing a join on the fruit_id column between our orders table and imaginary avg_prices table, and then filtering for rows with an order price greater than the fruit's average order price. To make our results easier to read, we then joined onto the fruits table and included the fruit_name column.
Common Table Expressions (CTEs)
CTEs are a far more recent addition to SQL compared to subqueries, replacing them for a lot of use cases. Subqueries still have their place, but when things get complicated its often better to whip out the CTEs.
When we used a subquery, we were basically inserting a query within another query. Another way to look at is in reverse; wrapping a query around another query. With CTE's however, it's more comparable to creating a table through a query, and then querying said table.
The general structure when working with CTEs is as follows:
-- Part 1
WITH cte_name AS (
[QUERY]
)
-- Part 2
SELECT
[Query involving CTE];
A cte is comprised of two parts. Part 1 is where we create the CTE (or as we'll see later, CTEs), while part 2 queries the CTE as though it were a real table. To understand better we'll take a look at a simple example below.
WITH red_yellow_fruit AS (
SELECT
*
FROM fruits
WHERE fruit_color IN ('Red', 'Yellow')
)
SELECT
fruit_name
FROM red_yellow_fruit
WHERE average_weight > 150 OR average_size = 'Medium';
In this example we have red_yellow_fruit as our CTE, which simply filters for all red and yellow fruit. Next we write a typical select statement, with the CTE being the table we're querying from.
Lets go back to the last subquery we created and rewrite it as a CTE:
WITH avg_prices AS (
SELECT
fruit_id,
AVG(order_price_aud) AS avg_order_per_fruit
FROM orders
GROUP BY fruit_id
)
SELECT
orders.fruit_id,
fruits.fruit_name,
orders.order_id,
orders.order_price_aud,
ROUND(avg_order_per_fruit, 2)
FROM orders
JOIN avg_prices
ON orders.fruit_id = avg_prices.fruit_id
JOIN fruits
ON orders.fruit_id = fruits.fruit_id
WHERE orders.order_price_aud > avg_prices.avg_order_per_fruit;
You might notice how the second part of our CTE is identical to the subquery we wrote when pretending avg_prices existed as an actual table. Instead of pretending, with CTEs we can actually include avg_prices as if it were a normal table!
Subqueries vs CTEs
Deciding between the use of Subqueries or CTEs can be a little confusing at first. People often debate online over subqueries being outdated, or CTEs being a more complicated version of subqueries. Truth is, each option has it's own use case. Below are a few of their differences:
- A CTE can be called/referenced multiple times within a query, whereas a subquery can only be used once. This can lead to more condensed, easier to understand queries when queries become longer.
- Subqueries can be used in a WHERE clause, within the IN or EXISTS keywords. This isn't possible with CTEs.
- Subqueries can be used in UPDATE statements, within the SET keyword. Again, not possible with CTEs.
- CTEs can be used recursively while subqueries cannot. We'll cover this later.
In situations where both subqueries and CTEs are possible, subqueries should be used for shorter, single use cases, while CTEs should be used for more complex cases, or cases where the same subquery would need to be written multiple times. This is just a generalization; each situation is unique and you should make your own decisions as to which option seems most appropriate.
Temp Tables
As you might have guessed from the name, temp tables (short for temporary tables) are tables which exists only within the session they're created. Once the session ends, Any temp tables created are removed. One use of temp tables is to store intermediate results which can be used for querying later on. If you find yourself wanting to reuse a complex CTE or subquery multiple times, temp tables can be a good solution as they store/materialize the query results, avoiding repeated computation.
Temp tables can be created in practically the same way as a standard table as shown below:
CREATE TEMP TABLE temp_table_name (
integer_column INT,
string_column1 VARCHAR(50),
string_column2 VARCHAR(20),
float_column FLOAT
);
While this works fine, a common way of creating a temp table (and tables in general) is through the use of CTAS.
CTAS
CTAS stands for 'create table as select', and is a technique used to create a table through querying an already existing table. Below is how we'd use CTAS to create a regular table.
CREATE TABLE table_name AS (
SELECT columns
FROM existing_table
WHERE [condition]
);
The same can be done for creating temp tables by including the TEMP keyword:
CREATE TEMP TABLE temp_table_name AS (
SELECT columns
FROM existing_table
WHERE [condition]
);
When you're exploring large datasets and need to work with a subset of data for further analysis, using CTAS to create a temp table can be a good idea as it saves both time and money.
Summary
In this lesson we learnt about subqueries, CTEs, and temp tables, as well as the use of CTAS in creating both regular tables and temp tables.
In the next lesson we'll finish off the course by learning about window functions, allowing us to perform complex operations over table rows which are in some way related, while avoiding aggregation of data.
Feedback
Please spare a few seconds to leave some feedback, whether its positive or negative. Your feedback helps to improve the quality of lessons over time, and every bit of feedback matters.