Introduction to Sql 02
contents
- Creating Additional Tables
- Entity Relationship Diagram
- SQL Joins
- Combining What We've Learnt
- Summary
Whether you're managing a small database or working with large-scale enterprise systems, understanding SQL joins is essential. Below we'll explore the various types of SQL joins, understand how they're used, how each type differs, and then learn to apply them.
Creating Additional Tables
To perform joins we need multiple tables. We'll begin by creating and populating a table containing orders for our fruit in January.
CREATE OR REPLACE TABLE orders (
order_id INT, -- Unique identifier for each order (primary key)
customer_id INT, -- Foreign key linking to the customer table
fruit_id INT, -- Foreign key linking to the fruits table
order_date DATE,
order_price_aud DECIMAL(10,2)
);
INSERT INTO orders (order_id, customer_id, fruit_id, order_date, order_price_aud)
VALUES
(1, 1, 7, '2024-01-01', 2356.50),
(2, 3, 12, '2024-01-03', 3120.76),
(3, 8, 2, '2024-01-03', 4355.82),
(4, 4, 10, '2024-01-04', 2214.78),
(5, 6, 10, '2024-01-05', 2315.06),
(6, 2, 15, '2024-01-06', 5234.23),
(7, 8, 1, '2024-01-09', 2377.75),
(8, 9, 8, '2024-01-11', 3496.59),
(9, 9, 4, '2024-01-15', 2548.32),
(10, 5, 16, '2024-01-15', 2256.83),
(11, 2, 13, '2024-01-19', 7121.80),
(12, 8, 11, '2024-01-21', 5579.16),
(13, 1, 3, '2024-01-23', 3435.55),
(14, 8, 6, '2024-01-24', 2773.08),
(15, 10, 18, '2024-01-25', 4310.59),
(16, 2, 9, '2024-01-28', 7112.79),
(17, 4, 14, '2024-01-30', 2221.77),
(18, 13, 20, '2024-02-01', 7456.52),
(19, 6, 10, '2024-02-03', 3425.59),
(20, 8, 2, '2024-02-05', 28611.64),
(21, 3, 7, '2024-02-05', 5356.50),
(22, 9, 12, '2024-02-06', 12960.76),
(23, 4, 5, '2024-02-07', 1415.06),
(24, 13, 13, '2024-02-10', 5851.80),
(25, 3, 10, '2024-02-11', 1714.78),
(26, 5, 8, '2024-02-13', 4876.59),
(27, 8, 14, '2024-02-15', 6851.77),
(28, 10, 15, '2024-02-16', 3274.23),
(29, 1, 4, '2024-02-17', 4858.32),
(30, 6, 10, '2024-02-17', 8463.08),
(31, 2, 13, '2024-02-19', 7121.80),
(32, 8, 11, '2024-02-20', 5579.16),
(33, 1, 25, '2024-02-21', 3435.55),
(34, 8, 6, '2024-02-22', 2773.08),
(35, 10, 18, '2024-02-22', 4310.59),
(36, 5, 9, '2024-02-25', 7112.79),
(37, 4, 12, '2024-02-26', 2221.77),
(38, 2, 20, '2024-02-26', 7456.52),
(39, 6, 10, '2024-02-27', 3425.59),
(40, 9, 19, '2024-02-28', 3125.59);
We'll also create and populate a table containing information about customers:
CREATE TABLE customers (
customer_id INT, -- Unique identifier for each customer (primary key)
customer_name VARCHAR(50),
contact_number VARCHAR(15),
email_address VARCHAR(50),
city VARCHAR(30),
state VARCHAR(30),
country VARCHAR(30)
);
INSERT INTO customers (customer_id, customer_name, contact_number, email_address, city, state, country)
VALUES
(1, 'Fresh Fruits Ltd.', '0412 345 678', 'contact@freshfruits.com', 'Sydney', 'New South Wales', 'Australia'),
(2, 'Green Harvest Co.', '0446 234 567', 'info@greenharvest.com', 'Adelaide', 'South Australia', 'Australia'),
(3, 'Aussie Orchards Inc.', '+61 413 567 890', 'support@aussieorchards.com', 'Melbourne', 'Victoria', 'Australia'),
(4, 'Western Produce Ltd.', '0415 012 345', 'service@westernproduce.com', 'Perth', 'Western Australia', 'Australia'),
(5, 'Maple fruits co.', '0441 234 567', 'contact@maplefruits.com ', 'Montreal', 'Quebec', 'Canada'),
(6, 'Kiwi Orchards Ltd.', '0417 345 678', 'info@kiwiorchards.com', 'Auckland', 'Auckland', 'New Zealand'),
(7, 'Capital Fruits Ltd.', '+61 418 567 890', 'support@capitalfruits.com', 'Wellington', 'Wallington', 'New Zealand'),
(8, 'London fruit Traders', '0419 789 012', 'contact@londonfruit.com', 'London', 'Greater London', 'United Kingdom'),
(9, 'Midlands Fresh Ltd.', '+61 420 012 345', 'info@midlandsfresh.com', 'Birmingham', 'West Midlands', 'United Kingdom'),
(10, 'Manchester Fruit Co.', '0421 234 567', 'support@manchesterfruit.com', 'Manchester', 'Greater Manchester', 'United Kingdom'),
(11, 'Banana Central Ltd.', '0482 334 188', 'contact@bananacentral.com', 'Perth', 'Western Australia', 'Australia'),
(12, 'Global Fruit Co.', '0432 489 581', 'info@globalfruit.com', 'Berlin', 'Berlin', 'Germany');
Hint: Look at the two tables we created and make a note of the following (it'll be important for later):
- 7 is missing from the customer ids in the orders table
- The orders table contains 13 as a customer id, which isn't in the customers table
- The orders table contains 25 as a fruit id, which isn't in the fruits table
Our orders table contains a unique id for each order (order_id), and each order contains both a customer id and fruit id (We'll keep things simple for now and allow only one type of fruit per order). These id columns are referred to as 'keys'. In the orders table, order_id is the primary key, as it uniquely identifies each row in the table. Similarly, fruit_id is the primary key for the fruits table. In the context of the orders table however, the fruit_id column is whats known as a 'foreign key'.
Entity Relationship Diagram
To better understand the relationships between our tables, as well as the concepts of primary and foreign keys, we'll create whats known as an entity relationship diagram (ERD).
By adding foreign keys to the orders table we're able to map each order to a customer and fruit. Without this mapping we'd need to create one big table which would lead to data redundancy. (LINK TO GLOSSARY)
The ERD also shows us the relationship types between tables. The lines connecting our tables show we have two 'One-to-Many' relationships. A single customer can make multiple orders, however each order can only have one customer. Similarly, a single fruit can be in multiple orders, but not the reverse. Later on we'll see One-to-One and Many-to-Many relationships.
SQL Joins
Now that we have our tables and understand their relationship we'll begin learning about joins. As you can probably guess, joins are used to combine data from two or more tables through primary and foreign keys. Since our orders table contains the id columns from both the fruits and customers tables, we can combine our tables via joins.
The 3 most common joins used in SQL are:
- INNER JOIN
- FULL (OUTER) JOIN
- LEFT (OUTER) JOIN
RIGHT JOIN also exists, however we'll discuss its use case another time.
To perform a join, we use the following format:
SELECT columns
FROM table_1
JOIN table_2
ON table_1.column = table_2.column;
The above query will create an inner join, as JOIN can be used as a
INNER JOIN
The INNER JOIN is the most common join you'll come across in SQL. It retrieves only the records with matching values in the related column of both tables. Records without a match are excluded from the result. We'll start with a simple join between our orders and customers tables:
SELECT
*
FROM orders
JOIN customers -- JOIN is an INNER JOIN by default
ON orders.customer_id = customers.customer_id;
The query results contain 38 of the 40 rows from our orders table. Since the customers table doesn't include 13 as a customer_id, any rows from the orders table with customer_id = 13 are left out.
Going back to what we learnt in lesson one, We'll use DISTINCT to see which customers have made an order, and select only the customer_id and customer_name columns. Lastly, we'll order the results by the customer_id.
SELECT DISTINCT
orders.customer_id,
customers.customer_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
ORDER BY customer_id;
Notice how when selecting customer_id, we also need to specify which table the column is from. This is because the column name customer_id is contained in both tables. It isn't essential to specify the table for customer_name, however it's good practice to include the table.
Next, we'll step things up a notch and include our fruits table by performing two inner joins. We'll select the columns order_id, order_date, customer_name, fruit_name, and order_price_aud.
SELECT
orders.order_id,
orders.order_date,
customers.customer_name,
fruits.fruit_name,
orders.order_price_aud,
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
JOIN fruits
ON orders.fruit_id = fruits.fruit_id;
This time only 37 rows were returned, as there's an unknown fruit (fruit_id = 25) in the orders table.
FULL OUTER JOIN
Unlike the INNER JOIN, the FULL OUTER JOIN returns all rows regardless of a match. We'll perform the same query as seen above but with FULL OUTER JOIN:
SELECT
orders.order_id,
orders.order_date,
customers.customer_name,
fruits.fruit_name,
orders.order_price_aud,
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
JOIN fruits
ON orders.fruit_id = fruits.fruit_id;
The above query returns a total of 44 rows, but why is this? Lets break down what's happening:
We begin with the orders table containing 40 rows and join on the customers table. Since the customers table contains three customers not seen in the orders table we'll have three extra rows, resulting in 43 rows. Next, we join to the fruits table. Our orders table doesn't contain coconut (fruit_id = 17), and so we'll add yet another row, giving us a total of 44 rows.
One benefit of the full outer join is that we don't lose any data. If we're combining multiple data sources but we're not exactly sure which data to keep and which data to drop, we can perform an outer join and remove any useless data later down the line. At the same time, outer joins also make it easy to see which records don't have a match. In our case, we can easily see which customers haven't made a purchase, as well as any fruits that havent been ordered.
LEFT JOIN
The LEFT JOIN is almost like a middle ground between the INNER JOIN and FULL OUTER JOIN. Perhaps we want to join our orders table, but we don't want to lose any rows regardless of what's going on in the customers or fruits tables. Just because we don't know what item was ordered, or exactly who ordered it, it wouldn't be wise to simply exclude an order from our query.
We'll perform the same query as the last but with a left join this time:
SELECT
orders.order_id,
orders.order_date,
customers.customer_name,
fruits.fruit_name,
orders.order_price_aud,
FROM orders
LEFT JOIN customers
ON orders.customer_id = customers.customer_id
LEFT JOIN fruits
ON orders.fruit_id = fruits.fruit_id;
Rows with an unknown customer or fruit are retained and our query returns all 40 rows.
Note: A single query can have more than one type of join.
RIGHT JOIN
Right joins are relatively uncommon and should be avoided unless necessary. Perhaps you're working with a preexisting query and need to join a new table, but you dont want to rewrite the query or lose any rows from the new table. In this situation, a right join would be appropriate. Another situation could be where you want to quickly switch the table of focus. Instead of rearranging a query, you could simply change LEFT to RIGHT. We'll alter the above query to include customers
Perhaps there was a change of plans and the focus is now the customers table, regardless of whether they made a purchase, and any customers not in this table are to be ignored. We can quickly alter our previous query as shown below:
SELECT
orders.order_id,
orders.order_date,
customers.customer_name,
fruits.fruit_name,
orders.order_price_aud,
FROM orders
RIGHT JOIN customers -- Simply change LEFT to RIGHT
ON orders.customer_id = customers.customer_id
LEFT JOIN fruits
ON orders.fruit_id = fruits.fruit_id;
CROSS JOIN
A cross join produces a cartesian product of two tables, meaning that every row from the first table is paired with every row from the second table. In simple terms, it shows you all possible combinations between two tables. We'll look at a simple example using just the customer_name and fruit_name columns:
SELECT
customers.customer_name,
fruits.fruit_name
FROM fruits
CROSS JOIN customers;
The query results in 240 rows, each a unique combination of customer and fruit.
Combining What We've Learnt
In this next part we'll combine everything we've learnt up until now. When creating complex queries, its always a good idea to build them up progressively. Lets figure out which 3 fruits have the most orders.
SELECT
fruits.fruit_name,
COUNT(orders.fruit_id) AS fruit_count,
FROM orders
JOIN fruits
ON orders.fruit_id = fruits.fruit_id
GROUP BY fruits.fruit_name
ORDER BY fruit_count DESC
LIMIT 3;
Which customers (including only those in the customers table) spent the most on the fruits watermellon, apple, kiwi, and pineapple? Use the fruit_name to filter, not fruit_id.
SELECT
customer_name,
SUM(orders.order_price_aud) AS total_spendings,
FROM orders
JOIN fruits
ON orders.fruit_id = fruits.fruit_id
JOIN customers
ON orders.customer_id = customers.customer_id
WHERE fruit_name IN ('Apple', 'Pineapple', 'Kiwi', 'Watermelon')
GROUP BY customers.customer_name
ORDER BY total_spendings DESC;
Which customers didn't purchase any fruit?
SELECT
customers.customer_name,
orders.customer_id as order_customers
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id
WHERE order_customers IS NULL;
How many kg of fruit (to two decimals) were purchased in each order for fruits with an average weight greater than 100g but less than 300g? Sort the results from highest to lowest. Your output should include the names of the fruits, their average weight (with the column renamed to include grams), as well as orders from unknown customers.
SELECT
customers.customer_name,
fruits.fruit_name,
ROUND(DIV0(orders.order_price_aud, fruits.price_per_kg), 2) AS kg_purchased,
fruits.average_weight AS average_weight_grams
FROM orders
JOIN fruits
ON orders.fruit_id = fruits.fruit_id
LEFT JOIN customers
on orders.customer_id = customers.customer_id
WHERE average_weight BETWEEN 150 AND 300
ORDER BY kg_purchased DESC;
Summary
In this lesson we learnt about the different joins used in SQL, as well as how we determine which join to use. We then finished off with a few challenging queries.
In the next lesson we'll learn how to fix up all of the inconsistencies we've seen in our tables, from ambiguous column names to unknown id values.