Tutorials

SQL JOINS

We are going to look at three types of join that you will need to use at some time or another

  • INNER JOIN
  • LEFT JOIN
  • CROSS (cartesian) JOIN

You can just use "JOIN" instead of "INNER JOIN" as, by default, the INNER is implicit. Similarly, "LEFT JOIN" is actually "LEFT OUTER JOIN", but again, the "OUTER" is implicit.

At the end of this tutorial we will look at a sample application involving all three.

INNER JOIN

This is the most commonly used of the three and is used to get data from matching rows in different tables.
Let's look at examples from our database.

Problem: List pupils for house #3 showing house name and pupil names.

Input

Our table data. We want pupil names from the pupil table and the house name from the house table where pupil's houseID matches the id from the house table.

house                                                    pupil (extracted sample)            
+---------+------------+                                 +---------+-----------------+
| houseID | house_name |                                 | houseID | pupil_name      |
+---------+------------+                                 +---------+-----------------+
|       1 | Laker      |                                 |    ...  |  ...            |
|       2 | Grace      |                                 |       3 | Mary Sheldon    |
|       3 | Jardine    |                                 |       4 | Mary Blake      |
|       4 | Cowdrey    |                                 |       3 | Michael Grove   |
+---------+------------+                                 |       2 | Mary Whitehouse |
                                                         |       4 | Margaret Norton |
                                                         |    ...  |  ...            |
                                                         +---------+-----------------+

I have spent many years in programming help forums and all too often I have seen programmers using this method (pseudocode)

    select the records from one table 
    while (fetch next record)
        get key field value
        select record from other table to get name matching the key value
    endwhile

Never run queries inside loops, it drains resources! If you ever find yourself doing this then you need a JOIN.

Each pupil has their houseID as part of the data and each house has a houseID. This will be the "join condition".

Query

SELECT h.house_name                         -- define the table.columns you want to retrieve 
    , p.fname
    , p.lname
FROM house h                                -- define the tables to be joined
     INNER JOIN                             --    and on which columns
     pupil p ON h.houseID = p.houseID       --    they should be matched
WHERE h.houseID = 3                         -- define search criteria
ORDER BY h.house_name, p.fname              -- define the order of the sorted results 

A note on style and syntax. I could have wrtten this query on a single line as

select house.house_name, pupil.fname, pupil.lname from house, pupil where house.houseid = pupil.houseid and house.houseid = 3 order by house.house_name, pupil.fname;
  1. A common convention is upper case keywords and lower case table and column names
  2. Split the query over several lines. Not only is it more readable but when mysql reports an error it tells you on which line.
  3. Don't use "... FROM A,B WHERE ...", use explicit join syntax with "...FROM A INNER JOIN B ON...".
    • The comma syntax can be used only for INNER JOINs so leads to inconsistency
    • The structure of the query is clearer if you keep the join condition separate from the selection criteria
    • Explicit JOIN..ON syntax improves query performance
  4. Unless you have short table names use aliases for tables. Again for legibility but also the table name only occurs once. If you mis-spell or want to change it, it's easier than if it is repeated with every column. (Especially if your tablename is "residual_hypoglycaemic_oxidization_factors"). There are times when using an alias is required.

Results

There are six pupils with houseID=3 so six rows are returned, each with a pupil name and the matching house name

+------------+---------+-----------+
| house_name | fname   | lname     |
+------------+---------+-----------+
| Jardine    | Anne    | Bailey    |
| Jardine    | John    | Patterson |
| Jardine    | John    | Tully     |
| Jardine    | Mary    | Sheldon   |
| Jardine    | Michael | Grove     |
| Jardine    | Peter   | Appleby   |
+------------+---------+-----------+

We do not always use an INNER JOIN because we want to extract values from another table. Because they are used to match records, they are often used to perform a filtering function. In the next example we only want data from the pupil table but use the other tables to filter our results

Problem : Which pupils chose Maths?

From our schema, to link from the subject table to the pupil table we need to link via the choice table. So we join from pupil to choice using the pupilID column and from subject to choice using the subjectID column. This time we'll use a slightly different syntax. Because the column names we are joining on are the same in both tables, we can can use the USING keyword

SQL has many inbuilt functions that you can use. One of them is the CONCAT() function which, in this case, is concatenating the first name, a space and the last name into a single output field.

In the previous query we were searching for a numeric houseid. In this one we are searching for for the string value "Maths". String values must be inside quotes in SQL queries otherwise they are assumed to be column names.

SELECT CONCAT(p.fname, ' ', p.lname) as Pupil
FROM pupil p
  INNER JOIN choice c USING (pupilID)
  INNER JOIN subject s USING (subjectID)
WHERE s.subject = 'Maths'

Results

+-----------------+
| Pupil           |
+-----------------+
| David Powell    |
| Henry Irving    |
| John Tully      |
| Mary Whitehouse |
| Peter Appleby   |
| Wayne Jones     |
| William Smith   |
+-----------------+

Problem : How many pupils chose each subject?

Our subject table contains the following

+-----------+-----------+
| subjectID | subject   |
+-----------+-----------+
|         1 | English   |
|         2 | German    |
|         3 | Geography |
|         4 | Maths     |
|         5 | Economics |
|         6 | Biology   |
|         7 | Chemistry |
|         8 | Physics   |
|         9 | History   |
|        10 | Computing |
+-----------+-----------+

We need to count the number of records in choice that match each subject

Query:

SELECT s.subjectID
, s.subject
, COUNT(*) as total                   -- COUNT(*) counts the number of records
FROM subject s 
INNER JOIN choice c USING (subjectID)
GROUP BY s.subjectID                  -- GROUP BY to get the COUNT() for each subject

Results:

+-----------+-----------+-------+
| subjectID | subject   | total |
+-----------+-----------+-------+
|         1 | English   |    12 |
|         2 | German    |    11 |
|         3 | Geography |     6 |
|         4 | Maths     |     7 |
|         5 | Economics |    13 |
|         6 | Biology   |     6 |
|         7 | Chemistry |     6 |
|         8 | Physics   |    10 |
|         9 | History   |    10 |
+-----------+-----------+-------+

There are ten subjects but this query gives only nine totals. Earlier I said subjects could be chosen by none, one or many pupils and the way we query the table would be affected. The INNER JOIN only totals matching records, so what if there is no match? How do we see that "computing" was chosen by no pupils?

Enter the LEFT JOIN.

LEFT JOIN

We use a left join when we want to return all selected rows from the left table regardless of whether there is a matching value in the right table. (LEFT merely denotes its postion in the query statement. If you have "A LEFT JOIN B" then "A", to the left of the "JOIN", is the left table)

I should state that there are also RIGHT OUTER JOINS too. These are just like LEFT joins with the table roles reversed. In almost thirty years of writing SQL queries I have never needed to use one.

Using the same query as the previous problem, except with a LEFT JOIN instead of an INNER JOIN

SELECT s.subjectID, s.subject, COUNT(*) as total
FROM subject s 
LEFT JOIN choice c USING (subjectID)
GROUP BY s.subjectID

Results

+-----------+-----------+-------+
| subjectID | subject   | total |
+-----------+-----------+-------+
|         1 | English   |    12 |
|         2 | German    |    11 |
|         3 | Geography |     6 |
|         4 | Maths     |     7 |
|         5 | Economics |    13 |
|         6 | Biology   |     6 |
|         7 | Chemistry |     6 |
|         8 | Physics   |    10 |
|         9 | History   |    10 |
|        10 | Computing |     0 |    ------- now we get all subjects
+-----------+-----------+-------+

Where there is no matching row in the right table, a NULL value is returned in any columns selected from that table. Show the pupils taking maths or computing

SELECT s.subject
, CONCAT(p.lname, ', ', p.fname) as pupil
FROM subject s
        LEFT JOIN choice c USING (subjectID)
        LEFT JOIN pupil p USING (pupilID)
WHERE s.subject IN ('Maths','Computing')
ORDER BY subject, pupil

Results:

+-----------+------------------+
| subject   | pupil            |
+-----------+------------------+
| Computing | NULL             |
| Maths     | Appleby, Peter   |
| Maths     | Irving, Henry    |
| Maths     | Jones, Wayne     |
| Maths     | Powell, David    |
| Maths     | Smith, William   |
| Maths     | Tully, John      |
| Maths     | Whitehouse, Mary |
+-----------+------------------+

We can use the NULL value when we need to know if there are records in one table with NO matching record in another table. So to find subjects not chosen by any pupil

SELECT s.subject
FROM subject s
LEFT JOIN choice c USING
(subjectID)
WHERE c
.subjectID IS NULL

Results:

+-----------+
| subject   |
+-----------+
| Computing |
+-----------+

WHERE and LEFT JOINS

One last word on LEFT JOINs, and something I still forget myself sometimes.

Suppose we want a list of all pupils and we want to indicate those taking geography (subject id 3). The obvious query is

SELECT CONCAT(p.fname,' ',p.lname) as pupil_name, c.subjectID
FROM pupil p
  LEFT JOIN choice c ON p.pupilID = c.pupilID
WHERE c.subjectID = 3

Unfortunately, this gives just a list of those taking geography and not all pupils as expected.

+------------------+-----------+
| pupil_name       | subjectID |
+------------------+-----------+
| John Patterson   |         3 |
| Allan Blair      |         3 |
| Caroline Freeman |         3 |
| Emma Watson      |         3 |
| Anthony Bell     |         3 |
| Mary Whitehouse  |         3 |
+------------------+-----------+

You need to make non-null value conditions on the right table part of the join condition. If you put them in the WHERE clause, the LEFT JOIN behaves like an INNER JOIN.

SELECT CONCAT(p.fname,' ',p.lname) as pupil_name, c.subjectID
FROM pupil p
  LEFT JOIN choice c ON p.pupilID = c.pupilID AND c.subjectID = 3

Now we get the expected results

+------------------+-----------+
| pupil_name       | subjectID |
+------------------+-----------+
| John Watson      |      NULL |
| Peter Adamson    |      NULL |
| William Smith    |      NULL |
| Mary Sheldon     |      NULL |
| Anna Hamilton    |      NULL |
| Gearge Wilson    |      NULL |
| Henry Irving     |      NULL |
| John Patterson   |         3 |
| Allan Blair      |         3 |
| Adam Simms       |      NULL |
| Mary Blake       |      NULL |
| Anne Bailey      |      NULL |
| Caroline Freeman |         3 |
| David Powell     |      NULL |
| Emma Watson      |         3 |
| Michael Grove    |      NULL |
| John Williams    |      NULL |
| Wayne Jones      |      NULL |
| Anthony Bell     |         3 |
| John Tully       |      NULL |
| Mary Whitehouse  |         3 |
| Jane Morrison    |      NULL |
| Margaret Norton  |      NULL |
| Peter Appleby    |      NULL |
+------------------+-----------+

Which just leaves the cartesian join

CROSS JOIN

This is not a join you will you use very often (although you may use one accidentally, especially when using the FROM A,B syntax for your inner joins and forgetting to put the join condition in the WHERE clause) but it does have its practical uses, as we shall see in the sample application at the end of this tutorial. (Another practical use is generating data)

A cartesian join is produced by omitting the join condition a join. In this case, every row in the first table is joined with every row in the second. For example, we'll just count the number of rows returned rather than list them (for obvious reasons)

SELECT COUNT(*) as total FROM pupil, choice, subject

One problem with the above query is that anyone looking at it doesn't know if the join conditions are missing by accident or by design. As I said earlier, use explicit JOIN clauses and not the ... FROM A,B,C syntax, thus

SELECT COUNT(*) as total
FROM pupil
    CROSS JOIN choice
    CROSS JOIN subject

Result

+-------+
| total |
+-------+
| 19440 |     24 pupils * 81 choices * 10 subjects
+-------+

The headmaster came up with another couple of requests. He wants to know which teacher's pupils had the best average marks in the 2015 exams

SELECT
    CONCAT(t.fname,' ',t.lname) as teacher
   ,s.subject
   ,AVG(pcent) as ave
FROM result r 
    INNER JOIN pupil p USING (pupilid)
    INNER JOIN subject s USING (subjectid)
    INNER JOIN teacher_subject ts ON p.classid=ts.classid
                 AND r.subjectid = ts.subjectid
    INNER JOIN teacher t USING (teacherid)
WHERE schoolyear = 2015
GROUP BY t.teacherid
ORDER BY ave DESC
LIMIT 3

Results

+-----------------+-----------+---------+
| teacher         | subject   | ave     |
+-----------------+-----------+---------+
| Emma Dunn       | Geography | 91.3333 |
| Pauline Fforbes | Biology   | 88.0000 |
| Frank Morrisson | English   | 83.1111 |
+-----------------+-----------+---------+

The second request was more complicated. Mrs Beetsem-Daley's shop in the local village sells items of school uniform and sports clothing which the pupils can charge to their accounts. Their expenditure will be periodically totalled and invoices created. The invoice amounts will be added to their annual school fees. Extra tables would be required.

additional tables image © Barry Andrew 2016

Handling price changes

Item prices tend to change over time so there is a chance that the price could change between the pupil buying an item and the invoice being produced at the end of term. The invoice needs to use the price that was in effect at the time of the transaction and not the price now.

SELECT i.pupilid
 , i.invoice_no
 , i.inv_date
 , item_desc as description
 , COUNT(*) as qty
 , SUM(price) as cost
FROM invoice i
INNER JOIN item_issue iss USING (invoice_no)
INNER JOIN item USING (itemid)
INNER JOIN price p ON iss.itemid=p.itemid 
    AND iss.issue_date BETWEEN p.from_date AND p.to_date    -- effective price
GROUP BY i.invoice_no, item_desc

+---------+------------+------------+----------------+-----+--------+
| pupilid | invoice_no | inv_date   | description    | qty | cost   |
+---------+------------+------------+----------------+-----+--------+
|       1 |       1001 | 2015-07-31 | Rugby shirt    |   1 |  20.00 |
|       1 |       1001 | 2015-07-31 | Rugby shorts   |   1 |  18.00 |
|       1 |       1001 | 2015-07-31 | Trousers       |   2 |  77.00 |
|       2 |       1002 | 2015-07-31 | Boy's blazer   |   2 | 136.50 |
|       2 |       1002 | 2015-07-31 | Rugby shirt    |   2 |  44.00 |
|       3 |       1003 | 2015-07-31 | Girl's blazer  |   1 |  71.50 |
|       3 |       1003 | 2015-07-31 | Netball shirt  |   1 |  22.00 |
|       3 |       1003 | 2015-07-31 | Netball shorts |   2 |  31.50 |
|       3 |       1003 | 2015-07-31 | Skirt          |   1 |  42.00 |
+---------+------------+------------+----------------+-----+--------+

(This principle applies any time you need to produce historical reports or, for example, if you have hotel rooms with different prices for low, mid and high seasons.) Each item price record contains the dates from and to which the price applies. The current price is valid until 9999-12-31. When a new invoice is raised for a pupil, the pupil's issue records are updated with the invoice number.

NOTE: When storing dates (or dates and times) use the database's native DATE, DATETIME or TIMESTAMP column types (Date format yyyy-mm-dd). This is the format expected by the dozens of inbuilt datetime functions and can be compared and correctly sorted. Thus it ensures the the maximum functionality from the data. Other formats cannot be correctly compared and sorted or need converting before they can be used by the functions.

Sample Application

Beetsem-Daley also wants an input form, similar to his original spreadsheet, where he can view and edit the subjects chosen by each pupil. Pupils will be listed in alphabetical order within their houses. Checkboxes will be used to indicate pupils' choices.

As promised, the query used in this application uses all three types of join (inner, left and cross joins).

Sample subject choice form © Barry Andrew 2016


<?php
$cnx
= new mysqli('localhost','*******','*******','jointute'); // substitute
// your connection code

/**
* get the subjects and store in array for headings
*/
$subjects = array();
$res = $cnx->query("SELECT SUBSTRING(subject,1,4) FROM subject ORDER BY subject");
$subjectCount = $res->num_rows;
while (
$row = $res->fetch_row()) $subjects[] = $row[0];

/**
* get the pupil subject choices
* NOTE the cartesian (CROSS) join pupil-subject to get a row for each subject for each pupil
* then a LEFT JOIN to choice to see if each pupil/subject combination was chosen
*/

$sql = "SELECT h.house_name, CONCAT(p.fname,' ',p.lname), p.pupilID, s.subjectID, c.subjectID as chosen
FROM pupil p
CROSS JOIN subject s
INNER JOIN house h USING (houseID)
LEFT JOIN choice c USING (pupilID, subjectID)
ORDER BY h.house_name, p.fname, s.subject"
;
$choice_res = $cnx->query($sql);
?>
<html>
<head>
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
<meta name="generator" content="PhpED Version 4.5 (Build 4513)">
<title>Sample application</title>
<meta name="author" content="Barry Andrew">
<meta name="creation-date" content="06/29/2008">
<style type='text/css'>
body, p, td {
font-family: verdana, sans-serif;
font-size : 10pt;
padding: 5px;
}
td {
background-color: #FFF;
}
td.house {
background-color: #999;
color: white;
font-size : 12pt;
font-weight : 600;
border: 1pt solid gray;
}
td.cb {
text-align: center;
}
th {
background-color: #EEE;
color: black;
font-size : 10pt;
font-weight : 600;
padding: 5px;
}
table {
background-color: #666;
}
</style>
</head>
<body>
<form method='POST' action='updateChoices.php'>
<!-- updateChoices.php is outside the scope of the tutorial and is left as an exercise for the reader -->

<table width='90%' cellspacing='1'>

<?php
$prevHouse
= '';
$prevPupil = '';
while (list(
$house, $pupil, $pid, $sid, $chosen) = $choice_res->fetch_row())
{
if (
$prevHouse != $house)
{
$span = $subjectCount+1;
if (
$prevPupil != '')
{
echo
'</tr>'."\n"; // end previous row
}
echo
"<tr><td class='house' colspan='$span'>House : $house</td></tr>\n";
echo
'<tr><th rowspan="2">Pupil</th><th colspan="'.$subjectCount.'">Subject</th></tr>'."\n";
echo
'<tr><th width="6%">' . join ('</th><th width="6%">', $subjects ) . '</th></tr>'."\n";
$prevHouse = $house;
$prevPupil = '';
}
if (
$prevPupil != $pupil)
{
if (
$prevPupil != '')
{
echo
'</tr>'."\n"; // end previous row
}
echo
"<tr><td>$pupil</td>\n";
$prevPupil = $pupil;
}
$checked = $chosen ? 'checked=checked' : '';
echo
"<td class='cb'><input type='checkbox' name='choice[$pid][$sid]' value='1' $checked></td>\n";

}
echo
'</tr>'."\n";
echo
"<tr><th colspan='$span'><input type='submit' name='btnSubmit' value='Save'></th></tr>\n";
?>
</table>
</form>
</body>
</html>

Next: UNIONS