Tutorials

SQL SUB-QUERIES

I was back at the school again recently. The Headmaster, Mr Beetsem-Daley, had a couple of new applications for me to implement for him.

The first was to use the end of year exam results to award house points to the best pupils in each subject. This would be three, two or one point for the first, second and third best grades in each subject. Also, the pupils coming top in each subject would each receive a discount voucher to use at Mrs Beetsem-Daley's village shop. The pupil getting the highest score in any subject would have their name engraved on the "Beetsem-Daley Shield" displayed in the main hall.

For the second application he was planning a five-a-side football tournament. Over six weeks each house would play the other three houses twice and a league table would be produced showing, for each house

  • House name
  • Games played
  • Games won
  • Games drawn
  • Games lost
  • Goal difference
  • Total Points (3 for a win, 1 for a draw)

As with exam house points, these points would also contribute to the end of the year points total and the "Beetsem-Daley Trophy" awarded to the winning house.

The first thing to do was set up the extra table and Laura Norder, the school factotum, would enter the data.

        CREATE TABLE  fixture (
          `idmatch` int unsigned NOT NULL auto_increment,
          `hometeam` int unsigned NOT NULL,                    -- houseID
          `awayteam` int unsigned NOT NULL,                    -- houseID
          `homegoals` int default NULL,
          `awaygoals` int default NULL,
          `weekno` int unsigned NOT NULL,
          PRIMARY KEY  (`idmatch`)
          KEY `index_2` (`hometeam`),
          KEY `index_3` (`awayteam`)
        )

I had already decided that that the best approach was to do as much of the processing as I could using SQL and any additional code required would just be to display the results of the queries.

Subqueries

These are select queries embedded inside another query. Subqueries are always enclosed in parentheses.

Subqueries returning a single value

The simplest of the above requirements is to find the pupil with the highest score in any subject. We use a subquery to find the highest score and then select the results that match that score

SELECT p.pupil_name
FROM pupil p
    INNER JOIN result r USING (pupilid)
WHERE r.schoolyear=2015
    AND r.pcent = 
    (
    SELECT MAX(pcent) FROM result
    WHERE schoolyear=2015
    )
 

We can also use a subquery to return a boolean value. In my previous tutorial I used a LEFT JOIN to find which subjects were not chosen by any of the pupils. This can also be done by using a subquery to check for the existence of records. When using EXISTS, the SELECT query returns true or false depending on whether records were found or not. It does not matter what is selected, hence the use of "SELECT *" on this occasion.

SELECT s.subject 
FROM subject s
WHERE NOT EXISTS 
    (
    SELECT * FROM choice 
    WHERE subjectid = s.subjectid
    )

NOTE OF CAUTION: While subqueries may appear to be a simple and elegant way of solving SQL problems, this can come come at a cost and they should be used with care. In the example above the subquery is called for every row processed by the outer query. So if you have a table with 100,000 records then you call not one query but 100,001 queries. The result is that your queries can become really sl-o-o-o-o-w. If in doubt you can run an EXPLAIN on the query. If you see "DEPENDENT SUBQUERY" then you should replace the subquery with a JOIN.

Explain:
+----+--------------------+--------+------+---------------+---------+---------+-- . . .
| id | select_type        | table  | type | possible_keys | key     | key_len |
+----+--------------------+--------+------+---------------+---------+---------+-- . . .
|  1 | PRIMARY            | s      | ALL  | NULL          | NULL    | NULL    |
|  2 | DEPENDENT SUBQUERY | choice | ref  | subject       | subject | 4       |
+----+--------------------+--------+------+---------------+---------+---------+-- . . .

Subqueries returning a set of values

You can also use a subquery to return several values. For example, list those pupils who scored over 90% in the 2015 exams

SELECT fname
  , lname
FROM pupil
WHERE pupilid IN 
    (
    SELECT pupilid 
    FROM result
    WHERE schoolyear = 2015 AND pcent > 90
    )
ORDER BY fname

Because IN is used, EXPLAIN on the above query will show it as a dependent subquery. So better to use a JOIN

SELECT DISTINCT
    fname
  , lname
FROM pupil
INNER JOIN 
    result r USING (pupilid)
    WHERE schoolyear = 2015 AND pcent > 90
ORDER BY fname

Table subqueries

Beetsem-Daley wanted to know the pupils coming top in each subject to award the vouchers.

If we had a table storing the top score for each subject each year ...


+------------+-----------+----------+
| schoolyear | subjectid | topscore |
+------------+-----------+----------+
|       2014 |         1 |       93 |
|       2014 |         2 |       93 |
|       2014 |         3 |       88 |
|       2014 |         4 |       92 |
|       2014 |         5 |       94 |
|       2014 |         6 |       87 |
|       2014 |         7 |       92 |
|       2014 |         8 |       93 |
|       2014 |         9 |       91 |
|       2015 |         1 |       95 |
|       2015 |         2 |       85 |
|       2015 |         3 |       96 |
|       2015 |         4 |       94 |
|       2015 |         5 |       95 |
|       2015 |         6 |       93 |
|       2015 |         7 |       96 |
|       2015 |         8 |       95 |
|       2015 |         9 |       94 |
+------------+-----------+----------+

... then we could simply join the result table with this table on "schoolyear, subjectid, pcent" to get the required matching results. We can use a subquery to simulate this table.

Our subquery is

SELECT schoolyear, subjectid, MAX(pcent) as pcent
  FROM result
  GROUP BY schoolyear, subjectid

When we use a "table subquery" it must be given a table alias (in this case I used "X" as the alias)

SELECT p.fname
, p.lname
, r.subjectid
, r.pcent
FROM pupil p
INNER JOIN result r USING (pupilid)
INNER JOIN (
  SELECT schoolyear, subjectid, MAX(pcent) as pcent
  FROM result
  GROUP BY schoolyear, subjectid
  ) as X USING (schoolyear,subjectid,pcent)
WHERE r.schoolyear = 2015

the result is

+----------+----------+-----------+-------+
| fname    | lname    | subjectid | pcent |
+----------+----------+-----------+-------+
| Anna     | Hamilton |         1 |    95 |
| Anne     | Bailey   |         2 |    85 |
| Anthony  | Bell     |         3 |    96 |
| Caroline | Freeman  |         3 |    96 |
| David    | Powell   |         4 |    94 |
| George   | Wilson   |         5 |    95 |
| Wayne    | Jones    |         6 |    93 |
| Peter    | Adamson  |         7 |    96 |
| Anthony  | Bell     |         8 |    95 |
| Mary     | Sheldon  |         9 |    94 |
+----------+----------+-----------+-------+

The final piece, calculating the house points, is a little more complicated. For each pupil we need to find their ranking position within each subject. We then subtract this rank from 4 to get the points. (If rank is first they get 3 points, if second then 2 points and 1 point for third. Other ranking positions are ignored.) We then need to accumulate the points by house. Note that subqueries can be nested inside other subqueries.

The innermost subqueries initialise our variables and sort the records into the order we want to process them, which is pcent descending within each subject

The mid-level subquery process the data to calculate the ranking and points

The outer query summarises the total points for each house

Introducing variables

While the query is processing the input records we need to store values from each record to use when processing the next record. To do this we use SQL user variables:

  • @prevsub
    store the subject so we can compare with next record to see when the subject changes
  • @prevpc
    store the percent so we can compare with next record to see if they scored the same
  • @seq
    store the incremental sequence number starting at one whenever the subject changes
  • @rank
    If the pcent is equal to the previous pcent then the rank is the same as the previous pupil, otherwise it is set to the @seq value

Note that we use := to assign a value to a variable

SELECT  h.house_name, SUM(x.pts) as points
FROM
    (
    SELECT 
        pupilid
      , @seq := IF(@prevsub=subjectid, @seq+1, 1)
      , @rank := IF(pcent=@prevpc, @rank, @seq)+0
      , @prevsub := subjectid as subjectid
      , @prevpc := pcent
      , CASE WHEN @rank > 4 THEN 0 ELSE 4 - @rank END as pts
    FROM 
        (
        SELECT 
            pupilid
          , subjectid
          , pcent
        FROM result
            JOIN (SELECT @prevsub:=0, @prevpc:=0, @rank:=0, @seq:=0, @pts:=0) init
        WHERE schoolyear = 2015
        ORDER BY subjectid, pcent DESC
        ) data
    ) x
    INNER JOIN pupil p USING (pupilid)
    INNER JOIN house h USING (houseid)
GROUP BY house_name
ORDER BY points DESC
;

+------------+--------+
| house_name | points |
+------------+--------+
| Grace      |     17 |
| Laker      |     16 |
| Jardine    |     11 |
| Cowdrey    |     10 |
+------------+--------+

The final task is to process the results of the five-a-side tournament. These are the results of the five-a-side matches which Laura has entered into the fixture table for us over the last six weeks.

+--------+----------+-----------+----------+-----------+
| weekno | hometeam | homegoals | awayteam | awaygoals |
+--------+----------+-----------+----------+-----------+
|      1 | Laker    |         1 | Jardine  |         1 |
|      1 | Cowdrey  |         1 | Grace    |         0 |
|      2 | Grace    |         2 | Cowdrey  |         2 |
|      2 | Jardine  |         1 | Laker    |         3 |
|      3 | Laker    |         2 | Cowdrey  |         4 |
|      3 | Jardine  |         4 | Grace    |         4 |
|      4 | Cowdrey  |         4 | Laker    |         4 |
|      4 | Grace    |         1 | Jardine  |         2 |
|      5 | Cowdrey  |         2 | Jardine  |         0 |
|      5 | Grace    |         0 | Laker    |         3 |
|      6 | Jardine  |         1 | Cowdrey  |         4 |
|      6 | Laker    |         4 | Grace    |         1 |
+--------+----------+-----------+----------+-----------+

This is the query used to get the required league table output. I created a subquery to calculate the required elements for each fixture. A UNION was used as I needed home team results and away team results to be calculated separately and then combined to get the totals. Note the use of UNION ALL. It is quite possible that house 1, say, could win with a goal difference of 1 as a home team and also as an away team. If we just use UNION the duplicate away team record would be ignored.

SELECT h.house_name as House,
    COUNT(*) as Played,
    SUM(won) as Won,
    SUM(drawn) as Drawn,
    SUM(lost) as Lost,
    SUM(gdiff) as GDiff,
    SUM(points) as Points
FROM
(
    SELECT hometeam as houseid,
        CASE WHEN homegoals > awaygoals THEN 1 ELSE 0 END as won,
        CASE WHEN homegoals = awaygoals THEN 1 ELSE 0 END as drawn,
        CASE WHEN homegoals < awaygoals THEN 1 ELSE 0 END as lost,
        homegoals - awaygoals as gdiff,
        CASE WHEN homegoals > awaygoals THEN 3
            WHEN homegoals = awaygoals THEN 1 
            ELSE 0 END as points
    FROM fixture

    UNION ALL
    
    SELECT awayteam as houseid,
        CASE WHEN homegoals < awaygoals THEN 1 ELSE 0 END as won,
        CASE WHEN homegoals = awaygoals THEN 1 ELSE 0 END as drawn,
        CASE WHEN homegoals > awaygoals THEN 1 ELSE 0 END as lost,
        awaygoals - homegoals as gdiff,
        CASE WHEN homegoals < awaygoals THEN 3
            WHEN homegoals = awaygoals THEN 1 
            ELSE 0 END as points
    FROM fixture

) as X
INNER JOIN house h USING (houseid)
GROUP BY h.house_name
ORDER BY Points DESC, GDiff DESC

+---------+--------+------+-------+------+-------+--------+
| House   | Played | Won  | Drawn | Lost | GDiff | Points |
+---------+--------+------+-------+------+-------+--------+
| Cowdrey |      6 |    4 |     2 |    0 |     8 |     14 |
| Laker   |      6 |    3 |     2 |    1 |     6 |     11 |
| Jardine |      6 |    1 |     2 |    3 |    -6 |      5 |
| Grace   |      6 |    0 |     2 |    4 |    -8 |      2 |
+---------+--------+------+-------+------+-------+--------+

Next, how to add data to your tables using INSERT queries