Tutorials

SELECT

Most of your interaction with your databases is going to be the retrieval of stored data. This tutorial is to give you the basics of using SELECT queries.

SELECT queries have many options and keywords associated with them, but apart from JOIN and UNION, which are covered in separate sections, 95% or more of your queries will only need one or more of these.

  • DISTINCT
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY
  • LIMIT

The various clauses (FROM clause, WHERE clause, GROUP BY clause etc) must, if used, appear in the query in the same order as they appear in the above list.

The SELECT clause

The select clause defines what data is required in the query result. The simplest type of SELECT statement does not even require a database, you can just select a literal value or an expression. EG


mysql> SELECT 'abc', 2+3;
+-----+-----+
| abc | 2+3 |
+-----+-----+
| abc |   5 |
+-----+-----+

When you do select something using an expression rather than just a database column it helps, when you later want to reference the result, if you give the expression a "column alias".

mysql> SELECT 'abc' as name, 2+3 as value;
+------+-------+
| name | value |
+------+-------+
| abc  |     5 |
+------+-------+ 

When it comes to selecting data from the database the minimum keywords required are "SELECT ... FROM ...". In this case the select clause can also contain column names as well as literals or expressions. You can also specify * to denote "all columns". This comes in useful when testing if you need to see what a table contains or, as in the query below, you want to show all columns. This practice should be avoided in production queries. If you or someone else needs to maintain the query some time later, it is much better to have defined exactly which columns are being selected. The lazy use of the asterisk also results in unnecessary data being transferred from the server and so makes the query less efficient.

The FROM clause

This defines the table from which the data is to be selected. For example, select all columns from the result table (This is the data that will be used in the sample queries in this tutorial section)

 
    mysql> SELECT * FROM result;
    +---------+-----------+-------+------------+----------+
    | pupilid | subjectid | pcent | schoolyear | resultid |
    +---------+-----------+-------+------------+----------+
    |       1 |         1 |    78 |       2014 |        1 |
    |       1 |         2 |    87 |       2014 |       13 |
    |       1 |         5 |    79 |       2014 |       37 |
    |       1 |         1 |    73 |       2015 |       82 |
    |       1 |         2 |    74 |       2015 |       94 |
    |       1 |         5 |    84 |       2015 |      118 |
    |       2 |         2 |    53 |       2014 |       14 |
    |       2 |         3 |    88 |       2014 |       24 |
    |       2 |         5 |    76 |       2014 |       38 |
    |       2 |         8 |    71 |       2014 |       62 |
    |       2 |         2 |    81 |       2015 |       95 |
    |       2 |         3 |    82 |       2015 |      105 |
    |       2 |         5 |    77 |       2015 |      119 |
    |       2 |         8 |    78 |       2015 |      143 |
    +---------+-----------+-------+------------+----------+

and an example specifying column names:

    
    mysql> SELECT resultid, pupilid, pcent FROM result;
    +----------+---------+-------+
    | resultid | pupilid | pcent |
    +----------+---------+-------+
    |        1 |       1 |    78 |
    |       13 |       1 |    87 |
    |       37 |       1 |    79 |
    |       82 |       1 |    73 |
    |       94 |       1 |    74 |
    |      118 |       1 |    84 |
    |       14 |       2 |    53 |
    |       24 |       2 |    88 |
    |       38 |       2 |    76 |
    |       62 |       2 |    71 |
    |       95 |       2 |    81 |
    |      105 |       2 |    82 |
    |      119 |       2 |    77 |
    |      143 |       2 |    78 |
    +----------+---------+-------+ 

DISTINCT

DISTINCT is used when you want unique values only. For example
For which years do we have exam results?
If you "SELECT schoolyear FROM result" then you get 14 dates listed with repeated values, however

    mysql> SELECT DISTINCT schoolyear FROM result;
    +------------+
    | schoolyear |
    +------------+
    |       2014 |
    |       2015 |
    +------------+

Note that DISTINCT applies to the whole row and not to individual columns. If you select two columns using DISTINCT then you get unique combinations:

    mysql> SELECT DISTINCT pupilid, schoolyear FROM result;
    +---------+------------+
    | pupilid | schoolyear |
    +---------+------------+
    |       1 |       2014 |
    |       1 |       2015 |
    |       2 |       2014 |
    |       2 |       2015 |
    +---------+------------+

The WHERE clause

This is used to filter the records that are to be used in the query. It should be a valid boolean expression. A record is included if the expression in the WHERE clause evaluates to TRUE.

Examples:

    mysql> SELECT DISTINCT pupilid, schoolyear
        -> FROM result
        -> WHERE pcent > 82;
    +---------+------------+
    | pupilid | schoolyear |
    +---------+------------+
    |       1 |       2014 |
    |       1 |       2015 |
    |       2 |       2014 |
    +---------+------------+


    mysql> SELECT pupilid, subjectid, pcent, schoolyear
        -> FROM result
        -> WHERE schoolyear = 2015 AND subjectid IN (2,5);
    +---------+-----------+-------+------------+
    | pupilid | subjectid | pcent | schoolyear |
    +---------+-----------+-------+------------+
    |       1 |         2 |    74 |       2015 |
    |       1 |         5 |    84 |       2015 |
    |       2 |         2 |    81 |       2015 |
    |       2 |         5 |    77 |       2015 |
    +---------+-----------+-------+------------+

Note that you cannot use column aliases in WHERE clauses. The WHERE clause filters records to be input to the query process whereas aliases are applied after processing but prior to output. The following query (briefly switching to the pupil table) is invalid

    SELECT pupilid
     , SUBSTRING(fname, 1,1) as initial
     , lname as surname
     FROM pupil
     WHERE initial = 'M';
     
     ERROR 1054 (42S22): Unknown column 'initial' in 'where clause'
 

You have to repeat the expression in the WHERE clause:

     mysql> SELECT pupilid
        ->  , SUBSTRING(fname, 1,1) as initial
        ->  , lname as surname
        ->  FROM pupil
        ->  WHERE SUBSTRING(fname, 1,1) = 'M';
    +---------+---------+------------+
    | pupilid | initial | surname    |
    +---------+---------+------------+
    |      16 | M       | Norton     |
    |      17 | M       | Blake      |
    |      18 | M       | Sheldon    |
    |      19 | M       | Whitehouse |
    |      20 | M       | Grove      |
    +---------+---------+------------+ 

The ORDER BY clause

This is a comma-delimited list of output columns, aliases or expressions specifying how the output results should be sorted. Qualifiers ASC or DESC can be added to define if the sort should be lowest values first (ASCending) or highest values first (DESCending). If no qualifier is provided then ASC is used as the default.

So "ORDER BY schoolyear ASC, pcent DESC" would sort the results by ascending year. Within each year the results would be sorted by descending values of pcent.

    mysql> SELECT schoolyear as year, pupilid, pcent
        -> FROM result
        -> WHERE subjectid IN (2, 5)
        -> ORDER BY year ASC, pcent DESC;
    +------+---------+-------+
    | year | pupilid | pcent |
    +------+---------+-------+
    | 2014 |       1 |    87 |
    | 2014 |       1 |    79 |
    | 2014 |       2 |    76 |
    | 2014 |       2 |    53 |
    | 2015 |       1 |    84 |
    | 2015 |       2 |    81 |
    | 2015 |       2 |    77 |
    | 2015 |       1 |    74 |
    +------+---------+-------+

Boolean expressions can be useful in some sorting situations. These evaluate to 1 if the expression is true, or 0 if the expression is false

List results in descending order of pcent but show results for subject 5 first

    SELECT schoolyear as year, pupilid, subjectid, pcent
    FROM results
    ORDER BY subjectid=5 DESC, pcent DESC;
    +------+---------+-----------+-------+
    | year | pupilid | subjectid | pcent |
    +------+---------+-----------+-------+
    | 2015 |       1 |         5 |    84 |
    | 2014 |       1 |         5 |    79 |
    | 2015 |       2 |         5 |    77 |
    | 2014 |       2 |         5 |    76 |
    | 2014 |       2 |         3 |    88 |
    | 2014 |       1 |         2 |    87 |
    | 2015 |       2 |         3 |    82 |
    | 2015 |       2 |         2 |    81 |
    | 2014 |       1 |         1 |    78 |
    | 2015 |       2 |         8 |    78 |
    | 2015 |       1 |         2 |    74 |
    | 2015 |       1 |         1 |    73 |
    | 2014 |       2 |         8 |    71 |
    | 2014 |       2 |         2 |    53 |
    +------+---------+-----------+-------+

The LIMIT clause

As the name suggests, this limits the number of rows output by the query

What were the top three scores in 2014?

    mysql> SELECT pupilid, subjectid, pcent
        -> FROM result
        -> WHERE schoolyear=2014
        -> ORDER BY pcent DESC
        -> LIMIT 3;
    +---------+-----------+-------+
    | pupilid | subjectid | pcent |
    +---------+-----------+-------+
    |       2 |         3 |    88 |
    |       1 |         2 |    87 |
    |       1 |         5 |    79 |
    +---------+-----------+-------+

The second common use for the LIMIT clause is "pagination" where you want to display a fixed number of records per page and spread the results over several pages. When used for pagination the LIMIT clause has two values, the offset of the start record and the number of records. So if we want to display our result data as five records per page we would have three queries with
LIMIT 0,5
LIMIT 5,5
LIMIT 10,5
The offset is calculated as (page number - 1) x records per page.

The GROUP BY clause

Use a GROUP BY clause in conjunction with one or more of the many aggregation functions that SQL provides. The most common of these functions are

  • AVG()
  • COUNT()
  • GROUP_CONCAT()
  • MIN()
  • MAX()
  • SUM()

How many subjects were taken by each pupil?

    mysql> SELECT pupilid
        ->     , COUNT(DISTINCT subjectid) as subjects
        -> FROM result
        -> GROUP BY pupilid;
    +---------+----------+
    | pupilid | subjects |
    +---------+----------+
    |       1 |        3 |
    |       2 |        4 |
    +---------+----------+

Note that COUNT(DISTINCT) is a variation of COUNT() that returns a count of the different values. Do not confuse with the the SELECT DISTINCT syntax above which applies to the entire row.

Which subjects were taken by each pupil each year and what were the pupils' highest and lowest marks?

    mysql> SELECT schoolyear
        ->     , pupilid
        ->     , GROUP_CONCAT(subjectid ORDER BY subjectid) as subjects
        ->     , MIN(pcent) as low_score
        ->     , MAX(pcent) as high_score
        -> FROM result
        -> GROUP BY schoolyear, pupilid;
    +------------+---------+----------+-----------+------------+
    | schoolyear | pupilid | subjects | low_score | high_score |
    +------------+---------+----------+-----------+------------+
    |       2014 |       1 | 1,2,5    |        78 |         87 |
    |       2014 |       2 | 2,3,5,8  |        53 |         88 |
    |       2015 |       1 | 1,2,5    |        73 |         84 |
    |       2015 |       2 | 2,3,5,8  |        77 |         82 |
    +------------+---------+----------+-----------+------------+

As with SELECT DISTINCT, you get a single row for each value (or combination) in the GROUP BY fields.

Note: standard SQL will not allow you to select extra columns that not used in the GROUP BY clause or are not the subject of an aggregation. MySQL will allow you but do it with caution. If several records contain the same group by value you cannot be certain which of those records the extra columns will come from.

The HAVING clause

There are times when you want to output aggregated rows under certain conditions, say, when the MAX is over a certain value or the COUNT is greater than 1 (duplicates exist). You cannot use a WHERE clause because that filters the inputs and the totals are not known at that point. The solution is to use HAVING to filter the output.

Like a WHERE clause it must be a valid boolean expression. Example:

mysql> SELECT schoolyear
    ->     , pupilid
    ->     , GROUP_CONCAT(subjectid ORDER BY subjectid) as subjects
    ->     , MIN(pcent) as low_score
    ->     , MAX(pcent) as high_score
    -> FROM result
    -> GROUP BY schoolyear, pupilid
    -> HAVING high_score > 85;
+------------+---------+----------+-----------+------------+
| schoolyear | pupilid | subjects | low_score | high_score |
+------------+---------+----------+-----------+------------+
|       2014 |       1 | 1,2,5    |        78 |         87 |
|       2014 |       2 | 2,3,5,8  |        53 |         88 |
+------------+---------+----------+-----------+------------+

That was the basic anatomy of SELECT statements.

The next section will show you how to create queries using multiple tables with JOINS.