SQL INTRODUCTION
In these SQL tutorials we will first look at retrieving data from databases with basic SELECT queries and then how to improve the database queries using joins, unions and sub-queries. We will also take a short excursion into the realm of SQL user variables. We will then take an look at INSERT, UPDATE and DELETE statements.
Before we get to those, however, we'll first put our data into a format where we can take full advantage of the power of SQL joins. This process of removing redundant duplication and repetition of data is called "normalization". I am not covering it in any detail here but I will give you a rough-and-ready method of handling data relationships. (Google "data normalization" so you'll know exactly what you are doing. In practice you'll use first, second and third normal forms. Fourth and beyond are best reserved for theoretical classroom exercises.)
These tutorials are an introduction to using SQL and primarily written for MySQL. Some adjustment to the queries may be required for other dialects of SQL. For more details on each of the statement types refer to the MySQL manual
The Scenario
Throughout this tutorial we will use a fictitious school database. The school's headmaster, Mr Beetsem-Daley, has commissioned us to build an intranet for pupil administration. He requires us to record which pupils are in each class and house and the subjects chosen for study by each pupil. He also wants us to record the pupils' annual exam results and the teaching staff in each department. His data is currently held in spreadsheets, one of which (his pupil data) looks like this:
+------------------+---------+---------+------------------+------------------------+ | Pupil | Class | House | Housemaster | ChosenSubjects | +------------------+---------+---------+------------------+------------------------+ | Adam Simms | Class A | Laker | Frank Morrisson | Econ, Engl, Germ | | Allan Blair | Class B | Grace | Charles Hadleigh | Econ, Geog, Germ, Phys | | Anna Hamilton | Class B | Grace | Charles Hadleigh | Engl, Hist | | Anne Bailey | Class D | Jardine | Pauline Fforbes | Engl, Germ, Hist, Phys | | Anthony Bell | Class E | Cowdrey | Robert Bingley | Econ, Geog, Hist, Phys | | Caroline Freeman | Class F | Grace | Charles Hadleigh | Biol, Chem, Geog, Hist | | David Powell | Class A | Laker | Frank Morrisson | Econ, Math, Phys | | Emma Watson | Class C | Cowdrey | Robert Bingley | Engl, Geog | | Gearge Wilson | Class C | Laker | Frank Morrisson | Chem, Econ, Engl, Hist | | Henry Irving | Class D | Cowdrey | Robert Bingley | Biol, Chem, Math | | Jane Morrison | Class E | Laker | Frank Morrisson | Econ, Engl, Phys | | John Patterson | Class F | Jardine | Pauline Fforbes | Geog, Germ | | John Tully | Class A | Jardine | Pauline Fforbes | Biol, Engl, Hist, Math | | John Watson | Class B | Grace | Charles Hadleigh | Chem, Econ, Engl, Germ | | John Williams | Class C | Grace | Charles Hadleigh | Engl, Germ, Hist | | Margaret Norton | Class D | Cowdrey | Robert Bingley | Germ, Phys | | Mary Blake | Class E | Cowdrey | Robert Bingley | Germ, Hist, Phys | | Mary Sheldon | Class F | Jardine | Pauline Fforbes | Chem, Econ, Engl, Hist | | Mary Whitehouse | Class A | Grace | Charles Hadleigh | Engl, Geog, Hist, Math | | Michael Grove | Class B | Jardine | Pauline Fforbes | Econ, Engl, Phys | | Peter Adamson | Class C | Laker | Frank Morrisson | Chem, Econ, Germ, Phys | | Peter Appleby | Class D | Jardine | Pauline Fforbes | Biol, Econ, Germ, Math | | Wayne Jones | Class E | Laker | Frank Morrisson | Biol, Econ, Germ, Math | | William Smith | Class F | Cowdrey | Robert Bingley | Biol, Econ, Math, Phys | +------------------+---------+---------+------------------+------------------------+
We notice the repetition of the class, house and house-master names and the totally unsuitable comma-separated lists. Those repeating subject names are the prime candidate for normalization. Also the housemaster is a property of house and does not belong in the pupil data. Names, like the class, house and teacher names should be stored in only one location in the database, not duplicated over many records and tables.The only items that appear in multiple tables should be key values, such as classid, pupilid etc.
The Data
From the spreadsheets and the brief we were given, we have identified the main items of data (entities), namely pupil, house, class, teacher, subject, result and department.
The Data Relationships
We look at each pair of entities and consider how they are related.
For example
- class - pupil
One class has many pupils, one pupil belongs to one class. So this is a simple one-to-many relationship. To implement the relationship we store the classid in each pupil record as a "foreign key".+--------------+ +--------------+ | class | | pupil | +--------------+ +--------------+ | classid |-----+ | pupilid | | name | | | name | +--------------+ +------<| classid | +--------------+
- pupil - subject
One pupil takes many subjects, one subject is taken by many pupils. This is a many-to-many relationship. (A subject can actually be taken by none, one or many pupils which, as we will see later, may affect how we join the tables.) To implement this we need to add an extra table (which we'll call "choice") to store a record for each subject taken by each pupil containing the pupilid and the subjectid+--------------+ +--------------+ +--------------+ | subject | | choice | | pupil | +--------------+ +--------------+ +--------------+ | subjectid |-----+ | pupilid |>------------| pupilid | | name | +-----0<| subjectid | | name | +--------------+ +--------------+ | classid | +--------------+
Our many-to-many has now become two one-to-many relationships. In the spreadsheet we could only see which subjects were chosen by a pupil. Now, by searching in the opposite direction, we can easily find which pupils chose each subject. By normalizing this piece of information we have now doubled its functionality.
A matrix like this one below makes it easy to record our relationships between each pair. Note that our teachers report to their heads of departments, so there is a relationship between teacher and "self".
After weeks of intensive data analysis and data modelling we present our redesigned database to the headmaster. (OK, it took a few minutes but if you were doing this on a corporate scale it is going to take that time, identifying all the processes and data elements and how they relate to one another. Get this bit right and life is a lot easier down the line.)
The Database Schema
Our final design, showing the entity relationships, looks like this
Now we can start querying our database using SELECT queries.