Databases and SQL - Library edition
Combining Data
Learning Objectives
- Explain the operation of a query that joins two tables.
- Explain how to restrict the output of a query containing a join to only include meaningful combinations of values.
- Write queries that join tables on equal keys.
- Explain what primary and foreign keys are, and why they are useful.
So far we haven’t been able to display the authors’ names in our query results, because those names are not in the Works
but in the Authors
table. What’s worse, because there can be many authors to a title and many title associated with each author, there is a third table, Works_Authors
that’s taking care of this many-to-many relationship. How can these tables be joined?
The SQL command to do this is JOIN
. To see how it works, let’s start with the somewhat easier case of the Items
table and try joining it to the Works
table:
SELECT * FROM Items JOIN Works;
Item_ID | Work_ID | Barcode | Acquired | Status | Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 081722942611 | 2009 | Loaned | 1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
1 | 1 | 081722942611 | 2009 | Loaned | 2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
1 | 1 | 081722942611 | 2009 | Loaned | 3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
1 | 1 | 081722942611 | 2009 | Loaned | 4 | Using SQLite | 9780596521189 | 2010 | Sebastopol | O'Reilly | 1st ed. | 503 |
1 | 1 | 081722942611 | 2009 | Loaned | 5 | Geek sublime | 9780571310302 | 2014 | London | Faber & Faber | 258 |
The result above was truncated for display because… the query returned a list of 800 records! In fact, JOIN
creates the cross product of two tables, i.e., it joins each record of one table with each record of the other table to give all possible combinations. Since there are 20 records in Works
and 40 in Items
, the join’s output has 20*40=800 records. And since Works
has 8 fields and Items
has 5, the output has 8+5=13 fields.
What the join hasn’t done is figure out if the records being joined have anything to do with each other. It has no way of knowing whether they do or not until we tell it how. To do that, we add a clause specifying that we’re only interested in combinations where `Work_ID matches in both tables:
SELECT * FROM Items JOIN Works ON Items.Work_ID=Works.Work_ID LIMIT 10;
Item_ID | Work_ID | Barcode | Acquired | Status | Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 081722942611 | 2009 | Loaned | 1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
2 | 1 | 492437609065 | 2011 | On shelf | 1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
3 | 2 | 172480710952 | 2013 | On shelf | 2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
4 | 3 | 708014968732 | 2013 | Missing | 3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
5 | 3 | 819783404942 | 2014 | Loaned | 3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
6 | 4 | 257370237291 | 2010 | Missing | 4 | Using SQLite | 9780596521189 | 2010 | Sebastopol | O'Reilly | 1st ed. | 503 |
7 | 5 | 002905925356 | 2014 | Loaned | 5 | Geek sublime | 9780571310302 | 2014 | London | Faber & Faber | 258 | |
8 | 5 | 964583604781 | 2014 | Loaned | 5 | Geek sublime | 9780571310302 | 2014 | London | Faber & Faber | 258 | |
9 | 6 | 701630524534 | 2014 | Loaned | 6 | Capital in the 21st century | 9780674430006 | 2014 | Cambridge | Belknap Press | 685 | |
10 | 6 | 722040919616 | 2014 | On shelf | 6 | Capital in the 21st century | 9780674430006 | 2014 | Cambridge | Belknap Press | 685 |
ON
is very similar to WHERE
, and for all the queries in this lesson you can use them interchangeably. There are differences in how they affect outer joins, but that’s beyond the scope of this lesson. Once we add this to our query, the database manager throws away records that combined information about two different sites, leaving us with just the ones we want.
Notice that we used Table.field
to specify field names in the output of the join. We do this because tables can have fields with the same name, and we need to be specific which ones we’re talking about.
We can now use the same dotted notation to select the three columns we actually want out of our join:
SELECT Items.Barcode, Works.Title, Works.ISBN FROM Items JOIN Works ON Items.Work_ID=Works.Work_ID LIMIT 10;
Barcode | Title | ISBN |
---|---|---|
081722942611 | SQL in a nutshell | 9780596518844 |
492437609065 | SQL in a nutshell | 9780596518844 |
172480710952 | SQL for dummies | 9781118607961 |
708014968732 | PHP & MySQL | 9781449325572 |
819783404942 | PHP & MySQL | 9781449325572 |
257370237291 | Using SQLite | 9780596521189 |
002905925356 | Geek sublime | 9780571310302 |
964583604781 | Geek sublime | 9780571310302 |
701630524534 | Capital in the 21st century | 9780674430006 |
722040919616 | Capital in the 21st century | 9780674430006 |
We can now try to tackle the case of the Authors
table. To list the contributors associated with the first item on the Works
table (Work_ID=1
, SQL in a nutshell 3rd ed.), we write:
SELECT Works_Authors.Role, Authors.Personal, Authors.Family
FROM Works_Authors
JOIN Authors
ON Authors.Author_ID=Works_Authors.Author_ID
WHERE Works_Authors.Work_ID=1;
Role | Personal | Family |
---|---|---|
Author | Kevin E. | Kline |
Contributor | Daniel | Kline |
Contributor | Brand | Hunt |
Or inversely, if we want to list all the works that Allen G. Taylor (Author_ID=4
) has authored or contributed to, we can write:
SELECT Works.Title, Works.Date, Works.Edition, Works_Authors.Role
FROM Works
JOIN Works_Authors
ON Works.Work_ID=Works_Authors.Work_ID
WHERE Works_Authors.Author_ID=4;
Title | Date | Edition | Role |
---|---|---|---|
SQL for dummies | 2013 | 8th ed. | Author |
SQL for dummies | 2010 | 7th ed. | Author |
SQL all-in-one | 2011 | 2nd ed. | Author |
Access 2013 all-in-one | 2013 | Contributor |
We can tell which records from Works
, Authors
, Items
and Works_Authors
correspond with each other because those tables contain primary keys and foreign keys. A primary key is a value, or combination of values, that uniquely identifies each record in a table. A foreign key is a value (or combination of values) from one table that identifies a unique record in another table. Another way of saying this is that a foreign key is the primary key of one table that appears in some other table. In our database, Works.Work_ID
is the primary key in the Works
table, while Items.Work_ID
is a foreign key relating the Items
table’s entries to entries in Works
. The Authors_Works
table contains only foreign keys relating to entries in the Works
and Authors
tables.
Most database designers believe that every table should have a well-defined primary key. They also believe that this key should be separate from the data itself, so that if we ever need to change the data, we only need to make one change in one place. One easy way to do this is to create an arbitrary, unique ID for each record as we add it to the database. This is actually very common: those IDs have names like “student numbers” and “patient numbers”, and they almost always turn out to have originally been a unique record identifier in some database system or other. As the query below demonstrates, SQLite automatically numbers records as they’re added to tables, and we can use those record numbers in queries:
SELECT rowid, * FROM Person;
Listing Authors whose Name starts with K
Write a query that lists all works written by people whose Family name start with the letter “K”.
Listing Popular Authors
Write a query that lists all authors that have written at least one book that is currently on loan from the library.
Who are the Authors of this Title?
To which item does the barcode 722040919616
refer to, what is the title of this book and who are its authors?