Databases and SQL - Library edition

Combining Data

Learning Objectives

  • Explain the operation of a query that joins two tables.
  • Explain how to restrict the output of a query containing a join to only include meaningful combinations of values.
  • Write queries that join tables on equal keys.
  • Explain what primary and foreign keys are, and why they are useful.

So far we haven’t been able to display the authors’ names in our query results, because those names are not in the Works but in the Authors table. What’s worse, because there can be many authors to a title and many title associated with each author, there is a third table, Works_Authorsthat’s taking care of this many-to-many relationship. How can these tables be joined?

The SQL command to do this is JOIN. To see how it works, let’s start with the somewhat easier case of the Items table and try joining it to the Works table:

SELECT * FROM Items JOIN Works;
Item_ID Work_ID Barcode Acquired Status Work_ID Title ISBN Date Place Publisher Edition Pages
1 1 081722942611 2009 Loaned 1 SQL in a nutshell 9780596518844 2009 Sebastopol O'Reilly 3rd ed. 578
1 1 081722942611 2009 Loaned 2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
1 1 081722942611 2009 Loaned 3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
1 1 081722942611 2009 Loaned 4 Using SQLite 9780596521189 2010 Sebastopol O'Reilly 1st ed. 503
1 1 081722942611 2009 Loaned 5 Geek sublime 9780571310302 2014 London Faber & Faber 258

The result above was truncated for display because… the query returned a list of 800 records! In fact, JOIN creates the cross product of two tables, i.e., it joins each record of one table with each record of the other table to give all possible combinations. Since there are 20 records in Works and 40 in Items, the join’s output has 20*40=800 records. And since Works has 8 fields and Items has 5, the output has 8+5=13 fields.

What the join hasn’t done is figure out if the records being joined have anything to do with each other. It has no way of knowing whether they do or not until we tell it how. To do that, we add a clause specifying that we’re only interested in combinations where `Work_ID matches in both tables:

SELECT * FROM Items JOIN Works ON Items.Work_ID=Works.Work_ID LIMIT 10;
Item_ID Work_ID Barcode Acquired Status Work_ID Title ISBN Date Place Publisher Edition Pages
1 1 081722942611 2009 Loaned 1 SQL in a nutshell 9780596518844 2009 Sebastopol O'Reilly 3rd ed. 578
2 1 492437609065 2011 On shelf 1 SQL in a nutshell 9780596518844 2009 Sebastopol O'Reilly 3rd ed. 578
3 2 172480710952 2013 On shelf 2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
4 3 708014968732 2013 Missing 3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
5 3 819783404942 2014 Loaned 3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
6 4 257370237291 2010 Missing 4 Using SQLite 9780596521189 2010 Sebastopol O'Reilly 1st ed. 503
7 5 002905925356 2014 Loaned 5 Geek sublime 9780571310302 2014 London Faber & Faber 258
8 5 964583604781 2014 Loaned 5 Geek sublime 9780571310302 2014 London Faber & Faber 258
9 6 701630524534 2014 Loaned 6 Capital in the 21st century 9780674430006 2014 Cambridge Belknap Press 685
10 6 722040919616 2014 On shelf 6 Capital in the 21st century 9780674430006 2014 Cambridge Belknap Press 685

ON is very similar to WHERE, and for all the queries in this lesson you can use them interchangeably. There are differences in how they affect outer joins, but that’s beyond the scope of this lesson. Once we add this to our query, the database manager throws away records that combined information about two different sites, leaving us with just the ones we want.

Notice that we used Table.field to specify field names in the output of the join. We do this because tables can have fields with the same name, and we need to be specific which ones we’re talking about.

We can now use the same dotted notation to select the three columns we actually want out of our join:

SELECT Items.Barcode, Works.Title, Works.ISBN FROM Items JOIN Works ON Items.Work_ID=Works.Work_ID LIMIT 10;
Barcode Title ISBN
081722942611 SQL in a nutshell 9780596518844
492437609065 SQL in a nutshell 9780596518844
172480710952 SQL for dummies 9781118607961
708014968732 PHP & MySQL 9781449325572
819783404942 PHP & MySQL 9781449325572
257370237291 Using SQLite 9780596521189
002905925356 Geek sublime 9780571310302
964583604781 Geek sublime 9780571310302
701630524534 Capital in the 21st century 9780674430006
722040919616 Capital in the 21st century 9780674430006

We can now try to tackle the case of the Authors table. To list the contributors associated with the first item on the Works table (Work_ID=1, SQL in a nutshell 3rd ed.), we write:

SELECT Works_Authors.Role, Authors.Personal, Authors.Family 
FROM   Works_Authors 
JOIN   Authors 
ON     Authors.Author_ID=Works_Authors.Author_ID 
WHERE  Works_Authors.Work_ID=1;
Role Personal Family
Author Kevin E. Kline
Contributor Daniel Kline
Contributor Brand Hunt

Or inversely, if we want to list all the works that Allen G. Taylor (Author_ID=4) has authored or contributed to, we can write:

SELECT Works.Title, Works.Date, Works.Edition, Works_Authors.Role 
FROM   Works 
JOIN   Works_Authors 
ON     Works.Work_ID=Works_Authors.Work_ID 
WHERE  Works_Authors.Author_ID=4;
Title Date Edition Role
SQL for dummies 2013 8th ed. Author
SQL for dummies 2010 7th ed. Author
SQL all-in-one 2011 2nd ed. Author
Access 2013 all-in-one 2013 Contributor

We can tell which records from Works, Authors, Items and Works_Authors correspond with each other because those tables contain primary keys and foreign keys. A primary key is a value, or combination of values, that uniquely identifies each record in a table. A foreign key is a value (or combination of values) from one table that identifies a unique record in another table. Another way of saying this is that a foreign key is the primary key of one table that appears in some other table. In our database, Works.Work_ID is the primary key in the Works table, while Items.Work_ID is a foreign key relating the Items table’s entries to entries in Works. The Authors_Works table contains only foreign keys relating to entries in the Works and Authors tables.

Most database designers believe that every table should have a well-defined primary key. They also believe that this key should be separate from the data itself, so that if we ever need to change the data, we only need to make one change in one place. One easy way to do this is to create an arbitrary, unique ID for each record as we add it to the database. This is actually very common: those IDs have names like “student numbers” and “patient numbers”, and they almost always turn out to have originally been a unique record identifier in some database system or other. As the query below demonstrates, SQLite automatically numbers records as they’re added to tables, and we can use those record numbers in queries:

SELECT rowid, * FROM Person;

Listing Authors whose Name starts with K

Write a query that lists all works written by people whose Family name start with the letter “K”.

Write a query that lists all authors that have written at least one book that is currently on loan from the library.

Who are the Authors of this Title?

To which item does the barcode 722040919616 refer to, what is the title of this book and who are its authors?