Databases and SQL - Library edition
Data Hygiene
Learning Objectives
- Explain what an atomic value is.
- Distinguish between atomic and non-atomic values.
- Explain why every value in a database should be atomic.
- Explain what a primary key is and why every record should have one.
- Identify primary keys in database tables.
- Explain why database entries should not contain redundant information.
- Identify redundant information in databases.
Now that we have seen how joins work, we can see why the relational model is so useful and how best to use it. The first rule is that every value should be atomic, i.e., not contain parts that we might want to work with separately. We store personal and family names in separate columns instead of putting the entire name in one column so that we don’t have to use substring operations to get the name’s components. More importantly, we store the two parts of the name separately because splitting on spaces is unreliable: just think of a name like “Eloise St. Cyr” or “Jan Mikkel Steubart”.
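To make the point about splitting on spaces concrete, here is a minimal Python sketch. The helper `naive_split` is hypothetical (it is not part of the lesson's database) and assumes, wrongly for some names, that the last space-separated word is the family name.

```python
def naive_split(full_name):
    """Guess (personal, family) by treating the last word as the family name."""
    parts = full_name.split(" ")
    return " ".join(parts[:-1]), parts[-1]


# The guess breaks for a family name that itself contains a space.
guess = naive_split("Eloise St. Cyr")
print(guess)  # ('Eloise St.', 'Cyr'), but the family name is 'St. Cyr'

# Storing the parts in separate columns avoids guessing altogether.
atomic_record = {"personal": "Eloise", "family": "St. Cyr"}
print(atomic_record["family"])
```

No splitting rule can recover the right answer from the combined string alone, which is why the two parts are stored as separate values in the first place.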
The second rule is that every record should have a unique primary key. This can be a serial number that has no intrinsic meaning (like the work_ID field in the Works table), one of the values in the record (the barcode field in the Items table could have been used instead of the item_ID), or even a combination of values.
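As a rough illustration of what a primary key buys us, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The cut-down `Works` table and the `Work_ID` value of 1 are assumptions made for this example, not the lesson's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A simplified Works table (assumed columns) with a serial-number primary key.
conn.execute("""
    CREATE TABLE Works (
        Work_ID INTEGER PRIMARY KEY,
        Title   TEXT,
        ISBN    TEXT
    )
""")
conn.execute("INSERT INTO Works VALUES (1, 'SQL in a nutshell', '9780596518844')")

# A second record with the same key is refused by the database.
try:
    conn.execute("INSERT INTO Works VALUES (1, 'SQL for dummies', '9781118607961')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Works").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```

Because the database enforces uniqueness, a key value always identifies at most one record.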
The third rule is that there should be no redundant information. For example, we could have a single table combining the data from the Items and the Works tables, like this:
Item_ID | Barcode | Acquired | Status | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|---|---|---|
1 | 081722942611 | 2009 | Loaned | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O’Reilly | 3rd ed. | 578 |
2 | 492437609065 | 2011 | On shelf | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O’Reilly | 3rd ed. | 578 |
3 | 172480710952 | 2013 | On shelf | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
4 | 708014968732 | 2013 | Missing | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O’Reilly | 2nd ed. | 532 |
5 | 819783404942 | 2014 | Loaned | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O’Reilly | 2nd ed. | 532 |
In fact, we could use a single table that recorded all the information about each item in each row, just as a spreadsheet would, including the author information in individual columns. The problem is that data organized this way is very hard to keep consistent: if we realize that the bibliographic information for a particular title is wrong, we have to change every item record for that title. Storing the author/contributor information would pose yet another challenge: our spreadsheet would need enough columns to fit the record with the greatest number of contributors in our database. Those columns would be mostly empty for records with fewer contributors. What’s worse, if we added a title with an even greater number of contributors to the database, we would need to alter our table to add still more columns.
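The benefit of keeping bibliographic data in one place can be sketched as follows, again with Python's `sqlite3` module. The reduced column sets, the `Work_ID` link column in `Items`, and the `Work_ID` value of 1 are assumptions for illustration. The barcodes and title come from the combined table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Bibliographic data lives once, in Works (assumed, reduced columns).
    CREATE TABLE Works (Work_ID INTEGER PRIMARY KEY, Title TEXT);
    -- Each physical copy in Items points at its work.
    CREATE TABLE Items (
        Item_ID INTEGER PRIMARY KEY,
        Barcode TEXT,
        Status  TEXT,
        Work_ID INTEGER REFERENCES Works(Work_ID)
    );
    INSERT INTO Works VALUES (1, 'SQL in a nutshell');
    INSERT INTO Items VALUES (1, '081722942611', 'Loaned', 1);
    INSERT INTO Items VALUES (2, '492437609065', 'On shelf', 1);
""")

# Correct the title once, in a single row of Works.
conn.execute("UPDATE Works SET Title = 'SQL in a Nutshell' WHERE Work_ID = 1")

# A join shows the correction for every copy of the work.
rows = conn.execute("""
    SELECT Items.Item_ID, Works.Title
    FROM Items JOIN Works ON Items.Work_ID = Works.Work_ID
    ORDER BY Items.Item_ID
""").fetchall()
print(rows)  # [(1, 'SQL in a Nutshell'), (2, 'SQL in a Nutshell')]
```

In the combined table the same correction would have to be repeated for every copy, and missing one would leave the data inconsistent.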
The fourth rule is that the units for every value should be stored explicitly. If alongside the number of pages we were also recording the size of each item in our library, we would need to specify whether that size is expressed in inches or centimeters, for example.
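One possible way to store units explicitly is a separate unit column next to the measurement, sketched below with Python's `sqlite3` module. The `Item_sizes` table, its column names, and the height values are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: each height is stored together with its unit.
    CREATE TABLE Item_sizes (
        Item_ID     INTEGER PRIMARY KEY,
        Height      REAL,
        Height_unit TEXT
    );
    INSERT INTO Item_sizes VALUES (1, 24.0, 'cm');
    INSERT INTO Item_sizes VALUES (2, 9.0, 'in');
""")

# Because the unit is recorded, values can be converted to a common unit
# (1 inch = 2.54 cm) before they are compared.
rows = conn.execute("""
    SELECT Item_ID,
           CASE Height_unit WHEN 'in' THEN Height * 2.54 ELSE Height END AS Height_cm
    FROM Item_sizes
    ORDER BY Item_ID
""").fetchall()
print(rows)
```

Without the unit column, the 24.0 and the 9.0 would look directly comparable even though they are not.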
Stepping back, data and the tools used to store it have a symbiotic relationship: we use tables and joins because they are efficient, provided our data is organized a certain way, but we organize our data that way because we have tools to manipulate it efficiently. As anthropologists say, the tool shapes the hand that shapes the tool.
Identifying Atomic Values
Which of the following are atomic values? Which are not? Why?
- New Zealand
- 87 Turing Avenue
- January 25, 1971
- the XY coordinate (0.5, 3.3)
Identifying a Primary Key
What is the primary key in this table? I.e., what value or combination of values uniquely identifies a record?
Work_ID | Title | ISBN | Date | Place | Publisher |
---|---|---|---|---|---|
16 | Microsoft SQL server 2012 | 9780132977661 | 2013 | Indianapolis | Sams |