Databases and SQL - Library edition
Calculating New Values
Learning Objectives
- Write queries that calculate new values for each selected record.
The library is embarking on a digitization project. In order to estimate how long it would take to scan books, we want to look up what the total number of pages for all library books are. Prior experience has shown that due to poor quality binding, up to 2% of the pages will fail to be scanned for books published by Wiley and will have to be scanned anew. The total number of pages has therefore to be adjusted accordingly for this publisher.
Rather than modifying the stored data, we can do this calculation on the fly as part of our query:
SELECT 1.02*Pages FROM Works WHERE Publisher = "Wiley";
1.02*Pages |
---|
-null- |
408.0 |
874.14 |
448.8 |
511.02 |
722.16 |
775.2 |
Remember those empty cells we had in the database? Seems like one of them is showing up again here. We’ll have to deal with it later so that it doesn’t mess up our computations.
When we run the query, the expression 1.02 * Pages
is evaluated for each row. Expressions can use any of the fields, all of usual arithmetic operators, and a variety of common functions. (Exactly which ones depends on which database manager is being used.) For example, knowing that scanning two pages takes exactly 13 seconds, we can compute how many hours would be necessary to scan each Wiley book. Since this is an estimate, more than one decimal place doesn’t make much sense and so we use a rounding function:
SELECT Title, round(13/2*(1.02*Pages)/3600,1) FROM Works WHERE Publisher = "Wiley";
Title | round(13/2*(1.02*Pages)/3600,1) |
---|---|
SQL for dummies | |
Discovering SQL | 0.7 |
SQL bible | 1.5 |
SQL for dummies | 0.7 |
Beginning SQL | 0.9 |
SQL all-in-one | 1.2 |
Access 2013 all-in-one | 1.3 |
We can also combine values from different fields, for example by using the string concatenation operator ||
:
SELECT Personal || " " || Family FROM Authors LIMIT 5;
Personal || " " || Family |
---|
Kevin E. Kline |
Daniel Kline |
Brand Hunt |
Allen G. Taylor |
Brett McLaughlin |
Computing Scan Time for O’Reilly Titles
After further testing, we realize that the average time to scan every two pages of books published by O’Reilly is 25 seconds, because the operator seems to like reading a few paragraphs now and then. Fortunately, they do not suffer from the binding issue, though. Write a query that returns the number of hours necessary to scan each O’Reilly book, given those circumstances.
Unions
The UNION
operator combines the results of two queries:
SELECT * FROM Works WHERE Publisher='Peachpit' UNION SELECT * FROM Works WHERE Publisher='Faber & Faber';
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
5 | Geek sublime | 9780571310302 | 2014 | London | Faber & Faber | 258 | |
9 | SQL | 0321334175 | 2005 | Berkeley | Peachpit | 2nd ed. | 460 |
Use UNION
to create a consolidated list of Wiley and O’Reilly titles, along with the estimated time it would take to digitize them according to the two formulas discussed above. The output should be something like:
Access 2013 all-in-one | 1.3 |
Beginning SQL | 0.9 |
Discovering SQL | 0.7 |
Learning SQL | 1.0 |
MySQL in a nutshell | 1.0 |
PHP & MySQL | 1.0 |
SQL all-in-one | 1.2 |
SQL bible | 1.5 |
SQL for dummies | |
SQL for dummies | 0.7 |
SQL in a nutshell | 1.0 |
SQL in a nutshell | 2.0 |
Using SQLite | 1.0 |
Using ISBNs for Collection Analysis
The Works
table contains ISBN numbers that are either in 10-digit or 13-digit format. The 13-digit numbers all start with the ‘978’ prefix. The two digits following the prefix (or the first two digits in ISBN-10) form the “registration group element” (to simplify: language or country), the next four digits are the “registrant element” (publisher) and the next three are the “publication element” (title). The last digit in both formats is a checksum character.
“ISBN Details” by Sakurambo at English Wikipedia - Own work, based on en::Image:ISBN Details.jpg. Licensed under CC BY-SA 3.0 via Wikimedia Commons.
The substring function substr(X, I, L)
returns the substring of X starting at index I and of length L (L is optional). The length()
function can be used in a WHERE
clause to test against the length of a field. Use these functions and the UNION
statement to output a list of publishers and the 4-digit publisher codes (registrant elements) appearing in the Works
table. The result should help dissolve whatever hope you still held on the usefulness of ISBNs for collection analysis…