Databases and SQL - Library edition

Calculating New Values

Learning Objectives

  • Write queries that calculate new values for each selected record.

The library is embarking on a digitization project. In order to estimate how long it would take to scan books, we want to look up what the total number of pages for all library books are. Prior experience has shown that due to poor quality binding, up to 2% of the pages will fail to be scanned for books published by Wiley and will have to be scanned anew. The total number of pages has therefore to be adjusted accordingly for this publisher.

Rather than modifying the stored data, we can do this calculation on the fly as part of our query:

SELECT 1.02*Pages FROM Works WHERE Publisher = "Wiley";
1.02*Pages
-null-
408.0
874.14
448.8
511.02
722.16
775.2

Remember those empty cells we had in the database? Seems like one of them is showing up again here. We’ll have to deal with it later so that it doesn’t mess up our computations.

When we run the query, the expression 1.02 * Pages is evaluated for each row. Expressions can use any of the fields, all of usual arithmetic operators, and a variety of common functions. (Exactly which ones depends on which database manager is being used.) For example, knowing that scanning two pages takes exactly 13 seconds, we can compute how many hours would be necessary to scan each Wiley book. Since this is an estimate, more than one decimal place doesn’t make much sense and so we use a rounding function:

SELECT Title, round(13/2*(1.02*Pages)/3600,1) FROM Works WHERE Publisher = "Wiley";
Title round(13/2*(1.02*Pages)/3600,1)
SQL for dummies
Discovering SQL 0.7
SQL bible 1.5
SQL for dummies 0.7
Beginning SQL 0.9
SQL all-in-one 1.2
Access 2013 all-in-one 1.3

We can also combine values from different fields, for example by using the string concatenation operator ||:

SELECT Personal || " " || Family FROM Authors LIMIT 5;
Personal || " " || Family
Kevin E. Kline
Daniel Kline
Brand Hunt
Allen G. Taylor
Brett McLaughlin

Computing Scan Time for O’Reilly Titles

After further testing, we realize that the average time to scan every two pages of books published by O’Reilly is 25 seconds, because the operator seems to like reading a few paragraphs now and then. Fortunately, they do not suffer from the binding issue, though. Write a query that returns the number of hours necessary to scan each O’Reilly book, given those circumstances.

Unions

The UNION operator combines the results of two queries:

SELECT * FROM Works WHERE Publisher='Peachpit' UNION SELECT * FROM Works WHERE Publisher='Faber & Faber';
Work_ID Title ISBN Date Place Publisher Edition Pages
5 Geek sublime 9780571310302 2014 London Faber & Faber 258
9 SQL 0321334175 2005 Berkeley Peachpit 2nd ed. 460

Use UNION to create a consolidated list of Wiley and O’Reilly titles, along with the estimated time it would take to digitize them according to the two formulas discussed above. The output should be something like:

Access 2013 all-in-one 1.3
Beginning SQL 0.9
Discovering SQL 0.7
Learning SQL 1.0
MySQL in a nutshell 1.0
PHP & MySQL 1.0
SQL all-in-one 1.2
SQL bible 1.5
SQL for dummies
SQL for dummies 0.7
SQL in a nutshell 1.0
SQL in a nutshell 2.0
Using SQLite 1.0

Using ISBNs for Collection Analysis

The Works table contains ISBN numbers that are either in 10-digit or 13-digit format. The 13-digit numbers all start with the ‘978’ prefix. The two digits following the prefix (or the first two digits in ISBN-10) form the “registration group element” (to simplify: language or country), the next four digits are the “registrant element” (publisher) and the next three are the “publication element” (title). The last digit in both formats is a checksum character.

ISBN Details.svg
ISBN Details” by Sakurambo at English Wikipedia - Own work, based on en::Image:ISBN Details.jpg. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

The substring function substr(X, I, L) returns the substring of X starting at index I and of length L (L is optional). The length() function can be used in a WHERE clause to test against the length of a field. Use these functions and the UNION statement to output a list of publishers and the 4-digit publisher codes (registrant elements) appearing in the Works table. The result should help dissolve whatever hope you still held on the usefulness of ISBNs for collection analysis…