Databases and SQL - Library edition

Creating and Modifying Data

Learning Objectives

  • Write statements that creates tables.
  • Write statements to insert, modify, and delete records.

So far we have only looked at how to get information out of a database, both because that is more frequent than adding information, and because most other operations only make sense once queries are understood. If we want to create and modify data, we need to know two other sets of commands.

The first pair are CREATE TABLE and DROP TABLE. While they are written as two words, they are actually single commands. The first one creates a new table; its arguments are the names and types of the table’s columns. For example, the following statements create the four tables in our library database:

CREATE TABLE Authors(Author_ID integer, Family text, Personal text, Occupation text, Birth integer, Death integer);
CREATE TABLE Items(Item_ID integer, Work_ID integer, Barcode text, Acquired integer, Status text);
CREATE TABLE Works(Work_ID integer, Title text, ISBN text, Date integer, Place text, Publisher text, Edition text, Pages integer);
CREATE TABLE Works_Authors(Work_ID integer, Author_ID integer, Role text);

We can get rid of one of our tables using:

DROP TABLE Items;

Be very careful when doing this: most databases have some support for undoing changes, but it’s better not to have to rely on it.

Different database systems support different data types for table columns, but most provide the following:

data type use
INTEGER a signed integer
REAL a floating point number
TEXT a character string
BLOB a “binary large object”, such as an image

Most databases also support Booleans and date/time values; SQLite uses the integers 0 and 1 for the former, and represents the latter as discussed earlier. An increasing number of databases also support geographic data types, such as latitude and longitude. Keeping track of what particular systems do or do not offer, and what names they give different data types, is an unending portability headache.

When we create a table, we can specify several kinds of constraints on its columns. For example, a better definition for the Items table would be:

CREATE TABLE Items(
    Item_ID integer not null,   -- all items should have an ID 
    Work_ID integer not null,   -- all items should be linked to a title
    Barcode text,               -- all other fields can be NULL
    Acquired integer, 
    Status text,
    primary key(Item_ID),
    foreign key(Work_ID) references Works(Work_ID)
);

Once again, exactly what constraints are available and what they’re called depends on which database manager we are using.

Once tables have been created, we can add, change, and remove records using our other set of commands, INSERT, UPDATE, and DELETE.

The simplest form of INSERT statement lists values in order:

INSERT INTO authors VALUES(25,'Shakespeare', 'William', 'Playwright', NULL, 1616);

If we are unsure in which order the columns were defined when the table was created, we can also specify in which column each value should be stored:

INSERT INTO authors(Family, Personal, Birth) VALUES('Atwood', 'Margaret', 1939);

We can also insert values into one table directly from another:

CREATE TABLE Author_names(family text, personal text);
INSERT INTO Author_names SELECT family, personal FROM Authors;

Modifying existing records is done using the UPDATE statement. To do this we tell the database which table we want to update, what we want to change the values to for any or all of the columns, and under what conditions we should update the values.

For example, if we wanted to update one of the records from the Works table where the publication date is missing, we would write

UPDATE Works SET Date=2006 WHERE Work_ID=20;

While we are at it, we could also have added other information that was missing, by editing multiple columns in one statement:

UPDATE Works SET Date=2005, Edition='1st' WHERE Work_ID=20;

Be careful to not forget the WHERE clause or the UPDATE statements will modify all of the records in the database!

Deleting records can be a bit trickier, because we have to ensure that the database remains internally consistent. If all we care about is a single table, we can use the DELETE command with a WHERE clause that matches the records we want to discard. For example, once we realize that our library doesn’t include titles written by Margaret Atwood, we could remove her from the Authors table like this:

DELETE FROM Authors WHERE family='Atwood';

But what if we removed another author, such as Kevin E. Kline (Author_ID = 1) instead? Our Works table would still contain three records of titles to which he is a contributor. Remember, the relationship between Works and Authors is stored in the Works_Authors table:

SELECT * FROM Works_Authors WHERE Author_ID = 1;
Work_ID Author_ID Role
1 1 Author
16 1 Contributor
19 1 Author

This problem is called referential integrity: we need to ensure that all references between tables can always be resolved correctly. If we are weeding our collection and really want to delete all records of titles authored by Kevin Kline, we would not only need to remove him from the Authors table, but also remove the rows in the Works_Authors table that refer to him, as well as remove all the works that he authored from the Works table.

If our database manager supports it, we could automate this using cascading delete. However, this technique is outside the scope of this chapter.

Also, it is often better to identify records as “deleted” without actually deleting them. In our example, instead of deleting all the information pertaining to the David Kline titles, we could change the Status column in the corresponding rows in the Items table to e.g. Deleted.

Extracting Publisher Data

Instead of storing the Publisher information in the Works table, we could create an authority file for publishers by storing this information in a separate table, then linking it to the Works table, much like the authors are managed.

The first step would be to retrieve the existing information on publishers from the Works table. Create a table that contains only publisher names from the Works table. Warning: avoid duplicates!

Replacing a NULL value

The number of pages for ‘SQL for dummies’, 8th edition is 480. Update the Works table accordingly.

Replacing all NULL values

In the Authors table, NULL can either mean that a value was not documented (we don’t know this information), or that an author is contemporary (the death column is empty). Write a statement that replaces the NULL value in the column death with the string ‘ACTIVE’ for authors where only the birth date is known. Warning: do not replace all NULL values in that column!

Generating Insert Statements

The collection from another library is being integrated in our database. The bibliograhpic information for the titles in that collection was extracted from another database, and is now available as a CSV file, which is formatted like this:

Title,Authors,ISBN,Publication
"Joe Celko's SQL for Smarties : Advanced SQL Programming","Celko, Joe","Burlington : Elsevier Science, 2015"
"SQL queries for mere mortals : a hands-on guide to data manipulation in SQL", "Viescas, John L;Hernandez, Michael J", "Harlow : Addison-Wesley, 2014."

Write a small Python program that reads this file in and prints out the SQL INSERT statements needed to add these records to the library database. Note, you will not only need to issue statements for updating the Works table, but also the Authors and the Works_Authors tables. Also, note that some information that is kept in separate columns in our database, was stored in a single column in the exported file you are working on (look at the Publication column).

Backing Up with SQL

SQLite has several administrative commands that aren’t part of the SQL standard. One of them is .dump, which prints the SQL commands needed to re-create the database. Another is .load, which reads a file created by .dump and restores the database. A colleague of yours thinks that storing dump files (which are text) in version control is a good way to track and manage changes to the database. What are the pros and cons of this approach? (Hint: records aren’t stored in any particular order.)