Databases and SQL - Library edition
Filtering
Learning Objectives
- Write queries that select records that satisfy user-specified conditions.
- Explain the order in which the clauses in a query are executed.
One of the most powerful features of a database is the ability to filter data, i.e., to select only those records that match certain criteria. For example, suppose we want to list all the books in the catalogue that have been published by Wiley. We can select these records from the Works
table by using a WHERE
clause in our query:
SELECT * FROM Works WHERE Publisher="Wiley";
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
11 | SQL bible | 9780470229064 | 2008 | Indianapolis | Wiley | 2nd ed. | 857 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
15 | Beginning SQL | 0764577328 | 2005 | Indianapolis | Wiley | 501 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 |
The database manager executes this query in two stages. First, it checks at each row in the Works
table to see which ones satisfy the WHERE
. It then uses the column names following the SELECT
keyword to determine what columns to display.
This processing order means that we can filter records using WHERE
based on values in columns that aren’t then displayed:
SELECT Title FROM Works WHERE Publisher="Wiley";
Title |
---|
SQL for dummies |
Discovering SQL |
SQL bible |
SQL for dummies |
Beginning SQL |
SQL all-in-one |
Access 2013 all-in-one |
We can use many other Boolean operators to filter our data. For example, we can display only titles that have been published by Wiley in 2011 or later:
SELECT * FROM Works WHERE (Publisher='Wiley') AND (Date>='2011');
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 |
(The parentheses around the individual tests aren’t strictly required, but they help make the query easier to read.)
If we want to list all books published either by Wiley or O’Reilly, we can combine the query using OR
. Note in passing how the apostrophe in the name O’Reilly is escaped:
SELECT * FROM Works WHERE Publisher='Wiley' OR Publisher='O''Reilly';
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
4 | Using SQLite | 9780596521189 | 2010 | Sebastopol | O'Reilly | 1st ed. | 503 |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
11 | SQL bible | 9780470229064 | 2008 | Indianapolis | Wiley | 2nd ed. | 857 |
12 | Learning SQL | 9780596520830 | 2009 | Sebastopol | O'Reilly | 2nd ed. | 320 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
15 | Beginning SQL | 0764577328 | 2005 | Indianapolis | Wiley | 501 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 | |
19 | SQL in a nutshell | 0596004818 | 2004 | Cambridge | O'Reilly | 2nd ed. | 691 |
20 | MySQL in a nutshell | 0596007892 | 2005 | Sebastopol | O'Reilly | 1st ed. | 321 |
Alternatively, we can use IN
to see if a value is in a specific set:
SELECT * FROM Works WHERE Publisher IN('Wiley','O''Reilly');
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
4 | Using SQLite | 9780596521189 | 2010 | Sebastopol | O'Reilly | 1st ed. | 503 |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
11 | SQL bible | 9780470229064 | 2008 | Indianapolis | Wiley | 2nd ed. | 857 |
12 | Learning SQL | 9780596520830 | 2009 | Sebastopol | O'Reilly | 2nd ed. | 320 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
15 | Beginning SQL | 0764577328 | 2005 | Indianapolis | Wiley | 501 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 | |
19 | SQL in a nutshell | 0596004818 | 2004 | Cambridge | O'Reilly | 2nd ed. | 691 |
20 | MySQL in a nutshell | 0596007892 | 2005 | Sebastopol | O'Reilly | 1st ed. | 321 |
We can combine AND
with OR
, but we need to be careful about which operator is executed first. If we don’t use parentheses, we get this:
SELECT * FROM Works WHERE Publisher='Wiley' OR Publisher='O''Reilly' AND Date>=2011;
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
11 | SQL bible | 9780470229064 | 2008 | Indianapolis | Wiley | 2nd ed. | 857 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
15 | Beginning SQL | 0764577328 | 2005 | Indianapolis | Wiley | 501 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 |
which is a list of the books published either by Wiley at any given time and those by O’Reilly in 2011 or later. We probably want this instead:
SELECT * FROM Works WHERE (Publisher='Wiley' OR Publisher='O''Reilly') AND Date>=2011;
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 |
We can also filter by partial matches. For example, if we want to know something just about the site names beginning with “DR” we can use the LIKE
keyword. The percent symbol acts as a wildcard, matching any characters in that place. It can be used at the beginning, middle, or end of the string:
SELECT * FROM Works WHERE Publisher LIKE "%y";
Work_ID | Title | ISBN | Date | Place | Publisher | Edition | Pages |
---|---|---|---|---|---|---|---|
1 | SQL in a nutshell | 9780596518844 | 2009 | Sebastopol | O'Reilly | 3rd ed. | 578 |
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
3 | PHP & MySQL | 9781449325572 | 2013 | Sebastopol | O'Reilly | 2nd ed. | 532 |
4 | Using SQLite | 9780596521189 | 2010 | Sebastopol | O'Reilly | 1st ed. | 503 |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
11 | SQL bible | 9780470229064 | 2008 | Indianapolis | Wiley | 2nd ed. | 857 |
12 | Learning SQL | 9780596520830 | 2009 | Sebastopol | O'Reilly | 2nd ed. | 320 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
15 | Beginning SQL | 0764577328 | 2005 | Indianapolis | Wiley | 501 | |
17 | SQL all-in-one | 9780470929964 | 2011 | Hoboken | Wiley | 2nd ed. | 708 |
18 | Access 2013 all-in-one | 9781118510551 | 2013 | Hoboken | Wiley | 760 | |
19 | SQL in a nutshell | 0596004818 | 2004 | Cambridge | O'Reilly | 2nd ed. | 691 |
20 | MySQL in a nutshell | 0596007892 | 2005 | Sebastopol | O'Reilly | 1st ed. | 321 |
Finally, we can use DISTINCT
with WHERE
to give a second level of filtering:
SELECT DISTINCT Date, Publisher FROM Works WHERE (Publisher='Wiley' OR Publisher='O''Reilly') AND Date>=2011;
Date | Publisher |
---|---|
2013 | Wiley |
2013 | O'Reilly |
2011 | Wiley |
But remember: DISTINCT
is applied to the values displayed in the chosen columns, not to the entire rows as they are being processed.
What we have just done is how most people “grow” their SQL queries. We started with something simple that did part of what we wanted, then added more clauses one by one, testing their effects as we went. This is a good strategy — in fact, for complex queries it’s often the only strategy — but it depends on quick turnaround, and on us recognizing the right answer when we get it.
The best way to achieve quick turnaround is often to put a subset of data in a temporary database and run our queries against that, or to fill a small database with synthesized records. For example, instead of trying our queries against an actual database of 20 million bibliographic records, we could run it against a sample of ten thousand, or write a small program to generate ten thousand random (but plausible) records and use that.
Weeding the Library by Date
The library wants to weed its collection and you need to review all books published after 2004 (earlier books have already been reviewed) but before 2010. Write a query that selects the titles that need to be reviewed.
Finding Recent Titles on SQL
Write a query that lists all books in the database that have the word “SQL” in their title and that have been published after 2010. Hint: we are not looking for titles that are exactly “SQL”, but those that contain “SQL” somewhere in their title. You might want to use LIKE
to do that.
Matching Patterns
Which of these expressions are true?
'a' LIKE 'a'
'a' LIKE '%a'
'beta' LIKE '%a'
'alpha' LIKE 'a%%'
'alpha' LIKE 'a%p%'