SQL WHERE Clause Tips

This guide provides tips for using the WHERE clause.

Introduction

You should use indexed column names rather than expressions in WHERE clauses.

Intended audience:

This document is intended to be used by Symbian platform licensees and third party application developers.

Used Indexed Column Names, not Expressions, in WHERE Clauses

Suppose you have a table with an indexed column you want to search against, like this:

CREATE TABLE demo312(
    id INTEGER PRIMARY KEY,
    name TEXT
);

When you use the demo312.id column in a query, it is important that the column name appear by itself on one side of the comparison operation and that it not be part of an expression. So the following query works very efficiently:

SELECT name FROM demo312 WHERE id=?;

But the following variations, although logically equivalent, result in a full table scan:

SELECT name FROM demo312 WHERE id-?=0;
SELECT name FROM demo312 WHERE id*1=?;
SELECT name FROM demo312 WHERE +id=?;

In other words, you want to make sure that the indexed column name appears by itself on one side or the other of the comparison operator, and not inside an expression of some kind. Even a degenerate expression such as a single unary “+” operator will disable the optimizer and cause a full table scan.

Some variation in terms of the WHERE clause is permitted. The column name can be enclosed in parentheses, it can be qualified with the name of its table, and it can occur on either side of the comparison operator. So all of the following forms are efficient, and will in fact generate identical bytecode:

SELECT name FROM demo312 WHERE id=?;
SELECT name FROM demo312 WHERE demo312.id=?;
SELECT name FROM demo312 WHERE ?=id;
SELECT name FROM demo312 WHERE (id)=?;
SELECT name FROM demo312 WHERE (((demo321.id))=?);

The previous examples have all shown SELECT statements. But the same rules apply for the WHERE clause in DELETE and UPDATE statements:

UPDATE demo312 SET name=? WHERE id=?;
DELETE FROM demo312 WHERE id=?;

Use Conjunctions in WHERE Clause Expressions

SQLite works best when the expression in a WHERE clause is a list of terms connected by the conjunction (AND) operator. Furthermore, each term should consist of a single comparison operator against an indexed column. The same rule also applies to ON clauses in a join and to a HAVING clause.

SQLite is able to optimize queries such as this:

SELECT * FROM demo313 WHERE a=5 AND b IN ('on','off') AND c>15.5;

The WHERE clause in the example above consists of three terms connected by the AND operator and each term contains a single comparison operator with a column as one operand. The three terms are:

a=5
b IN ('on','off')
c>15.5

The SQLite query optimizer is able to break the WHERE clause down and analyze each of the terms separately, and possibly use one or more of those terms with indexes to generate bytecode that runs faster. But consider the following similar query:

SELECT * FROM demo313 WHERE (a=5 AND b IN ('on','off') AND c>15.5) OR d=0;

In this case, the WHERE clause consist of two terms connected by an OR. The query optimizer is not able to break this expression up for analysis. As a result, this query will be implemented as a full table scan in which the complete WHERE clause expression will be evaluated for each row.

In this case, refactoring the WHERE clause does not help much:

SELECT * FROM demo313 WHERE (a=5 OR d=0) AND (b IN ('on','off') OR d==0)
    AND (c>15.5 OR d=0)

The WHERE clause is now a conjunctive expression but its terms are not simple comparison operators against table columns. The query optimizer will be able to break the WHERE expression into three smaller subexpressions for analysis, but because each subexpression is a disjunction, no indexes will be usable and a full table scan will result.

If you know in advance that all rows in the result set are unique (or if that is what you want anyway) then the following query can be used for an efficient implementation:

SELECT * FROM demo313 WHERE a=5 AND b IN ('on','off') AND c>15.5
UNION
SELECT * FROM demo313 WHERE d=0

In this form, the two queries are evaluated separately and their results are merged to get the final result. The WHERE clause on both queries is a conjunction of simple comparison operators so both queries could potentially be optimized to use indexes.

If the result set could potentially contain two or more identical rows, then you can run the above query efficiently as follows:

SELECT * FROM demo313 WHERE RowID IN (
    SELECT RowID FROM demo313 WHERE a=5 AND b IN('on','off') AND c>15.5
    UNION ALL
    SELECT RowID FROM demo313 WHERE d=0
)

The subquery computes a set containing the RowID of every row that should be in the result set. Then the outer query retrieves the desired rows. When a WHERE clause contains OR terms at the top level, most enterprise-class SQL database engines such as PostgreSQL or Oracle will automatically convert the query to a form similar to the above. But in order to keep minimize the complexity and size of SQLite, such advanced optimizations are omitted. In the rare cases where such queries are required, than can be optimized manually by the programmer by recasting the query statements as shown above.

Related concepts