This document explains how the optimizer changes expressions.
The way an expression is written really does matter. When written well expressions not only clearly state what is intended but they can also improve the performance of the SQL engine significantly. This document provides several tips designed to help you improve your SQL expressions.
Intended audience:
This document is intended to be used by Symbian platform licensees and third party application developers.
In an earlier section we described how the optimizer would only make use of an index if one of the columns being indexed occurred on one side of certain equality and comparison operators (“= ”, “IN ”, “< ”, “> ”, “<= ”, “>= ”, and sometimes “IS NULL ”).
While this is technically true, prior to the stage of the analysis
where the optimizer is looking for these kinds of expressions, it
may have first modified the WHERE
clause (or the ON
or HAVING
clause) from what was originally
entered by the programmer.
The next few paragraphs will describe some of these rewriting rules.
The query optimizer always
rewrites the BETWEEN
operator as a pair of inequalities.
So, for example, if the input SQL is this:
SELECT * FROM demo324 WHERE x BETWEEN 7 AND 23;
What the query optimizer ultimately sees is this:
SELECT * FROM demo324 WHERE x>=7 AND x<=23;
In this revised form, the optimizer might be able to use an index on the “x” column to speed the operation of the query.
Another rewriting rule is that a disjunction of two or more equality tests against the same column is changed into a single IN operator. So if you write:
SELECT * FROM demo324 WHERE x=7 OR x=23 OR x=47;
The WHERE clause will be rewritten into the following form:
SELECT * FROM demo324 WHERE x IN (7,23,47);
The original format was not a candidate for use of indexes. But after the disjunction is converted into a single IN operator the usual index processing logic applies and the query can be made much faster.
In order for this rewriting rule to be applied, however, all terms of the disjunction must be equality comparisons against the same column. It will not work to have a disjunction involving two or more columns or involving expressions. So, for instance, the following statements will not be optimized:
SELECT * FROM demo324 WHERE x=7 OR y=23; SELECT * FROM demo324 WHERE x=7 OR +x=23;
The query parser and compiler in SQLite are designed to be small, fast, and lean. A consequence of this design is that SQLite does not do much in the way of constant folding or common subexpression elimination. SQLite evaluates SQL expressions mostly as written.
One way to work around the lack of constant folding in SQLite is to enclose constant subexpressions within a subquery. SQLite does optimize constant subqueries – it evaluates them once, remembers the result, and then reuses that result repeatedly.
An example will help clarify how this works. Suppose you have a table that contains a timestamp recorded as the fractional julian day number:
CREATE TABLE demo325(tm DATE, data BLOB);
A query against this table to find all entries after November 8, 2006 might look like the following:
SELECT data FROM demo325 WHERE tm>julianday('2006-11-08');
This query works fine. The problem is that the “julianday('2006-11-08')” function gets called repeatedly, once for each row tested, even though the function returns the same value each time. It is much more efficient to call the function one time only and reuse the result over and over. You can accomplish this by moving the function call inside a subquery as follows:
SELECT data FROM demo325 WHERE tm>(SELECT julianday('2006-11-08'));
There are, of course, some cases where multiple evaluations of a function in the WHERE clause is desirable. For example, suppose you want to return roughly one out of every eight records, chosen at random. A suitable query would be:
SELECT data FROM demo325 WHERE (random()&7)==0;
In this case, moving the function evaluation into a subquery would not work as desired:
SELECT data FROM demo325 WHERE (SELECT random()&7)==0;
In this last example, the result would be either all
of the records in the table (probability 12.5%) or none of them (probability
87.5%). The difference here, of course, is that the random()
function is not constant whereas the julianday()
function is.
The SQL compiler does not have any way of knowing
this so it always assumes the worst: that every function works like random()
and can potentially return a different answer even
with the same inputs. Use a subquery if you truly want to make a subexpression
constant.