Avoid Transient Tables

This guide gives advice for avoiding the use of transient tables.

Introduction

Transient tables are nameless entities that exist for the duration of a single SQL statement and are automatically deleted at the conclusion of the statement. It is best to avoid queries that need transient tables.

Intended audience:

This document is intended to be used by Symbian platform licensees and third party application developers.

Avoid Queries that Require Transient Tables

Complex queries sometimes require SQLite to compute intermediate results that are stored in transient tables. Transient tables are nameless entities that exist for the duration of a single SQL statement and are automatically deleted at the conclusion of the statement.

We use the term transient to describe these tables rather than temporary to avoid confusion with TEMP tables. TEMP tables are tables that are private to a particular database connection and which persist for the duration of that database connection. TEMP tables have names and work like regular tables with the exceptions that they are only visible to the current database connection and are automatically deleted when the connection is closed. Transient tables, in contrast, are invisible internal holding areas for intermediate results that exist only until the end of the current SQL statement.

Transient tables will be stored either in RAM or on disk depending on device configuration. The Symbian default configuration is to store transient tables on disk although downstream device manufacturers may alter this.

When configured to store transient tables on disk, in reality a combination of memory and file space is used to hold the table. Transient tables go through the same page cache mechanism that regular database files go through. So as long as the transient tables do not grow too large, they will always be held in memory in the page cache rather than being written to disk. Information is only written to disk when the cache overflows.

When transient tables are configured to be stored in memory rather than in a file, the cache is considered infinitely large and will never overflow and thus nothing will ever be written out to the file system. Storing transient tables in memory rather than in files avoids the overhead of file I/O but requires potentially unbounded amounts of memory. Using file-based transient tables puts an upper bound on the amount of memory required but adds file I/O overhead.

One strategy for dealing with transient tables is to avoid them all together. If you never use transient tables then it does not really matter if they are stored in memory or on disk. The remainder of this section will enumerate all of the things that might provoke SQLite to create a transient table. If you avoid all of these things in your code, then you never need to worry about where your transient tables are stored.

A transient table is created whenever you use the DISTINCT keyword in a query:

SELECT DISTINCT name FROM artist;

The DISTINCT keyword guarantees that each row of the result set will be different from all other rows. In order to enforce this SQLite creates a transient table that stores all prior output rows from the query. If a new row can be found in the transient table then the new row is skipped. The same situation occurs when DISTINCT is used within an aggregate function:

SELECT avg(DISTINCT cnt) FROM playlist;

In this context the DISTINCT keyword means that the aggregate function is only applied to distinct elements of the result. As before, a transient table is used to record prior values of the result so that SQLite can tell if new results have been seen before.

The UNION, INTERSECT, and EXCEPT operators used to create compound queries always generate a distinct set of rows. Even though the DISTINCT keyword does not appear, it is implied for these connectors, and a transient table is used to enforce the distinctness. In contrast, the UNION ALL operator does not require a transient table.

A transient table might be used to implement an ORDER BY or GROUP BY clause. SQLite always tries to use an index to satisfy an ORDER BY or GROUP BY clause if it can. But if no indexes are available which can satisfy the ORDER BY or GROUP BY, then the entire results set is loaded into a transient table and sorted there.

Subqueries on the right-hand side of the IN operator use a transient table. Consider an example:

SELECT * FROM ex334a WHERE id IN (SELECT id FROM ex334b WHERE amt>7);

The results of the (SELECT id FROM ex334b WHERE amt>7) subquery are stored in a transient table. Then the value of the id column is checked against this table for each row of the outer query to determine if that row should be included in the result set.

Sometimes subqueries in the FROM clause of a query will result in a transient table. This is not always the case because SQLite tries very hard to convert subqueries in the FROM clause into a join that does not use subqueries. SQLite uses the term flattening to describe the conversion of FROM clause subqueries into joins. Flattening is an optimization that makes queries run faster. But in some cases, flattening cannot occur. When the flattening optimization is inhibited, the results of the subqueries are stored in transient tables and then a separate query is run against those transient tables to generate the final results.

Consider the following schema and query:

CREATE TABLE t1(a,b,c);
CREATE TABLE t2(x,y,z);
SELECT * FROM t1 JOIN (SELECT x,y FROM t2);

The subquery in the FROM clause is plainly visible in the SELECT statement above. But if the subquery were disguised as a view, it might be less noticeable. A view is really just a macro that serves as a place-holder for a subquery. So the SELECT statement above is equivalent to the following:

CREATE VIEW v2 AS SELECT x, y FROM t2;
SELECT * FROM t1 JOIN v2;

In either case above, whether the subquery is stated explicitly or is implied by the use of a view, flattening occurs and the query is converted into this:

SELECT a,b,c,x,y FROM t1 JOIN t2;

Had flattening not occurred, it would have been necessary to evaluate the v2 view or the subquery into a transient table then execute the outer query using the transient table as one of the two tables in the join. SQLite prefers to flatten the query because a flattened query generally uses fewer resources and is better able to take advantage of indexes. The rules for determining when flattening occurs and when it does not are complex. Flattening occurs if all of the following conditions in the outer query and in the subquery are satisfied:

  • The subquery and the outer query do not both use aggregates.

  • The subquery is not an aggregate or the outer query is not a join.

  • The subquery is not the right operand of a left outer join, or the subquery is not itself a join.

  • The subquery is not DISTINCT or the outer query is not a join.

  • The subquery is not DISTINCT or the outer query does not use aggregates.

  • The subquery does not use aggregates or the outer query is not DISTINCT.

  • The subquery has a FROM clause.

  • The subquery does not use LIMIT or the outer query is not a join.

  • The subquery does not use LIMIT or the outer query does not use aggregates.

  • The subquery does not use aggregates or the outer query does not use LIMIT.

  • The subquery and the outer query do not both have ORDER BY clauses.

  • The subquery is not the right term of a LEFT OUTER JOIN or the subquery has no WHERE clause.

  • The subquery and outer query do not both use LIMIT

  • The subquery does not use OFFSET

Nobody really expects a programmer to memorize or even understand the above set of flattening rules. As a short-cut, perhaps it is best to remember that a complicated subquery or view in the FROM clause of a complicated query might defeat the flattening optimization and thus require the use of transient tables.

One other obscure use of transient tables is when there is an INSTEAD OF DELETE or INSTEAD OF UPDATE trigger on a view. When such triggers exists and a DELETE or an UPDATE is executed against that view then a transient table is created which stores copies of the rows to be deleted or updated. Since it is unusual to have INSTEAD OF triggers in the first place this case rarely arises.

In summary, transient tables are used to implement the following features:

  • The DISTINCT keyword or other situations where distinct results are required such as compound queries using UNION, INTERSECT, or EXCEPT.

  • ORDER BY or GROUP BY clauses that cannot be satisfied by indexes.

  • Subqueries on the right-hand side of the IN operator.

  • Subqueries or views in the FROM clause of a query that cannot be flattened.

  • DELETE or UPDATE against a view with INSTEAD OF triggers.

The resource-limited nature of the environment, where file I/O is expensive and memory is scarce, means you will be well served to avoid these constructs and thus avoid the need for transient tables.

Related concepts