SQL Insertion Tips

This guide provides tips for making efficient use of COMMIT statements when modifying a database.

Introduction

INSERT, UPDATE and DELETE operations are all very fast. However, COMMIT statements are very slow. You need to consider several techniques to ensure you make the best use of COMMIT statements.

Intended audience:

This document is intended for Symbian platform licensees and third party application developers.

INSERT and UPDATE are Fast but COMMIT is Slow

A programmer migrating to this database engine might write a test program to see how many INSERT statements per second it can do. They create an empty database with a single empty table. Then they write a loop that runs a few thousand times and does a single INSERT statement on each iteration. Upon timing this program they find that it appears to only be doing a couple of dozen INSERTs per second.

“Everybody I talked to says SQLite is supposed to be really fast,” the new programmer will typically complain, “but I'm only getting 20 or 30 INSERTs per second!”

In reality, SQLite can achieve around 50,000 or more INSERTs per second on a modern workstation, though fewer on a typical embedded platform. But the characteristics of the underlying storage medium, together with the fact that the database engine guarantees atomic updates to the database, mean that it can only achieve a few dozen COMMIT operations per second.

Unless you take specific action to tell SQLite to do otherwise, it automatically performs a COMMIT operation after every INSERT, UPDATE, or DELETE statement. So the programmer described above is really measuring the number of transactions per second, not the number of INSERTs. This is a very important distinction.

Why is COMMIT so much slower than INSERT? SQLite guarantees that changes to a database are ACID – Atomic, Consistent, Isolated, and Durable. The Atomic and Durable parts are what take the time.

In order to be Atomic, the database engine has to go through an elaborate protocol with the underlying file system, which ultimately means that every modified page of the database file must be written twice.

In order to be Durable, the COMMIT operation must not return until all content has been safely written to nonvolatile media. At least two consecutive non-concurrent writes to flash memory must occur in order to COMMIT.

An atomic and durable COMMIT is a very powerful feature that can help you to build a system that is resilient, even in the face of unplanned system crashes or power failures. But the price of this resilience is that COMMIT is a relatively slow operation. Hence if performance is a priority you should strive to minimize the number of COMMITs.

If you need to do more than one INSERT, UPDATE, or DELETE operation, you are advised to put them all inside a single explicit transaction by running the BEGIN statement prior to the first change and executing COMMIT once all changes have finished. In this way, all your changes occur within a single transaction and only a single time-consuming COMMIT operation must occur.
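
For example, assuming a hypothetical contacts table, the following runs three INSERTs but incurs only a single time-consuming COMMIT:


BEGIN;
INSERT INTO contacts VALUES('Ann', '555-0001');
INSERT INTO contacts VALUES('Bob', '555-0002');
INSERT INTO contacts VALUES('Carol', '555-0003');
COMMIT;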

If you omit the explicit BEGIN...COMMIT, then SQLite automatically inserts an implicit BEGIN...COMMIT around each of your INSERT, UPDATE, and DELETE statements, which means you end up doing many COMMITs, and that will always be much slower than doing just one.

Batch INSERT, UPDATE, and DELETE Operations Using TEMP Tables

As described above, when you have many changes to make to a database, you are advised to make all those changes within a single explicit transaction by preceding the first change with a BEGIN statement and concluding the changes with a COMMIT statement.

The problem with BEGIN...COMMIT is that BEGIN acquires an exclusive lock on the database file which is not released until the COMMIT completes. That means that only a single connection to the database can be in the middle of a BEGIN...COMMIT at one time. If another thread or process tries to start a BEGIN...COMMIT while the first is busy, the second has to wait. To avoid holding up other threads and processes, therefore, every BEGIN should be followed by a COMMIT as quickly as possible.

But sometimes you run into a situation where you have to make periodic INSERTs or UPDATEs to a database based on timed or external events. For example, you may want to do an INSERT into an event log table once every 250 milliseconds or so. You could do a separate INSERT for each event, but that would mean doing a separate COMMIT four times per second, which is perhaps more overhead than you desire. On the other hand, if you did a BEGIN and accumulated several seconds' worth of INSERTs, you could avoid doing a COMMIT more than once every 10 seconds or so. The trouble there is that other threads and processes are unable to write to the database while the event log is holding its transaction open.

The usual method for avoiding this dilemma is to store all of the INSERTs in a separate TEMP table, then periodically flush the content of the TEMP table into the main database with a single operation.

A TEMP table works just like a regular database table except that a TEMP table is only visible to the database connection that creates it, and the TEMP table is automatically dropped when the database connection is closed. You create a TEMP table by inserting the “TEMP” or “TEMPORARY” keyword in between “CREATE” and “TABLE”, like this:


CREATE TEMP TABLE event_accumulator(
    eventId INTEGER,
    eventArg TEXT
);

Because TEMP tables are ephemeral (meaning that they do not persist after the database connection closes) SQLite does not need to worry about making writes to a TEMP table atomic or durable. Hence a COMMIT to a TEMP table is very quick.

A process can do multiple INSERTs into a TEMP table without having to enclose those INSERTs within an explicit BEGIN...COMMIT: writes to a TEMP table are always efficient, whether or not they are enclosed in an explicit transaction.

So as events arrive, they can be written into the TEMP table using isolated INSERT statements. But because the TEMP table is ephemeral, one must take care to periodically flush the contents of the TEMP table into the main database where they will persist. So every 10 seconds or so (depending on the application requirements) you can run code like this:


BEGIN;
INSERT INTO event_log SELECT * FROM event_accumulator;
DELETE FROM event_accumulator;
COMMIT;

These statements transfer the content of the ephemeral event_accumulator table over to the persistent event_log table as a single atomic operation. Since this transfer occurs relatively infrequently, minimal database overhead is incurred.
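
In Symbian C++ the pattern might be sketched as shown below. This is illustrative only: the CEventLogger class and its iDatabase member are assumptions made for this example, not part of any existing API, and cleanup on leave is simplified for brevity.


// Record one event in the TEMP table. This is cheap because no
// durable COMMIT to the main database file is involved.
void CEventLogger::LogEventL(TInt aEventId, const TDesC& aEventArg)
    {
    _LIT(KInsert, "INSERT INTO event_accumulator VALUES(?,?)");
    RSqlStatement stmt;
    CleanupClosePushL(stmt);
    stmt.PrepareL(iDatabase, KInsert);
    User::LeaveIfError(stmt.BindInt(0, aEventId));
    User::LeaveIfError(stmt.BindText(1, aEventArg));
    User::LeaveIfError(stmt.Exec());
    CleanupStack::PopAndDestroy(&stmt);
    }

// Called every 10 seconds or so: move the accumulated events into the
// persistent event_log table as a single atomic transaction.
void CEventLogger::FlushEventsL()
    {
    _LIT(KBegin,    "BEGIN");
    _LIT(KCopy,     "INSERT INTO event_log SELECT * FROM event_accumulator");
    _LIT(KClear,    "DELETE FROM event_accumulator");
    _LIT(KCommit,   "COMMIT");
    _LIT(KRollback, "ROLLBACK");
    User::LeaveIfError(iDatabase.Exec(KBegin));
    TInt err = iDatabase.Exec(KCopy);
    if (err >= 0)
        {
        err = iDatabase.Exec(KClear);
        }
    if (err >= 0)
        {
        err = iDatabase.Exec(KCommit);
        }
    else
        {
        iDatabase.Exec(KRollback); // best effort; events remain in the TEMP table
        }
    User::LeaveIfError(err);
    }

A real application would also cache the prepared INSERT statement rather than preparing it afresh for every event; that technique is described later in this document.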

Use Bound Parameters

Suppose you have a descriptor, nameDes, and you want to insert that value into the namelist table of a database. One way to proceed is to construct an appropriate INSERT statement that contains the desired string value as an SQL string literal, then run that INSERT statement. Pseudo-code for this approach follows:


_LIT(KSql, "INSERT INTO namelist VALUES('%S')");
sqlBuf.Format(KSql, &nameDes);
sqlDatabase.Exec(sqlBuf);

The INSERT statement is constructed by the call to Format() on the second line of the example above. The first argument is a template for the SQL statement. The value of the nameDes descriptor is inserted where the %S occurs in the template. Notice that the %S is surrounded by single quotes so that the string will be properly contained in SQL standard quotes.

This approach works as long as the value in nameDes does not contain any single-quote characters. If nameDes does contain one or more single-quotes, then the string literal in the INSERT statement will not be well-formed and a syntax error might occur. Or worse, if a hostile user is able to control the content of nameDes, they might be able to put text in nameDes that looks something like this:


hi'); DELETE FROM critical_table; SELECT 'hi

This would result in the sqlBuf variable holding:


INSERT INTO namelist VALUES('hi'); DELETE FROM critical_table; SELECT 'hi'

Your adversary has managed to convert your single INSERT statement into three separate SQL statements, one of which does things that you probably do not want to happen. This is called an “SQL Injection Attack”. You want to be very, very careful to avoid SQL injection attacks as they can seriously compromise the security of your application.

SQLite allows you to specify parameters in SQL statements and then substitute values for those parameters prior to running the SQL. Parameters can take several forms, including:


?
?NNN
:AAA
@AAA
$AAA

In the above, NNN means any sequence of digits and AAA means any sequence of alphanumeric characters and underscores. In this example we will stick with the first and simplest form – the question mark. The operation above would be rewritten as shown below. (Error checking is omitted from this example for brevity.)


_LIT(KSql, "INSERT INTO namelist VALUES(?)");
RSqlStatement s;
s.PrepareL(db, KSql);
s.BindText(0, nameDes);
s.Exec();
s.Close();

PrepareL() compiles the SQL statement held in the literal KSql. This statement contains a single parameter. The value for this parameter is initially NULL.

BindText() sets the value of this parameter to the content of the nameDes descriptor, and Exec() then executes the SQL statement with the bound parameter value.

There are variations of BindXxx() to bind other kinds of values such as integers, floating point numbers, and binary large objects (BLOBs). The key point to observe is that none of these values need to be quoted or escaped in any way. And there is no possibility of being vulnerable to an SQL injection attack.
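
For example, a hypothetical three-column readings table with integer, floating point, and BLOB columns might be populated like this (the variable names are illustrative, and error checking is again omitted for brevity):


_LIT(KSql, "INSERT INTO readings VALUES(?,?,?)");
RSqlStatement s;
s.PrepareL(db, KSql);
s.BindInt(0, sensorId);      // TInt value
s.BindReal(1, temperature);  // TReal (floating point) value
s.BindBinary(2, rawData);    // 8-bit descriptor holding the BLOB
s.Exec();
s.Close();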

Besides reducing your vulnerability to SQL injection attacks, the use of bound parameters also happens to be more efficient than constructing SQL statements from scratch, especially when inserting large strings or BLOBs.

Cache and Reuse Prepared Statements

Using RSqlStatement is a two-step process. Firstly the statement must be compiled using Prepare(). Then the resulting prepared statement is run using either Exec() or Next().

The relative amount of time spent doing each of these steps depends on the nature of the SQL statement. SELECT statements that return a large result set, or UPDATE or DELETE statements that touch many rows of a table, will normally spend most of their time in the Virtual Machine module and relatively little time being compiled. Simple INSERT statements, on the other hand, can take twice as long to compile as they take to run in the virtual machine.

A simple way to reduce the CPU load of an application that uses SQLite is to cache the prepared statements and reuse them. Of course, one rarely needs to run the exact same SQL statement more than once. But if a statement contains one or more bound parameters, you can bind new values to the parameters prior to each run and thus accomplish something different with each invocation.

This technique is especially effective when doing multiple INSERTs into the same table. Instead of preparing a separate insert for each row, create a single generic insert statement like this:


INSERT INTO important_table VALUES(?,?,?,?,?)

Then for each row to be inserted, use one or more of the BindXxx() interfaces to bind values to the parameters in the insert statement, and call Exec() to do the insert, then call Reset() to rewind the program counter of the internal bytecode in preparation for the next run.
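
A sketch of that loop in Symbian C++ follows; the five column values, the TRow struct, and the rows array are assumptions made for illustration:


_LIT(KSql, "INSERT INTO important_table VALUES(?,?,?,?,?)");
RSqlStatement s;
CleanupClosePushL(s);
s.PrepareL(db, KSql);   // compiled once, outside the loop
for (TInt i = 0; i < rows.Count(); ++i)
    {
    const TRow& row = rows[i];  // TRow is a hypothetical struct holding one row
    User::LeaveIfError(s.BindInt(0,  row.iId));
    User::LeaveIfError(s.BindText(1, row.iName));
    User::LeaveIfError(s.BindText(2, row.iAddress));
    User::LeaveIfError(s.BindReal(3, row.iScore));
    User::LeaveIfError(s.BindInt(4,  row.iFlags));
    User::LeaveIfError(s.Exec());   // run the compiled statement
    User::LeaveIfError(s.Reset());  // rewind, ready for the next row
    }
CleanupStack::PopAndDestroy(&s);

As described earlier, wrapping the whole loop in an explicit BEGIN...COMMIT avoids incurring a costly COMMIT for every row.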

For INSERT statements, reusing a single prepared statement in this way will typically make your code run two or three times faster.

You can manually manage a cache of prepared statements, keeping around only those prepared statements that you know will be needed again and closing prepared statements using Close() when you are done with them or when they are about to fall out of scope. But depending on the application, it can be more convenient to create a wrapper class around the SQL interface that manages the cache automatically.

A wrapper class can keep around the 5 or 10 most recently used prepared statements and reuse those statements if the same SQL is requested. Handling the prepared statement cache automatically in a wrapper has the advantage that it frees you to focus more mental energy on writing a great application and less effort on operating the database interface. It also makes the programming task less error prone since with an automatic class, there is no chance of accidentally omitting a call to Close() and leaking prepared statements.
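
A minimal sketch of such a wrapper is shown below. The class name, the fixed cache size, and the simple oldest-first eviction policy are all illustrative assumptions rather than an existing API, and cleanup on leave is simplified for brevity:


const TInt KMaxCached = 8; // illustrative cache size

class CStatementCache : public CBase
    {
public:
    ~CStatementCache()
        {
        for (TInt i = 0; i < iStatements.Count(); ++i)
            {
            iStatements[i].Close();
            }
        iStatements.Close();
        iSql.ResetAndDestroy();
        }

    // Return a prepared statement for aSql, reusing a cached one when possible.
    RSqlStatement& PrepareL(RSqlDatabase& aDb, const TDesC& aSql)
        {
        for (TInt i = 0; i < iSql.Count(); ++i)
            {
            if (iSql[i]->Compare(aSql) == 0)
                {
                iStatements[i].Reset(); // rewind the cached statement before reuse
                return iStatements[i];
                }
            }
        if (iStatements.Count() >= KMaxCached)
            {
            // Cache is full: evict the oldest entry.
            iStatements[0].Close();
            iStatements.Remove(0);
            delete iSql[0];
            iSql.Remove(0);
            }
        RSqlStatement s;
        s.PrepareL(aDb, aSql);
        iStatements.AppendL(s);
        iSql.AppendL(aSql.AllocL());
        return iStatements[iStatements.Count() - 1];
        }

private:
    RArray<RSqlStatement> iStatements;
    RPointerArray<HBufC>  iSql;
    };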

The downside is that a cache wrapper will not have the foresight of a human programmer, and so it will often cache prepared statements that are no longer needed, thus using excess memory, or will sometimes discard prepared statements just before they are needed again.

This is a classic ease-of-programming versus performance trade-off. For applications that are intended for a high-powered workstation, it can be best to go with a wrapper class that handles the cache automatically. But when designing an application for a resource-constrained device where performance is critical and engineering design talent is plentiful, it may be better to manage the cache manually.

Regardless of whether the prepared statement cache is managed manually or automatically using a wrapper class, reusing prepared statements is always a good thing, and can in some cases double or triple the performance of the application.
