Embedded SQL in Java

There a number of pure Java SQL database systems develolpers can embed into applications. In this tip we introduce two open source projects, HSQLDB and Apache Derby.

One dilemma of deployment is what to do about a SQL database. For easy deployment of a Java application, you've probably used tools like Web Start to let the application download and run in almost a single click, but as soon as you want to start using an SQL database for persistence, you need to know what the database is and where it is on the system.

That initial configuration can be somewhat off-putting to many users, and if the users don't have a database to hand, the user can't run the application at all. Given the utility of having an SQL database to hand for persistence, it can be hard to do without one.

One option is to carry your own SQL database system and embed it in your application. There are now a number of pure Java SQL database systems which support being embedded into applications; here we'll talk about the two most visible open source projects, HSQLDB and Apache Derby.

HSQLDB is a lightweight SQL database, which appeared originally as HSQL, and was focused on "in memory" operations. It forked to become HSQLDB and developed a persistent on-disk mode. It has been developed as open source software from its inception, with performance steadily improving over time, so that it's now capable of handling substantial databases.

Apache Derby is a more heavyweight database, with classic RDBMS properties, focused entirely on being a disk based, rather than in-memory, database. It's currently what is called an Apache incubator project having recently been donated by IBM to the Apache project. Apache Derby traces its roots back through IBM, Informix and back to Cloudscape, where it was developed in a closed source environment, hence the 'Incubator' phase of adoption by Apache as Derby builds a community around itself.

Introducing Pithy
To demonstrate embedding, I've provided two example implementations (downloadable at http://www.builderau.com.au/resources/pithy.zip) of an application called Pithy, which is a basic database for storing pithy quotations. Each Pith object has an id number, a category and a quote, and you can add them and get a random quote from the entire set or one category. I've added a simple command line interface so you can start and stop the embedded server and load it with test data.

PithyDBHSQLDB in use
Pithy>load pith.txt
Added with id 1001
[Bohr] No, no, you're not thinking, you're just being logical.(1018)
Closing down

Starting up
Both HSQLDB and Derby have embedded JDBC drivers which allow your application code to access the database without overtly hard wiring the fact you are using an embedded database into your code. What is different from accessing a remote SQL database is that the JDBC URL you would use to access the remote database takes on the added role of configuring the database. Here's the initialisation for a HSQLDB database;

conn = DriverManager.getConnection("jdbc:hsqldb:pithyhsqldb/pithydb","sa","");

The URL specifies the driver with "jdbc:hsqldb:", this is followed by an optional path and name of the database. When the connection is established, HSQLDB checks for a database in that location, and if there isn't one, creates it. After execution, you'll see a group of files in the directory with various extensions, all starting with "pithydb".

Derby uses the same scheme with added attributes;

conn = DriverManager.getConnection("jdbc:derby:pithderby/pithydb;create=true","sa","");

With Derby, you can pass a large number of properties using the URL; in this case, the ";create=true" forces the creation of the database files. With Derby, you get a directory called pithydb, which contains all the database files.

With the connection created, you can now use the JDBC API to manipulate the database. Of course, if you have just created the database, there won't be any tables there. If you want to create your schema up front, it's simplest to check the database metadata to see what tables already exist;

DatabaseMetaData dbmd = conn.getMetaData();
ResultSet rs = dbmd.getTables(null, null, "%", null);

This is a simple query which will return a resultset of table information in the database; The third column of the resultset is the table name. In the Pithy code, we just check for the existence of one table and create it if it isn't present; obviously in a non-trivial application you can adopt a more sophisticated strategy for table creation.

When you do use an embedded database, the temptation will probably quite high, if you are crafting your own SQL commands, to hard wire it to the database you are embedding. SQL dialects do vary, some more than others. Take HSQLDB which has its own variant of CREATE, CREATE CACHED, to specify a table is cached in memory, but persisted on disk. In a similar vein, take SQL identity columns; apart from the variation in the syntax in defining an identity column, there are also different ways to retrieve the identity value. On HSQLDB, you execute CALL IDENTITY(); on Derby, you execute values IDENTITY_VAL_LOCAL().

Shutting down
These variations mean that when embedding a database it's still best practice to abstract your database persistence layer, especially if you are hand rolling your own SQL. To emphasise that, let's talk about shutting the database down.

First, the easy way, you can just cleanly exit the process. Both HSQLDB and Derby will shutdown their databases when your application exits. But HSQLDB has a hidden catch; the write delay. By default, HSQLDB has a write delay for all activity of 60 seconds. I'd forgotten this myself when developing the example for this article and was running short lived tests and was surprised to find empty databases on disk. You can of course turn off the write delay feature completely, or tune it to something more appropriate to your application's life cycle.

stmt.execute("SET WRITE_DELAY FALSE;");

Now, the deliberate way of shutting down the database; you may want to close the database while the process carries on running, for example when your application wants to do some maintenance routine which involves changing to a different database. HSQLDB is simple to shut down; execute the shutdown command and then close the connection. Derby is a little more complex; shutdown is done through the getConnection call.

try {
} catch (SQLException e) {
// Silently drop the exception here

In the same way that we passed the property to get Derby to create the database, we send it a command to shut it down. Note that we are just shutting down the specific database, not the entire Derby system, which requires you to unload the JDBC driver and reload it. When you close down Derby like this, the getConnection() method by design, throws an exception (because logically there is no connection once you've shut the database down).

In Testing
Starting up and shutting down take time though and in your development cycle, you really don't want to waste time loading the database in your unit testing. The simplest solution is then to run the database remotely. HSQLDB makes it very simple to run a server from the command line

java -cp lib/hsqldb.jar org.hsqldb.Server -database pithyremotedb

will start the server. Then change the URL for the database in the application to "jdbc:hsqldb:hsql://localhost" and run the application, but remember to not tell the database to shutdown when your application exits.

If you want to do a similar thing with Derby, you'll need to download IBM's Universal DB driver for Derby (http://www-128.ibm.com/developerworks/db2/downloads/jcc/); the one bit of the Derby equation not under the Apache project, redistributable as binary and not open source. You'll need to register and agree to the licence there to get the download it.

Which Database?
You may be thinking that embedding your own SQL database is going to vastly increase the size of your application, but that's not so; HSQLDB's jar file clocks in at around 600KB (complete with a whole wedge of built in tools like database managers), which is more than manageable in most deployment scenarios. Apache Derby is around 2MB, excluding the localisation jars and the network driver which you may or may not need.

This leaves the question of which one to use; HSQLDB has a big mark against it for its lack of ACID compliance, but what you lose there, you gain in performance. There's a whole class of applications which can get away with the lack of ACID compliance (by being resolutely single threaded for example).

Derby on the other hand is ACID compliant and is across the board the much more capable database, packed with. ACID properties come with a performance penalty. Although not definitive, the benchmarks run by Jamie Lawrence at http://jamie.ideasasylum.com/notebook/index.php?id=4 give a feel for the differences between Derby and HSQLDB.

Personally, I like to have both HSQLDB and Derby to hand and make the decision on which to use on a case by case basis, and remember to build that abstraction layer, so it's easy to switch.

DJ Walker-Morgan is a consulting developer, specialising in Java and user-to-user messaging and conferencing.