November 10, 2006 – Data Protection Down Pat

What the heck is Berkeley DB? Originally it came from Sleepycat Software, which was acquired by Oracle a few months back. OK, that doesn't answer the question; it only gives the history of the product. It is typically used as an embedded database. There are two flavors, Berkeley DB and Berkeley DB Java Edition. The Java Edition is written entirely in Java and can run on any machine that runs a Java virtual machine. On top of Berkeley DB is an XML engine that allows you to query the database using XML queries. The database is not relational. It is not object oriented. It uses key-value pairs to find the data. Values are simply payloads and can be up to 4 GB in length. A database table can be up to 256 TB in size.
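To make the key-value idea concrete, here is a minimal sketch in Python. It does not use Berkeley DB itself; it assumes Python's standard dbm.dumb module as a stand-in, since that module also stores opaque byte payloads under byte keys. The file name and the keys are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def demo_key_value_store():
    """Write, overwrite, delete, and read byte payloads by key."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_db")  # hypothetical file name
        # "c" opens the store for reading and writing, creating it if missing.
        with dbm.dumb.open(path, "c") as db:
            db[b"user:1"] = b"alice"
            db[b"user:2"] = b"bob"
            db[b"user:1"] = b"alice smith"  # a second put replaces the value
            del db[b"user:2"]               # remove a record by key
            value = db[b"user:1"]           # fetch a payload by key
            has_user2 = b"user:2" in db
        return value, has_user2


if __name__ == "__main__":
    print(demo_key_value_store())
```

There is no schema and no query language here: the application decides what the bytes in the key and the payload mean.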

This software is different from a traditional database in that it is linked into the application as a set of libraries. There may be many threads of control but no barriers between the threads. It is assumed that the program will maintain consistency and separation of duties. There are four access methods to data through the database engine: btree, hash, queue, and recno. The interfaces to save, change, and read records are the same for all of them. What differs is how each access method stores and locates the data.
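The "same interface, different storage" point can be sketched in plain Python. This is not Berkeley DB code; the two classes below are hypothetical stand-ins. HashStore mimics a hash access method (exact-match lookups, no key order) and SortedStore mimics a btree (keys kept in sorted order, so range scans are possible). The queue and recno methods, which key records by record number, are left out to keep the sketch short.

```python
import bisect


class HashStore:
    """Hash-style storage: exact-match lookup, no ordering of keys."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SortedStore:
    """Btree-style storage: keys kept sorted so range scans work."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, low, high):
        """Return (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._data[k]) for k in self._keys[start:end]]


if __name__ == "__main__":
    for store in (HashStore(), SortedStore()):
        store.put(b"cherry", b"3")
        store.put(b"apple", b"1")
        print(type(store).__name__, store.get(b"apple"))
```

Both stores answer put and get the same way, so the calling code does not change. Only the sorted store can answer "give me every key between apple and banana" without scanning everything.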

SQL and Berkeley DB share similar concepts, but the actions are performed differently. In SQL, for example, a database is a collection of tables, each of which contains rows. Elements from different tables can be combined with AND conditions to join data. In Berkeley DB, a database is an element of an environment. Instead of a row, there is a record, which is a key-data pair. To combine elements from different databases, a join function is used, passing in the database parameters. Transactions, indexes, and backup play similar roles in both systems.
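A side-by-side sketch may help here. The SQL half uses Python's built-in sqlite3 module. The key-value half is plain Python and only imitates the idea of a join over secondary indexes; it is not the Berkeley DB join call, and the fruit data and all names are invented for the example.

```python
import sqlite3

# Hypothetical sample data: (id, kind, color)
ROWS = [(1, "apple", "red"), (2, "cherry", "red"), (3, "apple", "green")]


def sql_lookup(kind, color):
    """SQL style: rows in a table, conditions combined with AND."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE fruit (id INTEGER PRIMARY KEY, kind TEXT, color TEXT)"
        )
        conn.executemany("INSERT INTO fruit VALUES (?, ?, ?)", ROWS)
        cur = conn.execute(
            "SELECT id FROM fruit WHERE kind = ? AND color = ? ORDER BY id",
            (kind, color),
        )
        return [row[0] for row in cur]
    finally:
        conn.close()


def key_value_lookup(kind, color):
    """Key-value style: a primary store plus one index per attribute."""
    primary = {}    # id -> (kind, color), the key-data records
    by_kind = {}    # kind -> set of ids, a secondary index
    by_color = {}   # color -> set of ids, a secondary index
    for rec_id, k, c in ROWS:
        primary[rec_id] = (k, c)
        by_kind.setdefault(k, set()).add(rec_id)
        by_color.setdefault(c, set()).add(rec_id)
    # The "join" is the intersection of the ids found in each index.
    matches = by_kind.get(kind, set()) & by_color.get(color, set())
    return sorted(matches)


if __name__ == "__main__":
    print(sql_lookup("apple", "red"), key_value_lookup("apple", "red"))
```

Both functions return the same ids. The difference is who does the work: the SQL engine plans the query, while in the key-value version the application maintains the indexes and combines them itself.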

Berkeley DB is very fast as far as transactions are concerned, since work is done in memory and then copied to disk. A typical in-memory or cached-disk configuration can support 200 million non-transactional reads per second or 250 thousand transactions per second. A transaction commit that syncs to I/O can support 30 to 60 transactions per second. A commit that syncs only to the operating system runs at about 50 thousand transactions per second, and a commit with no sync at 90 thousand transactions per second sustained.
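The large gap between those commit rates comes from how far each commit pushes the data before returning. Here is a rough sketch of the three levels using an ordinary log file in Python. It is an illustration of the general technique, not Berkeley DB's logging code; the mode names and the file name are assumptions of mine.

```python
import os
import tempfile

MODES = ("no_sync", "sync_os", "sync_disk")  # hypothetical mode names


def append_commit(log_file, record, mode):
    """Append a commit record, pushing it only as far as the mode asks."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    log_file.write(record)            # lands in the process's own buffer
    if mode == "no_sync":
        return                        # fastest: nothing forced out yet
    log_file.flush()                  # hand the bytes to the operating system
    if mode == "sync_disk":
        os.fsync(log_file.fileno())   # slowest: wait for the storage device


def demo_commit_modes():
    """Write one record per mode and return the lines found in the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commit.log")  # hypothetical file name
        with open(path, "ab") as log_file:
            for mode in MODES:
                append_commit(log_file, f"commit:{mode}\n".encode(), mode)
        with open(path, "rb") as f:
            return f.read().splitlines()


if __name__ == "__main__":
    print(demo_commit_modes())
```

The trade-off is durability. A commit that has not been synced to disk can be lost if the machine loses power, and one that has not even reached the operating system can be lost if the process crashes.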

The key reasons to choose Berkeley DB are performance, concurrency, memory footprint, and cost. Since this is an embedded library, a higher degree of knowledge is necessary. Since it uses library calls rather than typical SQL commands, that knowledge is specific to Berkeley DB and not to databases in general.
