So let’s dive a little deeper into Berkeley DB.
Berkeley DB is a general-purpose embedded database engine. It is compiled and linked into your application and runs in the same process space, which is a large part of why it is extremely fast: there is no separate database server to talk to. A database can store up to 256 terabytes of data, and an individual key or data item can be up to 4 gigabytes in size.
There are four different access methods, which determine how records are organized: BTree, Hash, Queue, and Recno. BTree and Hash are both for fast indexing and retrieval. BTree is good for data that has locality of reference, where keys that sort close together tend to be accessed together. Hash is good for extremely large data sets with no such locality. Queue is used for fast inserts at the tail of the queue and is good for high degrees of concurrency. Recno provides support for databases whose permanent storage is a flat text file.
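To make the "locality of reference" point a little more concrete, here is a small sketch that uses only JDK collections. This is an analogy and not Berkeley DB code: TreeMap stands in for an ordered, BTree-like layout and HashMap for a Hash layout, and the sample keys and values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Analogy only: TreeMap and HashMap are JDK collections, not Berkeley DB access methods.
final class AccessMethodSketch {

    // Hypothetical sample data, invented for this illustration.
    static Map<String, String> sampleData() {
        Map<String, String> data = new HashMap<>();
        data.put("banana", "B1");
        data.put("apple", "A1");
        data.put("carrot", "C1");
        data.put("apricot", "A2");
        return data;
    }

    // Ordered layout: keys are kept sorted, so a range of neighbouring keys
    // can be read in one pass (from <= key < to).
    static List<String> rangeScan(TreeMap<String, String> ordered, String from, String to) {
        return new ArrayList<>(ordered.subMap(from, to).keySet());
    }

    // Hashed layout: a single key is found directly, but keys have no useful order.
    static String pointLookup(Map<String, String> hashed, String key) {
        return hashed.get(key);
    }

    public static void main(String[] args) {
        TreeMap<String, String> ordered = new TreeMap<>(sampleData());
        System.out.println(rangeScan(ordered, "a", "b"));         // prints [apple, apricot]
        System.out.println(pointLookup(sampleData(), "banana"));  // prints B1
    }
}
```

The range scan is the kind of access an ordered structure makes cheap, while the point lookup works equally well on either layout. Which access method suits a real workload still depends on the data and should be measured.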
Inserting data into Berkeley DB works differently from the other databases offered by Oracle. If you look at ExampleDatabaseLoad.java, which comes with the binary distribution, you can see that entries are stored in the database with a put call:
myDbs.getVendorDB().put(null, theKey, theData);
In this call, we reference an already opened set of databases, get the database handle known as VendorDB, and call the put function to insert data. The first argument is the transaction, and null means the call is not part of an explicit transaction. The data is inserted as a key/data pair. The key and the data are each passed as a single object that wraps an array of bytes, so the data can hold several encoded fields. The function getVendorDB returns a handle to a database that we create in a file using the BTree access method. This is done with the
new Database("file", null, DbConfig)
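constructor. The file parameter names the directory and file in which the data is stored, the second parameter is an optional database name within that file, and the third is the configuration object that selects options such as the access method.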
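Because the key and data handed to put are just bytes (in the Berkeley DB Java API they are DatabaseEntry objects wrapping a byte array), it is up to the application to decide how a record is encoded. Here is a minimal sketch of one way to do that using only the JDK. The vendor fields (name, city, phone) and the method names are hypothetical, chosen for illustration, and this is not necessarily how ExampleDatabaseLoad.java encodes its records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// JDK-only sketch: turns a key and a multi-field record into byte arrays and back.
final class RecordEncodingSketch {

    // The key is a single element: here, a vendor name encoded as UTF-8 bytes.
    static byte[] encodeKey(String vendorName) {
        return vendorName.getBytes(StandardCharsets.UTF_8);
    }

    // The data holds several fields packed one after another (hypothetical fields).
    static byte[] encodeData(String city, String phone) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(city);
            out.writeUTF(phone);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static String[] decodeData(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new String[] { in.readUTF(), in.readUTF() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] theKey = encodeKey("Acme Supplies");
        byte[] theData = encodeData("Austin", "555-0100");
        String[] fields = decodeData(theData);
        System.out.println(new String(theKey, StandardCharsets.UTF_8)
                + " -> " + fields[0] + ", " + fields[1]);
    }
}
```

In a real program the two byte arrays would be wrapped in DatabaseEntry objects before being passed to put, and the same decoding step would run on whatever get returns.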
Note that with Berkeley DB, you need to manage where everything is, how things are created, and how to add and search for elements in the repository. It does not differ much from keeping records in a file, but it gives you a good way of indexing and searching files that could potentially contain large amounts of data. Data is read with a get function, which retrieves a record as if it were an element of a structure. The key identifies the element, so that the right data is accessed.
Records can also be deleted using the delete function. It is important to remember that records are not truly deleted until the cache has been written to disk. You can force this manually with a sync or a close function call.
Cursors can be used to iterate over records in a database. If a database allows duplicate records for one key, then the cursor is the easiest way to access something other than the first record. Records are read using the cursor.getNext() or cursor.getPrev() functions. Data can be written through a cursor with the putNoDupData, putNoOverwrite, putKeyFirst, or putKeyLast functions. Updates are done with cursor.putCurrent and deletes are done with cursor.delete.
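The idea of walking over duplicate records with a cursor can be sketched without the database at all. The class below is a JDK-only illustration, not the Berkeley DB Cursor class: the names next and prev only mirror the spirit of getNext and getPrev, the records are kept in a sorted in-memory list, and the behaviour at the ends of the list is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: a toy cursor over {key, data} pairs sorted by key, then data.
final class DuplicateCursorSketch {
    private final List<String[]> records;
    private int position = -1; // -1 means "before the first record"

    DuplicateCursorSketch(List<String[]> unsorted) {
        records = new ArrayList<>(unsorted);
        records.sort(Comparator.comparing((String[] r) -> r[0]).thenComparing(r -> r[1]));
    }

    // Moves forward one record; returns null once the end is reached.
    String[] next() {
        if (position + 1 >= records.size()) {
            return null;
        }
        return records.get(++position);
    }

    // Moves back one record; returns null if there is no earlier record.
    String[] prev() {
        if (position <= 0) {
            return null;
        }
        return records.get(--position);
    }

    // Walks the whole store and collects every data item stored under one key.
    static List<String> allDataFor(String key, List<String[]> input) {
        DuplicateCursorSketch cursor = new DuplicateCursorSketch(input);
        List<String> found = new ArrayList<>();
        for (String[] r = cursor.next(); r != null; r = cursor.next()) {
            if (r[0].equals(key)) {
                found.add(r[1]);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] { "smith", "b" });
        input.add(new String[] { "jones", "x" });
        input.add(new String[] { "smith", "a" });
        System.out.println(allDataFor("smith", input)); // prints [a, b]
    }
}
```

A plain lookup by key would only hand back one record for "smith", which is why stepping through with a cursor is the natural way to reach the rest of the duplicates.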
It is important to remember that you can open multiple databases at the same time, work with them separately in different threads, and even join data between multiple databases. I will not go into that level of detail here. The intention of this blog entry is to introduce the concept of Berkeley DB and show how to insert, delete, update, and search for database elements.