TimesTen version 7.0

TimesTen version 7.0 was just released; the announcement is on the Oracle TimesTen home page. The product is available for Linux, Windows, Solaris, HP Tru64, HP-UX, HP-UX Itanium, and IBM AIX.


Oracle TimesTen In-Memory Database is a memory-resident relational database that empowers applications with the instant responsiveness and very high throughput required by today's real-time enterprises and industries such as telecom, capital markets and defense. Deployed in the application tier as an embedded database, Oracle TimesTen In-Memory Database operates on databases that fit entirely in physical memory using standard SQL interfaces.


The biggest feature of the 7.0 release is easy-to-use caching. This lets you leverage your middleware tier and reduce the latency of customer interactions: instead of going back to the master database, you can hit the TimesTen cache copy of the master database. The driving trend helping adoption of this product is the demand for real-time enterprise analysis. SOA is requiring an increase in data and metadata access, and business intelligence is looking at data flows and metadata descriptors. Many of the batch analytics that used to run overnight can now be done during the day instead of after hours. The key benefits are offloading the database hardware requirements, reducing data latency, and allowing for extension of the middleware analytics.


Some of the new features also include support for up to a terabyte of memory to host data, as well as international language support; all of the languages supported by the Oracle database are now supported by TimesTen. Several hardware vendors allow multiple terabytes of memory to be configured: Sun, Fujitsu (1 TB), HP, IBM (2 TB), and SGI (128 TB).


The cache component allows rows, columns, or tables to be pulled into the TimesTen cache and made available to the application layer. If data is changed in the backend database, updates are pushed to the cache rather than invalidating it, so the caches should always contain the latest updates from the master repository. Alternatively, data that is active or soon to be active can be pulled into the cache based on who is logged into the web service. Instead of pulling a whole table into the cache, just the customer-relevant data is pre-populated before the customer asks for it. This mechanism is very good for supporting portals that don't currently scale, product information, and per-customer shipping data. A third mechanism is a sliding window: users define a window of relevance, and instead of loading a whole table, only data changed since a given date is pulled into the cache. For example, when customers log into their bank they typically only want to look at current balances and transactions since the last statement. The login can pull the last 30 days of transactions for that customer and populate the in-memory database before the customer starts looking at the data. The different cache methods can be mixed and matched to support applications at the middle tier.
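
To make this concrete, here is a rough sketch of what a sliding-window cache could look like as a TimesTen cache group. Treat it as a sketch rather than tested syntax: the table and column names (acct_txn, txn_date, and so on) are invented for this example, and the exact AUTOREFRESH and AGING clauses should be checked against the TimesTen 7.0 documentation.

-- hypothetical read-only cache group that keeps the last 30 days of
-- transactions in memory, refreshed from the Oracle master database
CREATE READONLY CACHE GROUP recent_txns
  AUTOREFRESH INTERVAL 10 SECONDS
  FROM acct_txn (
    txn_id    NUMBER NOT NULL,
    acct_id   NUMBER NOT NULL,
    txn_date  DATE   NOT NULL,
    amount    NUMBER(12,2),
    PRIMARY KEY (txn_id)
  )
  AGING USE txn_date LIFETIME 30 DAYS;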


One interesting thing about TimesTen is that it allows for data replication from the master repository, similar to what is done with RAC. It functions differently than RAC in that the queries are performed at the middle tier and not at the database. This is an alternate way of providing scalability without RAC, or on top of RAC.


TimesTen is a relational database cache for data shared via an Oracle database; TimesTen allows you to pull this data from the master repository. This differs from Coherence, which is a distributed data repository that shares data in a peer-to-peer grid. Coherence data is not stored on disk but is kept as an ad-hoc data grid that stays in memory on all of the nodes.


As part of our library system design, I will look at how we can use TimesTen as an optimizer to speed up user requests. We will also look at locking requirements for the database and how this can be done at the TimesTen layer.



Enterprise Manager

I have been playing with Oracle Enterprise Manager this morning. I have OEM 10.2.0.1 installed on a virtual machine. Unfortunately, my virtual image is not patched to take the daylight saving time change into account. The following is what I saw when I tried to start OEM.

$ ./emctl start dbconsole
TZ set to US/Central
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright……
http://localhost.localdomain:1158/em/console/aboutApplication

Timezone mismatch: the agentTZRegion value (US/Central) in /home/oracle/oracle/product/10.2.0/db_1/localhost.localdomain_orcl/sysman/config/emd.properties does not match the current environment TZ settings(US/Central).
The dbconsole cannot run with this mismatch

To fix this problem I have two choices. The first is obvious: patch the software installation. Being lazy, I didn't want to do that. The second is to reset the date; on April 1st the old and new daylight saving rules agree, so the time zone problem is not an issue. This is done with the following command:

# date 04010915

The command must be run as root and is of the format MMDDHHmm (month, day, hour, and minute), with each element being two digits. Once I do this, OEM starts properly and begins recording data as expected. Unfortunately, the date is now set to April 1st.

library schema design

Let's get back to the library system. First, let's look at the design. I have scanned in most of the books that we have on our bookshelves at home. I was surprised to find out that we have over 300 books, and that is without scanning the kids' books in their room; my guess is that we have 600 books in the house. How did that happen? I imported these books into www.librarything.com so that I could auto-populate most of the data by entering the ISBN. I purchased a bar code scanner and found that the majority of books include the ISBN on the bar code; many books have two bar code numbers, one for inventory control and one for the ISBN. I selected LibraryThing because it allows me to search the Library of Congress as well as Amazon to populate the library information. It lets you enter the ISBN, the book title, or the author name and presents a list of matches, and if there is no match in the Library of Congress, it lets you search Amazon. Between the two I found data for every book in our house. The biggest problem is that at times I found multiple matches because many books have multiple editions. A significant number of the matches included a picture of the book cover, which helped considerably.

The biggest benefit of this service is that it allows me to export the book information to a comma-separated or tab-separated file and import it into a database or Excel. The fields in this export are (one possible table definition follows the list):

  • book id
  • title
  • author first and last name
  • author last and first name
  • other authors
  • publisher
  • date of publication
  • ISBN number
  • series
  • source (Amazon or Library of Congress)
  • language
  • language 2
  • original language
  • LC Call number
  • Dewey Decimal number
  • BCID
  • date entered
  • date entered
  • date entered
  • date entered
  • tags
  • rating
  • review
  • summary
  • comments
  • encoding

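To make the import concrete, here is one way this export could land in a relational table. This is only a sketch: the table name, column names, and types are my own guesses, and a real import would need to be matched against the actual export columns.

-- hypothetical books table modeled on the export fields above
CREATE TABLE books (
  book_id       NUMBER PRIMARY KEY,   -- LibraryThing book id
  title         VARCHAR2(400) NOT NULL,
  author        VARCHAR2(200),        -- first and last name
  other_authors VARCHAR2(400),
  publisher     VARCHAR2(200),
  pub_date      VARCHAR2(40),         -- publication dates are free-form
  isbn          VARCHAR2(20),
  series        VARCHAR2(200),
  source        VARCHAR2(40),         -- Amazon or Library of Congress
  book_language VARCHAR2(40),
  lc_call_no    VARCHAR2(40),
  dewey_no      VARCHAR2(40),
  rating        NUMBER(2),
  tags          VARCHAR2(400),
  date_entered  DATE
);
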
Looking at the data in this collection brings up a few questions. Why can't I use Excel as the basis for this information? First, how do I record what is checked out and when? Using one of the date entered fields and the comments field? I could, but it would be difficult. The user who checks books in and out would need to be familiar with Excel and have a uniform way of recording checkouts and returns. If multiple people do this using different fields or different formats, the data will become corrupted. We need to create a browser front end to make data entry uniform and data changes consistent.

Second, if I wanted to sort the data using some index and perform some simple analytics, an Excel spreadsheet will not work. If someone checks out a book and checks it back in, the record is overwritten and the history lost if everything is kept in one spreadsheet. It would be nice at some point to find out what the most popular book is. It would also be nice to present a list of books that students have checked out and make it visible to parents and teachers; it gives both a tool to help educate the child, and it gives the librarian a starting point when purchasing more books. Analytics like this require a transactional log of book usage and who used the books. A spreadsheet does not let us record these transactions or easily coordinate between two worksheets and archive this data. Some type of database is required for this information.

The discussion of checking books in and out requires another table schema. We need something simple to track the students (a table sketch follows the list):

  • Student id
  • Student first and last name
  • Homeroom teacher
  • Grade

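A minimal sketch of the matching table, again with invented names and guessed types:

-- hypothetical students table
CREATE TABLE students (
  student_id NUMBER PRIMARY KEY,
  first_name VARCHAR2(100),
  last_name  VARCHAR2(100),
  homeroom   VARCHAR2(100),  -- homeroom teacher
  grade      NUMBER(2)       -- e.g. 0 for kindergarten
);
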
With this information we can record when a student checks out a book by correlating a book id to a student id. We can also let a parent look at the books currently checked out by presenting the student id and searching for checkouts. The librarian can also flag a class of books as not to be checked out by certain grade levels; we might use either the series or rating elements in the book schema for this. For example, if a kindergartener wants to check out a book that is rated for high school, the checkout program will flag the process and require verification that this child can check out this book.

We will also need a mechanism to record book status. This requires a specific table to track it (a sketch follows the list):

  • book id
  • date checked out
  • student id

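Here is a sketch of that table. I have added a date_returned column beyond the three elements listed above; that is my own assumption, made so a checkout row can be closed out without being deleted, preserving the transaction history the analytics depend on.

-- hypothetical checkout log: one row per checkout event
CREATE TABLE checkouts (
  book_id          NUMBER NOT NULL REFERENCES books(book_id),
  student_id       NUMBER NOT NULL REFERENCES students(student_id),
  date_checked_out DATE   NOT NULL,
  date_returned    DATE   -- assumed column; NULL while the book is out
);

-- books a given student currently has out, e.g. for the parent view
SELECT b.title, c.date_checked_out
  FROM checkouts c JOIN books b ON b.book_id = c.book_id
 WHERE c.student_id = :student_id
   AND c.date_returned IS NULL;
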
Some elements that are missing from this library schema are the cost of a book and the location of a book in the library. Since this is a small library of a few thousand books with fewer than 20 bookshelves, there is really no need to record this information. If we had multiple libraries, or libraries in multiple rooms at the same site, we might want to record book location. If we were going to charge for books based on our cost of purchasing them, we would want to record the cost of each book. Since our library is a volunteer library where the vast majority of books have been donated or purchased in bulk from Half Price Books, we don't track the cost of each individual book, but we do record how much has been spent on all of the books in the library. We have been collecting books for the past 8 months and have spent more on bookshelves than on books: we have just over 7000 books and have spent $500 on bookshelves (the cost of wood, nails, and screws) and $200 on books. The funding for these books came from a book fair where we got a percentage of the profits from all books sold. During this book fair we had the teachers list books that they were interested in and books that they wanted in the library, and the parents purchased a few hundred books and donated them to the library.

Next topic on the library: use cases and user interface design. I will be using three different tools to design this library interface: Application Express (HTML DB), JDeveloper, and NetBeans. The purpose of using three is to compare and contrast the different tools for creating the same product. Once we have the design completed, we will look at what it takes to import our spreadsheet data into a database. The databases that we will evaluate are Oracle XE, MySQL, and SQL Server Express; we will look at these three because they are all free databases.

some quick linux tips and tricks

In playing with our new library system, I got distracted by questions from customers. This sidetracked me, as did attending the Spatial Users Group in San Antonio. Spatial is an interesting technology that looks like it will be fun to play with. Unfortunately, I don't see how to integrate it into a library checkout demo since we are just starting with one library.

A couple of simple notes for myself…

to list Oracle packages on a Linux box: rpm -qa | grep -i oracle
to remove a package on a Linux box: rpm -e oracle-xe-10.2.0.1-1.0

These are simple examples of how to find and remove the XE version of the database so that we can install other versions.

to mount a USB disk shared from my Windows laptop/desktop:

1) share the disk from Windows
2) on the hosted Linux box, make sure I can see the IP address of my laptop; this might require reconfiguring the network and editing host entries
3) mkdir /mnt/f
4) mount -t smbfs //my-laptop/f /mnt/f

This allows me to look at the CD and DVD images of our software, the Linux ISOs, and tutorials/documents associated with the installations.

When removing software from a VMware image, the size of the disk and disk partitions that have been created and populated does not shrink. For example, if we install 2G of software on top of the 1.5G OS installation, the VMware image will be 3.5G in size. If we then delete the 2G of software, the VMware image does not shrink back.