December 2007 – Data Protection Down Pat

learning at its highest level

Today my oldest son is graduating from college. Over the last few weeks I have talked with him about some of the things he has been learning while earning a degree in Chemistry and Chemical Engineering. He said that most of his classes involve team projects or joint group reports. It is interesting that universities have moved from teaching an individual to teaching someone to be part of a team. We had some hands-on seminars yesterday and the day before, and I noticed that the labs were all done in pairs. I asked the instructor why, and he said it was a requirement because they had 20 students and only 10 lab machines.

This brings me back to my question of learning. Is teamwork a necessary way of learning, or is it an enabler that lets more people get to scarce resources? I think I will try an experiment over the next few weeks. One of my counterparts is getting a new laptop and the two of us are going to play with OracleVM. I will see if we can work together on this and generate something more than each of us could generate on our own.

More details later. Now it is time to get ready for sitting in the arena for a couple of hours just to see the five seconds (ok 10 seconds because he is crossing twice) that I really care about. I care about the other 2000 people who are graduating but I just will remember the 10 seconds more than the rest of the two hours.

Windows and VMware

I have been playing with VMware and Windows and have noticed that some images ask for license validation again and some do not. I have made sure that I have a valid license handy, but I might want to take an image and modify it for testing purposes. I might want to run the image on a different machine to test an IO system or a new graphic interface. Even though I have a license, it sometimes prompts me for revalidation.

In VMware there is a control file with the .vmx extension. It saves data like the MAC address of the box or a uuid generated for the VM by the machine that created it. I have found that you can fool the Windows instance into thinking it is on the same machine by changing the following parameters.

Ethernet0.addressType = "generated"

Ethernet1.addressType = "generated"

uuid.location = "56 4d 60 58 ac 4d 2d 60-a4 81 7f ef a3 89 c5 ca"

uuid.bios = "56 4d 60 58 ac 4d 2d 60-a4 81 7f ef a3 89 c5 ca"

ethernet0.generatedAddress = "00:0c:29:89:c5:ca"

ethernet0.generatedAddressOffset = "0"

ethernet1.generatedAddress = "00:0c:29:89:c5:d4"

ethernet1.generatedAddressOffset = "10"

I am not sure if this adheres to the terms of the license, but it does stop me from having to make the dreaded phone call to Redmond just to port my license to another laptop running it in a VM image. It seems that I should be able to run the image on my laptop or desktop as long as I am not running it on both. I am typically not running it more than a few hours every once in a while for testing purposes. I should not spend more time on licensing and phone calls than I do on testing. Because of this issue, I typically play and test on Linux because it does not have the same licensing issues and restrictions. Yet another example of “easy to do business with” getting in the way of innovation.

Oracle VM discussion

I have been having internal discussions with a few groups about Oracle VM. I was surprised to find out that this technology has been used internally for more than a year. The two biggest users are the hosted data centers and Oracle Education. The data center has a policy to deploy images. The education group deploys the hands-on seminars and classes as VM images on a server farm.

The product is free to download and use; support costs money. The entire Oracle product suite is supported on this product because engineering has been submitting code changes to the Xen source and testing the product matrix for a while. Complete information can be found in Metalink in Note 464754.1.

Paravirtualization means that the operating system knows that it is running in a VM environment. The OS hides this fact and presents a standard interface to the applications. PVM is available for Linux with a modified kernel. This allows you to run it on older hardware that does not support hardware virtualization. It is supported in 64-bit and scales up to a 64-core system as well. Each VM can support up to 32 virtual CPUs to aggregate SMP or multi-core chips.

The Oracle VM Manager is currently an XE version of the database with an OC4J connector. The manager is currently only supported on Linux (the next release will support Windows) and is manageable from a browser. We are currently building a suite of VMs that incorporate the Oracle stack of applications. We are currently delivering a Linux image. Next will be images with the database and application server pre-installed.

Pricing on the product is no license cost, no live migration cost, and no management cost. Support costs $499 per system per year for up to 2 processors or $999 for unlimited processors. In OnDemand we started with 6 dual-processor boxes (2 prod, 2 dev, and 2 test) consuming 2010 watts at 100% load. If we consolidate this and move dev and test onto the same box, we can reduce the box count to four and power consumption to 1180 watts. If we virtualize everything and run the production middle tier and database on the same box but in different VM instances, we can consolidate production to one box and put the middle tier and database dev and test on the other box. We went from 6 systems that were mostly idle to two systems that support multi-tier applications as well as dev and test, consuming 660 watts of power. This allows us to reduce server footprint and heat output while keeping the management and administration the same.
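To put the three power figures above side by side, here is a small Python sketch that works out the percentage saved against the six-box baseline. The wattages come straight from the webcast notes; the labels and the helper name are my own.

```python
# Power draw at 100% load, in watts, as quoted in the notes above.
configs = {
    "6 boxes, no virtualization": 2010,
    "4 boxes, dev and test consolidated": 1180,
    "2 boxes, fully virtualized": 660,
}


def savings_percent(before_watts, after_watts):
    """Return the reduction from before to after as a percentage of before."""
    return 100.0 * (before_watts - after_watts) / before_watts


baseline = configs["6 boxes, no virtualization"]
for name, watts in configs.items():
    saved = savings_percent(baseline, watts)
    print(f"{name}: {watts} W ({saved:.1f}% below baseline)")
```

By this arithmetic the four-box layout saves about 41% and the two-box layout about 67% of the original power draw.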

According to IDC, virtualization is considered a mainstream technology, with 22% of servers virtualized today and 45% expected in 12 months. The market should triple in the next four years with a focus on disaster recovery, offsetting new data center construction, and power savings.

The big benefit of this technology is that it simplifies evaluation and deployment of software. Instead of installing and configuring the operating system and application, we can create a template and allow these templates to be deployed and modified. The time to modify a template is significantly less than the time required to deploy the entire solution. This gives partners and ISVs significant potential to sell an entire configured solution and not a product that needs consulting to configure and install.

Sorry for the non-technical and somewhat repetitive information. These are notes from a webcast that happened yesterday. I figured that the next step would be to go to a vendor that is selling this product and listen to one of their presentations. I also purchased two books. The first is http://www.amazon.com/Definitive-Hypervisor-Prentice-Software-Development/dp/013234971X/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1197562056&sr=1-1
which is the Definitive Guide to the Xen Hypervisor by David Chisnall. The second is http://www.amazon.com/Definitive-Hypervisor-Prentice-Software-Development/dp/013234971X/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1197562056&sr=1-1
which is Virtualization with Xen by David Williams. I have ordered these books but have not received them yet.

different ways of learning

The other day I posted a journal of my attempt to learn about OracleVM. Since then I have started looking at different ways of learning ideas. A few years back I took some online classes from Stanford to learn new technologies in Computer Science and Electrical Engineering. Although I found these classes very informative, I also found that I could not afford to pay for them. One thing that I did learn was that reading text and watching a video really did not do it for me. The seminar classes were interesting but I found myself drifting if the subject did not interest me. When there was a project or lab assigned to the class, I paid more attention and learned substantially more. I have been trying to apply this to my everyday job and get people to do something while I am presenting. For example, if I am talking about a new technology I start asking things like how do you currently do xyz? How long does it take? What happens if you get a promotion, can you give your code to the new guy and have them understand what you did? How difficult would it be to change your procedure if you change a major component? Questions like these typically keep people interested in the technologies instead of sleeping through a PowerPoint.

Last night my family had a get-together to welcome back my brother-in-law from a semester at sea. When I first heard that he was going to go on a cruise for 4 months, my first thought was “what a waste of a semester and money”. My background is engineering. If I can’t specifically measure how I am progressing towards my goal of graduating, I typically don’t do it. When I first read his blog about his trip, it looked like a wonderful vacation but very little education. After reading all of his blog and having seen his pictures from the different side trips that he took, I realized that he learned more about different cultures and governments than I had thought possible. He initially went out to look at different religions and cultures because he wants to go into law for a profession. He wanted to see the world through a different set of eyes and understand what motivates people from different countries. What he ended up seeing was the abject poverty and class separation that is magnified in some of the poorest countries around the world.

His travels around the globe started me thinking about the importance of learning by observing. I realize that I have been learning by reading, watching training videos, playing with software. I have not visited sites that are using the technology and talked with people who have experience with this technology in production. I think that over the next few months I will try to visit our larger customers and see how they use technologies like RAC, Enterprise Manager, and Data Guard. I can’t remember the last time I saw what a DBA did outside a conference room. When I worked at Rice University we would typically go to a client’s office and see how they would do things like update a web server. Doing this exposed numerous security holes that we then had to develop a plan to correct.

I guess the emphasis of this blog is to voice the opinion that there are a variety of ways of learning and no single way is the best. The ways that I have traditionally used have enabled me to be successful but I don’t think that they will work to keep my abilities up to date. I am interested in hearing about different ways that others have used to learn new technologies. This blog is one way that I have found to be a new learning path. When other people read this, they sometimes comment on a different way of looking at the problem or solving the issue.

label creating and scanning of books

Now that we have a way of recording books and the people who check them out, we need to figure out how to make this easier. Having someone type in the book title and student name at checkout time will be a time-consuming process and lead to errors. If there are two students with the same name, there will be errors. If there are multiple books with the same name and author, this will cause errors. We need to come up with a way to streamline this process. The obvious thing is to look at what a library does and replicate this.

If you go to a local library, they have a bar code scanner at checkout and scan each book when you check out. The bar code does not need to be an industry standard but something unique that identifies the book. This can be easily done by using the book identifier provided by LibraryThing.

The first thing that we did was to look at scanners that are available. This was done by going to Micro Center, Fry's, and Radio Shack. The only scanner that was available commercially was the http://www.waspbarcode.com/ scanner. We purchased the hand scanner for $250 and made sure that it worked with our laptop. The specific scanner that we purchased was the http://www.waspbarcode.com/scanners/ccdlr_barcode_scanner.asp WLR8900 scanner. This worked very well in reading the ISBN bar codes that were on the back of most books.

The second thing that we wanted to do was to print our own labels, since two books will resolve to the same book if we use the ISBN number printed on the book. To do this we purchased some software from a local office supply store. The software that we purchased was the Avanquest Label Maker 4.0 software for $25. Using this software we are able to import the data from LibraryThing and print out a label with a barcode on a standard Avery label.

The label that we wanted to use needed to contain a barcode that uniquely identifies the book. We also wanted to print the book title, author name, the Dewey Decimal number, and the HSE Library. This is easy to do through the Design interface of Label Maker. We initially picked Avery label format 8250 so that we can have 30 labels per sheet. This creates a 2 5/8 inch by 1 inch label.

Our initial tests with the barcode did not work very well because the printer that we used was not of very good quality. We initially picked the Code 128 format for the barcode layout. To make the printing easier we increased the space width by 10% and increased the bar width by 10%. Once we did this we had very few scan failures. When we did a random sampling of 1000 labels, fewer than 2 per sample required multiple rescans to read the barcode. We used the standard Times New Roman font for the text on the label. Once we printed this label it was easily put on the last page of the book. This is probably the most labor-intensive part of getting everything done, but it does allow us to uniquely identify books.
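As an aside on why a poor print shows up as a failed scan rather than a wrong book: every Code 128 symbol carries a check character that the scanner verifies. The Label Maker software computes this for us, so the Python sketch below is only an illustration of the standard modulo-103 calculation for code set B. The function name is mine, and the book id in the usage line is made up.

```python
START_B = 104  # value of the Code 128 "Start B" symbol


def code128b_checksum(text):
    """Return the modulo-103 check value for text encoded in Code 128 set B.

    In set B each printable ASCII character has the value ord(char) - 32.
    The check value is the start value plus each character value multiplied
    by its 1-based position, all taken modulo 103.
    """
    total = START_B
    for position, char in enumerate(text, start=1):
        value = ord(char) - 32
        if not 0 <= value <= 95:
            raise ValueError(f"character {char!r} is not in Code 128 set B")
        total += position * value
    return total % 103


# Hypothetical book id, purely for illustration.
print(code128b_checksum("12345678"))
```

If a smudged bar changes any character, the recomputed value no longer matches the printed check character and the scanner rejects the read, which is what we saw as a rescan.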

Once we had the format properly created, we needed to import the book inventory into the Label Maker software. This is easily done by defining the data table with some unique titles to make importing easier. If we define the columns as follows, the Excel table that we try to import will come in easily without modification. The table looks like:

– author (last, first)

– author (first, last)

– Title

– book id

– ISBN

– Your Tags

– DCC

When we try to import the Excel data sheet, we first need to save the Excel sheet as a comma-separated file. We can then use the File->Import option to bring in the csv file. When we do this we get data entries for the entire library.

On a weekly basis we import the library list and sort it by date entered into LibraryThing. When we import the csv file, we do not import the entire library list but only books that have been entered since the last label printing. This is done by truncating the csv file to include a list of books from the last week or two weeks and importing it into the Label Maker software.
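We do the truncation by hand, but the same weekly step could be scripted. The following Python sketch keeps only the rows entered after the last label printing. The column name "date entered" and the year-month-day date format are assumptions on my part; the real export may spell or format them differently, so both are parameters.

```python
import csv
import io
from datetime import date, datetime


def books_entered_since(csv_text, cutoff,
                        date_column="date entered",
                        date_format="%Y-%m-%d"):
    """Return the rows whose entry date is later than the cutoff date.

    date_column and date_format are assumed values, not taken from the
    actual export; adjust them to match the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    recent = []
    for row in reader:
        entered = datetime.strptime(row[date_column], date_format).date()
        if entered > cutoff:
            recent.append(row)
    return recent


# Made-up sample data standing in for the exported library list.
sample = (
    "book id,title,date entered\n"
    "101,Old Book,2007-11-20\n"
    "102,New Book,2007-12-10\n"
)
recent = books_entered_since(sample, date(2007, 12, 1))
print([row["title"] for row in recent])
```

The rows that come back are the ones that still need labels, and they can be written out again with csv.DictWriter for import into Label Maker.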

This process has been very effective in tracking books and correlating them to users. The students are required to have a picture ID that has their student id on the card. We can easily scan the book bar code as well as the student id, which is also a bar code, and correlate the two in a database or Excel spreadsheet.

The execution of printing labels is relatively easy. The difficult part is setting up a printer on the library laptop, because the IT department does not want to support printers or anything else that they did not install. They also do not want to support the Label Maker software since it was not purchased by them. Fortunately, these items are easy to set up, but it does require having an administrator log in and execute the setup scripts. Once this has been done the library system is relatively easy to use on a weekly basis to print labels as well as check in and check out books.

Up next, we will look at the logic required to implement a checkout system using a database as well as through Application Express.

apex – building a library system

Ok, time to get back to an old topic that I need to finish. My kids' school needs a library system. We have built one using Excel and want to upgrade it to Application Express and the Express Edition of the database.

The first problem that we need to look at is how to get data on books without having to pay for a database of all books that carries a monthly update fee. Fortunately, I found a web site, http://www.librarything.com, that allows you to create an online library catalog system with a small donation. To date we have entered over 6000 books into the system. The data required to enter a book is the ISBN number, which is typically printed on the book. For example, if we have the book “A Fly Went By” by Mike McClintock, we can get the ISBN number of 9780394800035 printed on the back cover. If we type this number into LibraryThing, we get fields to enter Tags, a review, a rating, and a summary. LibraryThing fills in the Title, Author, Other Authors, Date, LC Call Number, Dewey Number, Publisher, and languages. When we enter the data, the date Acquired is auto-entered. This information has proved to be very valuable. From it we can create a card catalog and a list of books in our library. We chose to use the Comments field to list the price of the book (needed if a student loses the book) and the Tags field to list which age group can read the book.
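Since we type or scan thousands of ISBN numbers, it can help to catch a mistyped one before it goes to the web site. The last digit of a 13-digit ISBN is a check digit, and the Python sketch below verifies it using the standard alternating 1 and 3 weights. The function names are mine and this is not something LibraryThing requires; it is just a sanity check that could run before data entry.

```python
def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... from the left; the check digit
    brings the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn):
    """Return True if isbn has 13 digits and a correct check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


# The ISBN from the example above.
print(is_valid_isbn13("9780394800035"))
```

For the number above the weighted sum of the first twelve digits is 105, so the check digit is 5, which matches the last digit printed on the book.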

The data entry of the book is simple. There is a tab on LibraryThing to add books. It has a search area, a tag to add, and a place to search (Amazon.com or Library of Congress). If you enter the ISBN number in the search field, the web site goes to the search location to populate the data for the book. This helps because it allows us to add old and new books without having to subscribe to a book depository that provides all of this data.

Once we have the data entry finished, we need to export this data so that we can use it in an Excel spreadsheet or database. To export this data we go to the tools tab and export the data as tab-delimited text. We could export it as a CSV but this does not contain all of the data. The format of the data exported to a tab-delimited spreadsheet is

book id

sequence number in librarything. This number is arbitrary, but if there are multiple books with the same title and author the book id is different and thus can be used to identify a book uniquely.

title

title of book

author (last, first)
author (first, last)
publication

publishing company that released the book. This contains things like the address of the publisher as well as edition of the book if it has been released multiple times.

date

date of publication. If this is a new edition it has a different publication date than the previous edition.

ISBN
series

comment area to categorize it as part of a collection. We do not use this field.

source

this identifies where the information came from. Typically it is Amazon.com or the Library of Congress. It does not need to be filled in.

language 1

the primary language of the book

language 2
original language
LCC

Library of Congress catalog index. We do not use this.

DDC

Dewey Decimal system. We use this for non-fiction books.

BCID

this field is not used

date entered

this is the date that the book was entered into the librarything system. We typically use this as an index for printing labels for the book.

date 2

this is used if the book information is modified. We typically don’t use this field

date 3

we don’t use this field

date 4

we don’t use this field

stars

we don’t use this field

your tags

this field is used to categorize the type of book. Typical entries are non-fiction, fiction, everybody, smiley. These tags can be followed by any other tag that describes the information contained in the book. This information is very useful when trying to search for a book. For example, we might have a book about the history of England during World War II. In the tag you would find “non-fiction, history, Winston Churchill, World War II, England”.

review

this is intended to be a comment section to help categorize the book. For example, if it contains violence or sexual situations we put the comments here. This allows us to figure out if a book should be high school only or good for middle school readers.

summary

not used

comments

this includes the price of the book. There was not really any other place to put this data and this field is not shared with other people in the librarything so it can be treated as a private field.

encoding

not used
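For anyone who would rather read the export with a script than open it in Excel, here is a minimal Python sketch that loads the tab-delimited text into a dictionary keyed by book id. The header spellings ("book id", "title", "ISBN", "your tags") follow the field names listed above, but I am assuming the file's header row matches them exactly; the sample row is made up apart from the title and ISBN mentioned earlier.

```python
import csv
import io


def load_books(tsv_text):
    """Parse tab-delimited export text into a dict keyed by book id.

    The column names used here are assumed to match the export's header
    row; change them if the real file differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    books = {}
    for row in reader:
        books[row["book id"]] = {
            "title": row["title"],
            "isbn": row["ISBN"],
            # Tags are stored as one comma-separated string.
            "tags": [t.strip() for t in row["your tags"].split(",") if t.strip()],
        }
    return books


# Illustrative sample with a hypothetical book id.
sample = (
    "book id\ttitle\tISBN\tyour tags\n"
    "12345\tA Fly Went By\t9780394800035\tfiction, everybody\n"
)
books = load_books(sample)
print(books["12345"]["title"], books["12345"]["tags"])
```

Keying on the book id rather than the title is deliberate, since two copies of the same book share a title and ISBN but never a book id.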

When we download this page, we can load it into an Excel spreadsheet or import it into a database. Initially we loaded this data into an Excel workbook titled Books. We also load another workbook that contains a list of students. The student workbook contains the following data

student id

this is unique for all students, staff, and volunteers

Class

this is used to communicate to the teachers which kids have overdue books. The valid entries change each year and are provided by the school.

LastName
FirstName

Once we have the books and students we can create another worksheet that correlates books checked out and who has them checked out. The library workbook contains the following elements

date_out

this is a manual entry of when a book is checked out

book_id

this is the book id. It correlates to the bookid from the librarything. We print this number in barcode format so that the book can be scanned with a barcode reader

student_id

this is the ID from the student workbook.

date_in

typically this field is blank until the book is returned.

Book

we use this field to verify that the book id correlates to the book that the student is checking out. This field is populated when the book_id is entered. The value is filled in with the equation
=IF(B2="","",LOOKUP(B2,Books!$A$2:$A$10000,Books!$B$2:$B$10000))
It is important to note that this limits the library to rows 2 through 10000 of the Books worksheet, or just under 10,000 books. If we exceed this number, the ranges need to be extended. This also assumes that the book id is located in column A and the book title is located in column B of the Books worksheet.

Student

we use this field to verify that the student id correlates to the student that is checking out the book. This field is populated when the student_id is entered. The value is filled in with the equation
=IF(C2="","",LOOKUP(C2,Students!$A$1:$A$510,Students!$D$1:$D$510)&" "&LOOKUP(C2,Students!$A$1:$A$510,Students!$C$1:$C$510))
It is important to note that this limits the number of students to 510. This typically is not a problem but will need to change as the population of the student body changes.

Overdue?

we use this field to find books that have been checked out for longer than 14 days. If the book is overdue this field will have the contents “Over Due”. If the book is still out but not overdue it will contain “Checked Out”. If the book has been returned, this field will be empty.
=IF(A2="","",IF(D2<0.0001,IF(G2>13,"Over Due","Checked Out"),""))
It is important to note that there is a hidden column that is typically not seen. This field contains the number of days that a book has been checked out. =IF(A2="","",IF(D2<0.0001,TODAY()-A2,D2-A2))
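The two formulas above are easier to follow when written out as plain logic. This Python sketch mirrors them as closely as I can: an empty date is represented by None, and the threshold is the same "more than 13 days" test that the spreadsheet uses, which flags a book once it has been out 14 days. The function names are mine.

```python
from datetime import date


def days_out(date_out, date_in, today):
    """Mirror the hidden column: days a book has been (or was) checked out."""
    if date_out is None:
        return None
    end = date_in if date_in is not None else today
    return (end - date_out).days


def overdue_status(date_out, date_in, today):
    """Mirror the Overdue? column using the spreadsheet's G2>13 test."""
    if date_out is None or date_in is not None:
        return ""  # never checked out, or already returned
    if days_out(date_out, date_in, today) > 13:
        return "Over Due"
    return "Checked Out"


today = date(2007, 12, 15)
print(overdue_status(date(2007, 12, 1), None, today))
print(overdue_status(date(2007, 12, 10), None, today))
print(repr(overdue_status(date(2007, 12, 1), date(2007, 12, 5), today)))
```

Passing today in as a parameter, rather than calling the clock inside the function, makes the rule easy to test against known dates.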

Some notes on using Excel as a library system

If you re-sort the student workbook, you break the checkin/checkout system.
If you type something into the Book or Student field, it will override the calculated values and potentially lead to corruption.
The student field is not protected, so a volunteer can change a student name or class.
If someone corrupts the data and saves the spreadsheet, it becomes very difficult to recover from the corruption.
If the book list has not been updated but the label has been put on a book, the last book in the spreadsheet will be listed as the book checked out. It will appear that a bunch of books have the same name but multiple ids. This isn’t true but can be fixed as long as no one tries to fix it manually. The solution is to import the new books into the Books workbook at the end and have the Library workbook reference these new values.
If someone changes the date_out values it can corrupt the overdue calculation. The same is true for the date_in.
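Two of the notes above, the re-sorting problem and the "last book in the spreadsheet" problem, trace back to how Excel's LOOKUP works: it expects the lookup range to be sorted in ascending order, and when there is no exact match it returns the entry for the largest value that is less than or equal to the one you asked for. The Python sketch below imitates that behavior for sorted data and contrasts it with an exact-match lookup. The book ids and titles are made up, and the function names are mine.

```python
from bisect import bisect_right


def approximate_lookup(key, sorted_keys, values):
    """Imitate LOOKUP on sorted keys: use the largest key <= the lookup key."""
    index = bisect_right(sorted_keys, key) - 1
    if index < 0:
        return None
    return values[index]


def exact_lookup(key, table):
    """Return the value for key only if key is really present."""
    return table.get(key)


# Hypothetical book list; id 1050 has a label but is not yet in the list.
book_ids = [1001, 1002, 1003]
titles = ["Book A", "Book B", "Book C"]

print(approximate_lookup(1050, book_ids, titles))
print(exact_lookup(1050, dict(zip(book_ids, titles))))
```

The approximate lookup quietly returns the last title for an unknown id, which is exactly the confusing behavior described above, while the exact lookup returns nothing and makes the missing book obvious. A database key lookup behaves like the second function.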

In summary, using an Excel spreadsheet works. You can create a library system using it but you cannot give the students access to any of the data. You cannot use an Excel system for a card catalog because it is difficult to lock and unlock values yet still let students look up book or author names. The system is also vulnerable to corruption by volunteers who do not know how to use a computer or are uncomfortable with one. What we need is a web-based system that does not allow editing of values other than the student_id and book_id for checkout and the date for checkin.

Total cost of this system is the cost of the librarything – $25, and a Windows desktop with Excel. This is a difficult price to estimate because the school provided this so there was effectively zero cost to the library. In reality this is the most expensive component since a laptop is about $500 and Excel is about $200. Fortunately, this was not a cost that we had to pay for.

Next topic, how to print labels for books and how to import the data that we have in Excel spreadsheets into a database.