What is Web 2.0

These are my notes from a web seminar featuring one of the Oracle VPs and a professor from Harvard discussing Web 2.0.

Q: What is the difference between Web 2.0 and Enterprise 2.0?

A: They are essentially the same: web based communities that facilitate creativity, collaboration, and sharing. Enterprise 2.0 lives inside the walls of a corporation, while Web 2.0 is openly shared on the internet.

One of the trends that we are seeing is that technology remains the best lever for getting more value from employees: making them more productive and making it easier to share ideas and procedures.

The key impact of Enterprise 2.0 is that it enables innovation, transformation, and globalization. It becomes easier to find out who is working on what, it exposes remote groups to new ways of looking at things, and it speeds up information sharing. It also lets us stay tied to people that we interact with on an infrequent basis. The concept is the “strength of weak ties”: these relationships are a way of getting a network effect on information.

There is a perception that information gets corrupted in a wiki (Wikipedia being the usual example). Since Enterprise 2.0 tools actually log each change and who made it, there is very little misinformation, and more good information tends to get out. This also allows the corporate vision or corporate direction to rise to the top and be expressed in a wiki or a blog.

Mashups are becoming more and more popular. They allow information to be shared with context. For example, if you have a training session scheduled at a location, you can recommend hotels, required training material, a map of the facilities, and the product page. This allows people to come to the training prepared and on time.

Different classes of users can be defined, and this is easy to manage and maintain. You can allow people to contribute, edit, and view information. There are always different pools of information, and corporate security will allow or disallow access to this data.

One of the challenges is bridging structured and unstructured information. Much of the unstructured information exists in email, IM, blogs, and wikis. Managed information exists in ERP, SCM, and eProcurement systems. How do you bridge between these systems? An example is setting a budget. Most of the discussion of a budget happens in an unstructured way, but the final results end up in a structured system. The main problem is that the next year, the context and flow of the decision process is lost and the process is not repeated the same way. It is more difficult to improve the quality of your processes if you don’t record and track this information. Typically this is done by sharing a Word document, which ends up in multiple versions and does not show who changed something or why they changed it.

The early adopters moving into this space are high tech companies that work in dynamic environments. They also have a younger workforce who are not afraid to play with new technologies. There are also some financial services firms, professional services firms, and some brick and mortar companies. These projects start as grassroots efforts with flexible leadership that is willing to try new approaches. An example would be an HR director trying to create an employee handbook. They might recommend a wiki and have different groups or departments contribute to it to start the process.

Some examples of companies doing this:

US Government – intelligence community. It allows all of the agencies to share information while controlling who sees what.

How to get started? Look at some of the external sites that exist: Flickr, Facebook, Second Life, Reddit, Wikipedia, Digg. Look at www.web2journal.com/read/413654.htm to explore the foundation/framework behind these services. Look at Oracle 11g Technical Preview 3.


old dogs (DBAs) and new tricks (Enterprise Manager)

I related a story to a customer today that surprised them. At OpenWorld there was a contest between a newly graduated DBA and a DBA with 15 years of experience. The problem was a performance issue with a database that was not trivial to find. The experienced DBA started executing scripts and launching queries from the command line. The new DBA used OEM and got to the same result in just slightly less time than the experienced DBA. It wasn’t the end result that they expected. They thought I was going to say that OEM would be better, faster, and could slice bread. The message that I wanted to get across was that they could tier their support. They did not always need to involve the lead DBA in everything but could give the junior staff rights to do the mundane tasks. The senior DBAs could get more involved with the line of business and make their contribution more significant.

Surprisingly, this spawned a discussion that they don’t want junior DBAs, which then spun into how difficult it would be to upgrade their databases from 8i and 9i to 10g. They have too many custom scripts and home grown notification systems. Why would they go to a vendor provided system? Well, I showed them the management interface for a host, the database, the application server, and PeopleSoft. Going with a vendor provided system meant that they did not have to write custom scripts for the new database. It also meant that they did not need to involve an app server admin or a PeopleSoft admin. They did not need to learn the PeopleTools management interface to be able to see what is wrong with their installation. In the end, they got it. They understood that they are not senior in all aspects of the applications that they support. They also understood that a vendor provided system allows you to look at things without being an expert in each of them.

I am surprised that a theme this month always comes back to build vs buy. The scripts that they wrote not only improved the quality of a product that they purchased (a database), they also locked them into a specific version. They don’t want to upgrade from one version to the next because a couple of their scripts broke when they upgraded on a test system. It was better for them to remain on an unsupported version than it was to change their scripts. It was also easier not to upgrade because it would require too much testing and too many overtime hours. I understand keeping cost at a minimum, but at what level does this cost bubble up? If something breaks or a new vulnerability comes out, they will eventually need to upgrade. If they do not upgrade, they risk the system coming down or being compromised.

In my opinion, when the cost of upgrading to the latest patch involves testing of customized scripts and extensions, you need to start questioning the value of the customizations. There is a difference between job security and stagnation but it is a fine line at some locations.

what is the best way to build community

I have been working with a bunch of different groups on rolling out different projects, and collaboration and communication are a constant issue. The common question comes up: how do I make sure everyone understands what is going on and reduce the time spent making xyz work? We had an internal discussion about this yesterday, and I was tasked with coming up with a collaboration strategy for my team.

If we start with information sharing, it typically begins with conversations between two or more people. This works well for small groups of 10 or fewer people; when the group gets larger, conversations become more and more difficult. It is relatively easy even if the group is dispersed across multiple cities. Time zone issues can come into play if the distances span multiple countries, but this is typically not the problem.

Once everyone agrees on topics and action items, this information typically needs to be written down and shared. Email and text documents are the typical way of sharing this information; shared files, network stores, or shared home directories are typically the answer.

The big problem comes up when you want contributors who are diverse, don’t know each other, and typically don’t get together for regular meetings. Something like this needs a web page, forum, discussion group, or wiki to maintain and share information. Blog servers can also be used, but these are typically used for individual contributions and not group communications.

Historically, web pages have been used to share information in a publish and subscribe methodology. This is good for distributing information but is typically a push model: one or more people create content and others read it. A newspaper is a prime example of this model. You have a group of writers who create news articles and a large community that reads the information.

Recently, most newspapers have added the option of a discussion to supplement the articles. The discussion allows the readers to annotate or augment the articles. This is a good step because it involves more people in the creation of information and improves the quality of the information presented. If, for example, a reporter states that something happened at 2pm but it really happened at 3pm, the readers can post a correction below the article.

The key drawback to this option is that you still have a handful of content creators and not true collaboration. If you want a large number of people to create content, something must be in place to create structure and either allow or disallow joint editing of information. The key difficulty with web pages here is that ownership of a web page comes down to file ownership. Having two people edit a web page can be a problem because the changes need to be merged or the file locked for editing by one person.

Blogs can be used to address the ownership problem because each person can edit their own content and share it. The problem with blogs is that they lack structure: a blog is organized linearly by time, not by subject. If multiple people talk about the same topic, they must link to each other and create a linked thread of the idea. Unfortunately, this does not allow for organization of ideas in a structured way; everything is organized by date. A good discussion of this can be found here

A wiki server, on the other hand, is a hybrid of a blog server and a web page. A user can edit the content of a page on the wiki server, similar to what is done on a web server, but the wiki server manages page editing through the web interface. A wiki server also allows you to create a structure based on products or ideas. If, for example, you want to talk about hardware, you create a page discussing hardware. If you want to talk about disk storage, you create a sub-page under hardware. If you want to talk about operating systems, you create a page parallel to hardware, because it is not a sub-topic of hardware. A good discussion of wiki structure vs web pages can be found here
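For wiki engines that keep files on the back end, the page hierarchy described above maps naturally onto a directory tree. A minimal sketch, with directory and file names that are purely illustrative and not tied to any particular wiki engine:

```shell
# Build the example hierarchy: disk storage is a sub-page of hardware,
# while operating systems sits parallel to hardware at the top level.
mkdir -p wiki/hardware/disk-storage
mkdir -p wiki/operating-systems

# Each page gets a content file (names are illustrative).
touch wiki/hardware/index.txt
touch wiki/hardware/disk-storage/index.txt
touch wiki/operating-systems/index.txt

# Show the resulting structure.
find wiki -type d | sort
```

The point of the sketch is only that the structure is by subject, not by date: moving a topic under a different parent is a rename, not a re-threading of linked posts.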

Unfortunately, there are a large number of wiki servers. Oracle uses Wetpaint for the external wiki. Ward Cunningham has a relatively comprehensive list of wiki engines. Some of these engines use files on the back end. Others use a database as the information repository.

It looks like my next project is to get a wiki configured and enabled for internal only use to discuss specific industry verticals.

OEM 10gR4 Diagnostic and Tuning Pack Deep Dive

These are my notes from a web training by the product team on Diagnostics and Tuning packs for OEM 10gR4.

new features in 10gR3

  • EBiz Suite

  • PeopleSoft

  • Siebel

  • SOA Management

  • Identity Management

  • Remedy connector

  • MOM connector

  • SAP

  • JBoss

  • IBM WebSphere MQ

new features in 10gR4

  • Data Masking

  • BI

  • Tomcat

  • VMWare

  • PeopleSoft Helpdesk

  • Microsoft Exchange

  • BEA

  • BEZ

  • Blue Lane

  • Check Point

  • Citrix

  • Dell

  • Egenera

  • EMC

  • F5

  • IBM

  • Juniper

  • Microsoft

  • NetApp

  • Nortel

  • Onaro

  • Pillar Data

  • Radware

  • JBoss

IT Challenges

 – Reduce cost of managing IT -> grid automation/OEM

 – assess status -> configuration management

 – transition from reacting to proactively resolving issues -> app performance management

 – improve alignment between business customers and IT staff -> SLM

Database Diagnostic Pack

features improved each release

  • wait events instrumentation (7.3)

  • statspack (8i)

  • YAPP methodology (8i)

  • enhanced time-wait model (10g)

  • AWR – automatic workload repository (10g)

  • ASH – active session history (10g)

  • ADDM – automatic database diagnostic monitor (10g)

AWR – a performance statistics data warehouse. It consists of periodic snapshots, including ASH samples, all stored in the SYSAUX tablespace.

ADDM – top down analysis using AWR snapshots. It optimizes to reduce database time, gives real time results, pinpoints root causes, and eliminates non-problem areas.
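As a sketch of what AWR and ASH expose underneath the OEM screens, the script below writes a query against the standard 10g repository views (DBA_HIST_SNAPSHOT and V$ACTIVE_SESSION_HISTORY) to a file. The one-day window is an arbitrary choice of mine; you would run the file with sqlplus against a database licensed for the Diagnostic Pack.

```shell
# Write a small peek-at-AWR script; run it later with something like
#   sqlplus / as sysdba @awr_peek.sql
cat > awr_peek.sql <<'EOF'
-- Recent AWR snapshots (stored in the SYSAUX tablespace)
SELECT snap_id, begin_interval_time, end_interval_time
  FROM dba_hist_snapshot
 WHERE begin_interval_time > SYSDATE - 1
 ORDER BY snap_id;

-- Top wait events sampled by ASH
SELECT event, COUNT(*) AS samples
  FROM v$active_session_history
 WHERE session_state = 'WAITING'
 GROUP BY event
 ORDER BY samples DESC;
EOF
cat awr_peek.sql
```

This is roughly the raw material the Performance tab graphs are drawn from; OEM just saves you from writing the queries.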

OEM 10g works on 9i. The database summary screen looks no different in OEM for 9i systems. When we dive down into a specific database and look at the Performance tab, we see similar data, but the 9i view does not have the ADDM findings. Everything else looks the same, and both show the same wait event issues.

SQL Tuning does show rich information in 9i, such as expensive full table scans, Cartesian joins, and unnecessary rewrites. This data is not as rich as the SQL Tuning Advisor in 10g, but it does give information that the DBA can act on on their own.

Top SQL data is available in 9i through the STATSPACK reports to look for duplicate SQL or blocking sessions. The tool in 10g is more informative and is integrated with the SQL Tuning Advisor.

Database Tuning Pack

The Tuning Pack does require the Diagnostics Pack. It provides features like the SQL Tuning Advisor and SQL Access Advisor.

SQL Tuning Advisor looks at statistics, SQL profiles, access path analysis, and SQL structure analysis. It then recommends a new SQL profile, a new index, modifications to the SQL statements, or gathering more statistics.

A SQL Profile is additional information based on previous executions of SQL statements. The profile optimizes the execution of those statements; it can be used to schedule I/O or populate the buffer cache properly. It is persistent across shutdowns and can be applied to other databases that are running the same code. 10g has a new feature, SQL Tuning Sets, that allows for better capture of SQL workloads, and a tuning set can be created from a variety of sources: the cache, ADDM, etc.
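The advisor can also be driven from the DBMS_SQLTUNE package rather than the OEM pages. A sketch: the sql_id below is a placeholder you would replace with a real one from the Top SQL page, and the task name is arbitrary.

```shell
# Generate a tuning-task script; run it later with something like
#   sqlplus / as sysdba @tune_one.sql
cat > tune_one.sql <<'EOF'
DECLARE
  tname VARCHAR2(64);
BEGIN
  -- 'placeholder13' stands in for a real sql_id from the Top SQL page
  tname := DBMS_SQLTUNE.CREATE_TUNING_TASK(
             sql_id    => 'placeholder13',
             task_name => 'tune_demo');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'tune_demo');
END;
/
-- Report the profile, index, statistics, or restructuring findings
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('tune_demo') FROM dual;
EOF
cat tune_one.sql
```

The report is the same text you see behind the advisor pages in OEM, which is handy when you want to script tuning across several databases.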

Data masking is a new feature in 10gR4. It allows you to anonymize columns so that you can send copies to off-site developers, third parties, or internal developers. The format and layout stay the same but the contents are masked.

The demo emphasized the main screen, pointing out the systems that are down, security policy issues (critical and warning), and alerts (critical and warning). It does allow you to sort by a specific type of target; the examples given were IBM and Microsoft applications. The database tab lets you see the different database types and versions and whether the systems are up or down. If you pick one database, you go to the database home page and see all of the graphs as well as more detailed reports. The performance tab allows you to look at historical utilization, active sessions, disk I/O, and instance throughput.

This training was relatively good for an introduction. Unfortunately, it did not go as deep as the TDS demos are capable of going. I personally like to use something like the PetShop demo and show blocked processes, slow queries, and how you can fix them.

USB storage and Linux (OracleVM)

I wanted to connect a USB storage device to my VM Master, which is running Oracle Enterprise Linux Release 4, but the disk was never recognized. I looked at the kernel and it appears to have the usb-storage module configured. To verify this I looked at


This directory contains usb-storage.ko, which appears to be compiled for this kernel. Since we are running this kernel, the module should either be included as part of the operating system or loaded when a device is attached.

If I attach a device to the USB bus, I should see a /dev/sda? listed as a recognizable device. I should also be able to see something in the dmesg output showing that a device was loaded on the USB bus. To verify this, type

$ dmesg | grep -i sd

You should see

SCSI device sda: followed by some data on how big the disk is and how it is partitioned as well as

USB mass storage support registered

if you do an

$ ls -al /dev/sda*

you should see

brw-rw---- 1 root disk 8, 0 Feb 18 21:00 /dev/sda

brw-rw---- 1 root disk 8, 1 Feb 18 21:00 /dev/sda1

brw-rw---- 1 root disk 8, 2 Feb 18 21:00 /dev/sda2

This shows that a device was created corresponding to a USB storage device on Feb 18th at 21:00. The device has two partitions and should be available for use. When I first looked at the device using the fdisk command, I saw that it was partially formatted for Linux and partially formatted for Windows. I changed this with the fdisk command

$ fdisk /dev/sda

Command: p

Disk /dev/sda: 92.6 GB, …….

Device Boot      Start         End      Blocks   Id  System
/dev/sda1            1          13      104391   83  Linux
/dev/sda2           14       11262   90357592+    7  HPFS/NTFS

From this we want to change the partition type of sda2 to Linux. To do this we enter

Command: t

Partition Number (1-4): 2

Hex code: 83

Command: w

The w command writes the new partition table to disk and exits fdisk, so no separate quit command is needed.

This sequence changes /dev/sda2 from an NTFS partition type to a Linux partition type. We want to do this so that we can format the partition as ext3 or some other Linux file system.

# mkfs -t ext3 /dev/sda2

This reformats the partition using the ext3 file system. We want to do this so that the disk can be mounted and used to hold templates and ISO images from the VM Master system. We can also attach this disk to our VM Server utility disk and mount it as /OVS/iso_images to free up the internal disk on our server. We can also share these images from the VM Master via NFS and not have to copy the files locally to the VM Server.
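Sharing the images over NFS as described could look like the fragment below. The hostname vmmaster, the export options, and the read-only mount are all assumptions on my part, not a tested configuration.

```shell
# /etc/exports on the VM Master -- export the USB disk read-only
/media/usbdisk  *(ro,sync)

# /etc/fstab entry on the VM Server -- mount the export where OracleVM
# looks for ISO images ("vmmaster" is an assumed hostname)
vmmaster:/media/usbdisk  /OVS/iso_images  nfs  ro  0 0
```

After editing /etc/exports, running exportfs -a (or restarting the nfs service) makes the export active.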

When I first tried doing this on an old Toshiba Tecra laptop, it did not work. The problem is that this system has two different USB connections, a type one and a type two USB connector. The type one connector would not recognize the disk; I had to unplug the USB cable, replug it into the other connector, and then it worked.

Once the mkfs command finished, I was able to mount the disk with

# mount /dev/sda2 /media/usbdisk

and verify that it mounted with

# df -k

With this you should see /dev/sda2 mounted on /media/usbdisk along with the other disks that are mounted.

I did some simple testing to see how small the dom-0 kernel installation is compared to a command line installation of Enterprise Linux Release 4 Update 5. On the VM Master there are 763 packages installed; on OELR4 there are 309. It also appears that the VM Master software is a superset of the core packages. The same is not true of the VM Server and the OELR5 core packages: the VM Server has 363 packages and OELR5 has 437, and a few of the core packages that are the same have different versions.

build vs buy

The question of building an application vs buying one does not come up very often, but when it does it is a difficult conversation to have. I would understand if the discussion were public domain vs commercial products; that is a discussion I have on a regular basis. I got some training on Oracle Web Center the other day and kept comparing it to uPortal. Yes, there are differences. Yes, one is a commercial product and the other is public domain. Yes, there is value in both. The argument then comes down to dollars and training. When I was at Texas A&M I deployed a prototype of uPortal. It was more of a political fight than a technical challenge. We also deployed the Yale CAS server for single sign on. It was a relatively easy solution and involved very few political battles. It mainly required a mandate from the university that we would no longer ship out password files and would restrict who could connect to the LDAP and Kerberos servers. This was an easy one: it increased the security of all services on campus.

I was at a customer the other day, and they were talking about writing their own hardening solution for identity. They wanted to write a solution that presents a custom image or challenge word embedded in the HTML to prevent a man-in-the-middle attack. This technology is used by many of the larger banks because it has been mandated for financial data. They want to use the technology for human resources data, which makes sense because they need to protect social security numbers.

What didn’t make sense was that they wanted to build their own solution rather than purchase one that already exists. The technology isn’t complex. It requires some Java or ASP code, a database, and a way of injecting the image into the authentication screen. This is effectively what CAS does, minus the custom images. It would be a simple step to change CAS to support changing images or pass phrases, but challenging to present a floating keypad or keyboard. Oracle provides this with the Adaptive Authentication Manager. This product provides the floating keyboard, challenge questions, and custom images, as well as a risk analysis tool. I don’t want to get into the details of the product because you can find them yourself.

My question is how you justify building something vs buying something. If the product costs you on the order of $100K (I have no clue what it actually costs), how many programmers does that translate to, and how much support cost is required to reproduce something like it? For a parallel: if a car costs $50K, how many mechanics would it take to pull a car from the junk yard and build you a new one, or build one from scratch from a kit? When was the last time you saw a kit car or kit airplane? I see a bunch of custom homes and spec homes being built, but the vast majority are as-is with customizations. I think software is similar.

In doing some research on the cost of software and how much a developer can produce on a daily basis, the numbers are difficult to pin down. They range from $20–$100 per line of code and 15–40 lines of code generated per day. The CAS code base has about 50K lines of code. This suggests that developing this software would cost $1M conservatively and take about 1,250 days of effort. You can parallelize this across three or four people and bring it down to a bit over 300 calendar days. This says that in roughly a year you could re-write the CAS code from scratch and come out with a production quality supported package. Alternatively, you could spend $50K and assign a full time staff person for a year to test, implement, integrate, and deploy this system into your production environment.
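The arithmetic above can be sketched out; the inputs are the rough figures quoted in the research, not measured values:

```shell
#!/bin/sh
# Back-of-the-envelope build-vs-buy numbers from the estimates above.
loc=50000          # approximate lines of code in CAS
cost_per_line=20   # conservative end of the $20-$100/line range
loc_per_day=40     # optimistic end of the 15-40 lines/day range
developers=4

build_cost=$((loc * cost_per_line))       # 1000000
days_single=$((loc / loc_per_day))        # 1250
days_team=$((days_single / developers))   # 312

echo "build cost (conservative): \$$build_cost"
echo "days for one developer: $days_single"
echo "days for $developers developers: $days_team"
```

At $100 per line and 15 lines per day, the same arithmetic gives $5M and over nine years of single-developer effort, which is why even the conservative end of the range makes the point.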

It makes sense to me that buying is the way to go. Unfortunately, I am on the vendor side and am having trouble seeing the value in building my own software, or car, or computer, or phone system, or bicycle from scrap parts. I guess I have been away from the university too long…..