more on HP-Oracle Exadata Storage

It appears that there is more than hype to the HP-Oracle Exadata Storage solution. I like the architecture and the way that it splits queries out between boxes. A good source of more information is Kevin Closson’s blog. Kevin is an Oracle employee and a performance architect, and he goes into some good detail about how it operates.

What impressed me the most about this device is that it is currently focused on data warehouse solutions. It looks for select statements and farms these requests out to the storage node. Since the storage node understands the table structure and query format, it returns only the relevant data. Instead of returning a 1 GB table and applying the where clause on the database machine, the select…where is done on the storage node and only the relevant columns and rows are returned. This means that a request for the current billing statement for a specific cell phone will return the current bill and not all of the historical data. Partitioning also cuts down the rows that are read, but it does not reduce the number of columns returned. If you have a lot of data associated with a user in different columns, you can reduce the amount transferred from the storage by returning only the relevant information from the tables and not the entire row.
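To make the difference concrete, here is a toy sketch in Python. It is not Exadata's interface; the table, the column names, and the functions `full_scan`, `smart_scan` and `payload_size` are all made up for illustration, and a plain list stands in for the storage node. It only shows that filtering rows and columns before shipping gives the same answer while moving far less data.

```python
# Toy model only: a Python list stands in for the storage node.
# All names and data here are hypothetical, not part of any Oracle API.
BILLING = [
    {"phone": "555-0100", "period": "2008-08", "amount": 41.10, "call_detail": "x" * 1000},
    {"phone": "555-0100", "period": "2008-09", "amount": 39.95, "call_detail": "x" * 1000},
    {"phone": "555-0199", "period": "2008-09", "amount": 12.50, "call_detail": "x" * 1000},
]


def full_scan(table):
    """Conventional storage: ship every row and every column to the database host."""
    return [dict(row) for row in table]


def smart_scan(table, columns, predicate):
    """Filtering storage: apply the where clause and column list before shipping."""
    return [{c: row[c] for c in columns} for row in table if predicate(row)]


def payload_size(rows):
    """Rough proxy for data transferred: total length of all values as text."""
    return sum(len(str(v)) for row in rows for v in row.values())


def wanted(row):
    """The where clause: current bill for one phone."""
    return row["phone"] == "555-0100" and row["period"] == "2008-09"


# Host-side filtering: everything crosses the wire, then the host filters.
shipped_all = full_scan(BILLING)
host_result = [
    {"period": r["period"], "amount": r["amount"]} for r in shipped_all if wanted(r)
]

# Storage-side filtering: only the matching row and requested columns are shipped.
offloaded_result = smart_scan(BILLING, ["period", "amount"], wanted)

print(payload_size(shipped_all), "units shipped without offload")
print(payload_size(offloaded_result), "units shipped with offload")
```

Both paths end up with the same bill, but the second one never moves the bulky `call_detail` column or the historical rows off the storage side.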

Interesting stuff. If only we had a few telco companies where I work...

The huge benefit that I see from this device is that the statements that are shipped across to the storage node are warehouse centric. I can foresee this working with spatial data and being applied to seismic reservoir systems. I can see this working with materialized views for manufacturing. The only thing that needs to change is the engine on the database that farms out the request to the storage component. This has huge long-term potential wherever large data sets are used. Many of the complex models that are currently designed to work around the size of the data can be expanded and simplified. The storage node can be customized to understand the data format and pre-process the requests. If we follow this to the extreme, you might be able to embed some of the stored procedures for things like velocity correction of seismic data and have the storage node process this data and present the corrected trace instead of the raw trace. You could also do things like image recognition or enhancement in the storage component and not at the higher levels. More complex work that currently requires parallel processing could be pushed down into the storage instead of being spread across parallel processes. Agreed, the product is not there yet, but I can see it going in that direction. Since Oracle typically provides open APIs to most of its products, I can see new ventures springing up based on this technology and specific solutions being created for specific industries.
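As a purely speculative sketch of that last idea, here is what "the storage node returns the corrected trace" could look like in Python. Nothing here exists in the product; the `StorageNode` class, its `register` and `read` methods, and `scale_correction` are hypothetical names, and the correction is a trivial scaling that stands in for a real velocity correction.

```python
# Speculative toy model: every name here is hypothetical, not a real API.
class StorageNode:
    """Holds raw traces and can run a registered procedure before returning data."""

    def __init__(self, traces):
        self._traces = traces
        self._procedures = {}

    def register(self, name, func):
        """Embed a procedure in the storage node under the given name."""
        self._procedures[name] = func

    def read(self, trace_id, procedure=None):
        """Return the raw trace, or the result of a registered procedure on it."""
        raw = self._traces[trace_id]
        if procedure is None:
            return list(raw)
        return self._procedures[procedure](raw)


def scale_correction(samples, factor=0.5):
    """Placeholder for a real velocity correction: just scales each sample."""
    return [s * factor for s in samples]


node = StorageNode({"trace-1": [2.0, 4.0, 6.0]})
node.register("correct", scale_correction)

raw = node.read("trace-1")                             # raw trace leaves storage
corrected = node.read("trace-1", procedure="correct")  # processed inside storage
print(raw, corrected)
```

The point is only where the work happens: the caller asks for the corrected trace by name and never sees the raw samples.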