What the heck is Exadata storage?

At OpenWorld last week, Oracle and HP announced the HP Oracle Database Machine and the Exadata Storage Server hardware (see the product page).

If you read the news article, which has a broken link from the Oracle home page, you get the basics: HP and Oracle have jointly announced a new rack solution that is pre-configured to run the Oracle 11g database in a Real Application Clusters (RAC) configuration, using 64 processing cores, Oracle Enterprise Linux, and up to 168 terabytes of storage. The servers are HP ProLiant DL180 servers with two Intel 2.66 GHz quad-core processors, capacity for 12 three-and-a-half-inch disks, 8 GB of memory, and dual InfiniBand network connections. I recommend reading more in the data sheet. What is new from Oracle is the Exadata Storage Server Software that manages this storage, along with a new plug-in for Enterprise Manager to help manage the storage device.

The difference between this and standard storage is that the storage can be used either as raw storage or as an offload device that reduces the amount of data a query has to move. It is typically used for a data warehouse where queries return large sets of data. With the Exadata storage software, the storage processors can pre-process the SQL request and return only the results to the database engine. It is important to note that this software is designed to be used with the HP hardware and only the HP hardware. It is not designed to be used with other servers or storage solutions.

This product is directly targeted at the Netezza and Teradata solutions. The intent is to bring some of these old solutions to 10g and beyond and help with performance tuning. Most of the Netezza and Teradata solutions are on 9i or older and not tuned very well. A high-end comparison of the Netezza, Teradata, and HP-Oracle solutions shows that all three hold the same amount of disk but the HP-Oracle solution has substantially more database cores (4 for Netezza, 32 for Teradata, and 64 for HP-Oracle). The HP-Oracle solution also has 3x the memory and 20x the network speed. Many Netezza customers are running into scalability problems where they need multiple boxes to host the data, and then into one-source-of-truth issues because some data is replicated on both boxes and there are no consistency tools to keep the data synchronized.

You can scale the solution from one box to 18 boxes inside of one physical hardware rack. This gives you 65 terabytes of storage using SAS disks or 216 terabytes using SATA disks. The peak performance inside the box is 18 GB/second for data transfer. You can also use InfiniBand to connect to another hardware rack to scale across multiple servers, since each system has two InfiniBand switches. The database engine comes pre-installed but is not configured to use RAC, partitioning, advanced compression, security, or Enterprise Manager options like diagnostics and tuning. You can run this system as a single instance without RAC and split the data appropriately for a data warehouse. The single instance is configured to have a database instance that is managed by ASM. ASM then knows how to talk to the Oracle Exadata cell manager and offload the queries to the Exadata cell. You can have multiple Exadata cells behind the InfiniBand switch.
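The full-rack capacity figures above can be sanity-checked with a little arithmetic. This is only a sketch: the 18 servers per rack and 12 disks per server come from the text, but the per-disk sizes (300 GB for SAS, 1 TB for SATA) are my assumption, chosen because they reproduce the quoted totals.

```python
SERVERS_PER_RACK = 18   # from the text: up to 18 boxes in one rack
DISKS_PER_SERVER = 12   # from the text: capacity for 12 disks per server

# Assumed per-disk sizes (not stated in the text).
SAS_DISK_TB = 0.3
SATA_DISK_TB = 1.0

def rack_capacity_tb(disk_tb, servers=SERVERS_PER_RACK, disks=DISKS_PER_SERVER):
    """Raw capacity of one rack in terabytes, rounded to one decimal place."""
    return round(servers * disks * disk_tb, 1)

print("SAS :", rack_capacity_tb(SAS_DISK_TB), "TB")
print("SATA:", rack_capacity_tb(SATA_DISK_TB), "TB")
```

With those assumed disk sizes the SAS figure comes out at 64.8 TB, which rounds to the 65 quoted above, and the SATA figure is exactly 216 TB.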

The 11g database has been enhanced to take advantage of the Exadata storage. A new component called iDB (Intelligent Database protocol) runs in the database kernel and either maps database operations to the Exadata processors or runs the query locally. This protocol allows the Exadata processors to perform functions like table scans, compression, and OLAP functions. The iDB protocol is built on top of RDSv3 (Reliable Datagram Sockets) and runs over InfiniBand. All of these protocols are open standards and publicly defined.

There are two important points about this architecture. First, when a SELECT…WHERE is issued against a table, a traditional storage device will return the whole table, and the database engine will evaluate the WHERE clause and drop the rows that are not relevant. The Exadata storage will do the select and the filtering at the disk and only return the rows that survive the WHERE clause. The second way of reducing the data transferred is to return only the columns requested and not the whole table. With a traditional storage device the table is returned and the database drops the columns that were not requested. With the Exadata storage, the storage processor performs the filtering and only returns the requested information across the data channel.
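To make the two reductions concrete, here is a toy model in Python. It is not how Exadata is implemented; it only illustrates where the filtering happens and how much data crosses the wire in each case. The table, column names, and helper functions are all made up for the illustration.

```python
# Toy table: four rows, four columns. All names and values are made up.
TABLE = [
    {"id": 1, "region": "TX", "amount": 120, "notes": "rush order"},
    {"id": 2, "region": "CA", "amount": 75, "notes": "gift wrap"},
    {"id": 3, "region": "TX", "amount": 310, "notes": "backorder"},
    {"id": 4, "region": "NY", "amount": 42, "notes": "repeat buyer"},
]

def traditional_read(table):
    """Storage ships every row and every column to the database."""
    return [dict(row) for row in table]

def filter_and_project(rows, predicate, columns):
    """Apply the WHERE predicate, then keep only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

def offloaded_read(table, predicate, columns):
    """Storage does the filtering and projection before shipping anything."""
    return filter_and_project(table, predicate, columns)

def values_shipped(rows):
    """Count column values crossing the storage-to-database link."""
    return sum(len(row) for row in rows)

# SELECT id, amount FROM table WHERE region = 'TX'
wanted = lambda row: row["region"] == "TX"
cols = ["id", "amount"]

shipped_old = traditional_read(TABLE)                       # whole table moves
result_old = filter_and_project(shipped_old, wanted, cols)  # database filters
shipped_new = offloaded_read(TABLE, wanted, cols)           # storage filters

print(values_shipped(shipped_old), values_shipped(shipped_new))  # 16 4
```

Both paths produce the same answer; the only difference is that the traditional path moves 16 values to get it and the offloaded path moves 4. On a data warehouse table the gap would be far larger.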

Another new feature of the Exadata storage node is the ability to manage I/O resources. In traditional database management, you have been able to allocate CPU and memory utilization based on jobs or processes. You can allocate 30% of your processing to finance and 70% of your processing to payroll. The problem with this model is that a finance job could swamp the I/O resources with a simple request even though it does not use more than 30% of the processor. With the Exadata storage you can allocate resources on the I/O channel as well. With this model finance gets 30% of your I/O as well as 30% of your database engine.
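The idea of capping I/O by percentage can be sketched in a few lines. This is a simplified model under my own assumptions, not Oracle's resource manager: the function name, the bandwidth figure, and the hard-cap behavior are all hypothetical, and a real resource manager may hand unused share to other groups, which this sketch does not do.

```python
def allocate_io(total_mb_s, shares, demand):
    """Grant each group its demand, capped at its percentage of total bandwidth.

    total_mb_s: total I/O bandwidth available (hypothetical figure)
    shares:     {group: percent of bandwidth}
    demand:     {group: bandwidth the group is asking for}
    """
    if sum(shares.values()) > 100:
        raise ValueError("shares add up to more than 100%")
    granted = {}
    for group, pct in shares.items():
        limit = total_mb_s * pct / 100
        granted[group] = min(demand.get(group, 0), limit)
    return granted

# Finance asks for 900 MB/s but is only entitled to 30% of 1000 MB/s.
print(allocate_io(1000,
                  {"finance": 30, "payroll": 70},
                  {"finance": 900, "payroll": 500}))
```

In this example the runaway finance job is held to 300 MB/s while payroll gets the full 500 MB/s it asked for, which is the behavior the 30/70 split is meant to guarantee.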

One of the impacts that this might have on business is that a single user can be given 10% of a machine and 10% of the I/O bandwidth. This user can then run ad-hoc queries against the data warehouse without consuming more than 10% of all resources. This isn’t true with most business intelligence engines. We typically see problems with Cognos or Business Objects systems in production. Database resources can be restricted for a user or group so that they cannot overload the storage system with open-ended queries.

The sales model for this box is a little different. Oracle sells the product and HP delivers the hardware pre-configured to the customers. Oracle handles the first line of support and HP provides the 2nd tier along with break-fix and spares. This machine does not come with database licenses. This allows customers to use existing licenses and not have to purchase new licenses. The warranty for this machine is 3 years parts and labor.

For those of you who are graphical in nature, the technical overview is a good pictorial review of what I have been talking about.

My gut feeling… this is a new technology. It is a game changer. The big question is whether this will become mainstream for many companies or will just be an Amazon and Google solution. If I were a hosting company and wanted to host Oracle, this would be the solution. I would choose this over virtualization any day. If I were a telco or vendor with a large number of customers that needed targeted marketing, it would be an easy choice. My biggest problem is that in Houston, this isn’t the case outside the oil and gas industry. It might be viable for health care. It might be viable for utility companies. It might be viable for service companies. Time will tell.

Cloud computing

Great, just what we need. Another term that is a marketing buzz saw. Unfortunately, this one makes sense. Cloud Computing. First there was parallel processing. Then there was grid computing which is a very confusing term.

Oracle and Amazon made a joint announcement yesterday.

What does this really mean? It means that you can have a static IP address allocated to some processors and memory hosted by Amazon using preconfigured images created by Oracle. It also means that you can request a higher level of service as needed: more processor, more storage, and more memory, by requesting a new or expanded service. Oracle has agreed to shorter terms for licensing if you are testing or trying out services. Licensing can go from one year to five years if you choose to lease the service, with a one-year price coming in at 20% of the list price.

The cost is the interesting part. The cost of a processing unit runs from $0.10 per hour for a small instance up to $0.80 per hour for a large instance. This ranges from $72/month ($864/year) to $576/month ($6912/year) to cover the cost of the processor and memory. If you want to back up your database into persistent storage you will need to pay $253/month ($3042/year). What this means is that for a small startup, the cost of deploying an Oracle database will work out to be:

– $864/year hardware cost
– $1500/year storage cost (800G of storage for OS, Database, and data)
– $3500/year for the database (one year lease Standard Edition)
—————
$5864/year

This is pretty amazing considering the cost of configuring the same system to run Windows and SQL Server or Linux and MySQL. You truly don’t need a DBA or Unix admin full time on these systems, so you can assume that you can outsource each of these at $1K/day for roughly one day a month, or about $24K a year for the two roles. This brings the grand total to run a system to roughly $30K a year, with $24K of that being people to support the system. At this point the labor cost is roughly 4x that of the hardware and software acquisition and management cost.
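The arithmetic behind these numbers is easy to reproduce. This sketch assumes a 30-day month (which is what yields the $72/month figure) and reads the $24K labor figure as two outsourced roles, a DBA and a Unix admin, each at one $1K day per month. Both assumptions are mine.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month

def yearly_instance_cost(rate_per_hour):
    """Yearly cost in dollars of an instance that runs around the clock."""
    return round(rate_per_hour * HOURS_PER_MONTH * 12, 2)

hardware = yearly_instance_cost(0.10)   # small instance
storage = 1500                          # 800G for OS, database, and data
database = 3500                         # one-year lease, Standard Edition
infrastructure = hardware + storage + database

# Assumed: two roles, one $1K day per month each.
labor = 2 * 12 * 1000
total = infrastructure + labor

print(f"infrastructure: ${infrastructure:,.0f}/year")
print(f"labor:          ${labor:,.0f}/year")
print(f"total:          ${total:,.0f}/year")
print(f"labor is {labor / infrastructure:.1f}x infrastructure")
```

The infrastructure line matches the $5864 total above, and adding labor gives $29,864, which is the "roughly $30K" quoted in the text.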

For a small company this is significant. This is roughly what many spend on advertising and can provide an alternative way of communicating with customers. Think of a doctor's office or a small law firm. Think of a car dealership or a small chain of restaurants. It allows these businesses to create a one-on-one relationship with their customers at all times of the day.

Interesting announcement and great possibilities.

more later…..

Business intelligence symposium

There is a perceived link between business data and firm performance

Restrictive access to data to exec level only

Lack of trust of data to lower level employees

Small minority of firms use business data

Too hard to use tools

33% of tech companies think they adequately use business data
Financial firms come in at 20%

Silos, lack of standards, and error prone data impede use. Used more in other countries than US

Dirty data a big problem in Europe

Defining a strategy and admitting that data is good or bad is the first step

Data governance is also an issue for moving forward