What the heck is Exadata storage?

At OpenWorld last week, Oracle and HP announced the HP Oracle Database Machine and the Exadata Storage Server hardware (see the product page).

If you read the news article, which has a broken link from the Oracle home page, you get the basics: HP and Oracle have jointly announced a new rack solution that is pre-configured to run the Oracle 11g database with Real Application Clusters (RAC), using 64 processing cores, Oracle Enterprise Linux, and up to 168 terabytes of storage. The servers are HP ProLiant DL180 servers with two Intel 2.66 GHz quad-core processors, capacity for twelve 3.5-inch disks, 8 GB of memory, and dual InfiniBand network connections. I recommend reading more in the data sheet. What is new from Oracle is the Exadata Storage Server Software to manage this storage. Along with this comes a new plug-in for Enterprise Manager to help manage the storage device.

The difference between this and standard storage is that the storage can either be used as raw storage or as an offload device that reduces the amount of data a query has to move to the database server. It is typically used for a data warehouse where queries scan large sets of data. With the Exadata storage software, the storage processors can pre-process the SQL request and return only the results to the database engine. It is important to note that this software is designed to be used with the HP hardware and only the HP hardware. It is not designed to be used with other servers or storage solutions. This product is directly targeted at the Netezza and Teradata solutions. The intent is to bring some of these old solutions to 10g and beyond and help with performance tuning. Most of the Netezza and Teradata solutions are on 9i or older and not tuned very well. A high-end comparison of the Netezza, Teradata, and HP-Oracle solutions shows that they all hold the same amount of disk, but the HP-Oracle solution has substantially more database cores (4 for Netezza, 32 for Teradata, and 64 for HP-Oracle). The HP-Oracle solution also has 3x the memory and 20x the network speed. Many of the Netezza customers are running into scalability problems where they need multiple boxes to host the data. They then run into "one source of truth" issues, because some data is replicated on both boxes and there are no consistency tools to keep the data synchronized.

You can scale the solution from one box to 18 boxes inside one physical hardware rack. This gives you 65 terabytes of storage using SAS disks or 216 terabytes using SATA disks. The peak performance inside the rack is 18 GB/second for data transfer. Since each system has two InfiniBand switches, you can also use InfiniBand to connect to another hardware rack and scale across multiple racks. The database engine comes pre-installed but is not configured to use RAC, partitioning, advanced compression, security, or Enterprise Manager options like diagnostics and tuning. You can run this system as a single instance without RAC and split the data appropriately for a data warehouse. The single instance is configured as a database instance whose storage is managed by ASM. ASM then knows how to talk to the Oracle Exadata cell manager and offload the queries to the Exadata cell. You can have multiple Exadata cells behind the InfiniBand switch.
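As a rough sanity check on those rack numbers, the arithmetic below divides the rack totals across the boxes. It is only a back-of-the-envelope sketch: it assumes all 18 boxes are storage servers, that each holds the 12 disks mentioned earlier, and that capacity and bandwidth are spread evenly. The per-disk and per-box figures are derived here, not quoted from Oracle or HP.

```python
# Back-of-the-envelope check of the rack figures quoted in the post.
# Assumptions (not from the data sheet): 18 identical storage boxes,
# 12 disks per box, capacity and bandwidth spread evenly.
CELLS_PER_RACK = 18
DISKS_PER_CELL = 12


def per_cell(rack_total: float) -> float:
    """Split a rack-wide total evenly across the boxes."""
    return rack_total / CELLS_PER_RACK


def per_disk_tb(rack_tb: float) -> float:
    """Implied capacity of one disk, in terabytes."""
    return per_cell(rack_tb) / DISKS_PER_CELL


sata_disk_tb = per_disk_tb(216)   # SATA rack capacity from the post
sas_disk_tb = per_disk_tb(65)     # SAS rack capacity from the post
cell_gb_per_s = per_cell(18)      # peak rack transfer rate from the post

print(f"SATA: {sata_disk_tb:.2f} TB per disk")
print(f"SAS:  {sas_disk_tb:.2f} TB per disk")
print(f"Peak: {cell_gb_per_s:.2f} GB/s per box")
```

Under those assumptions the numbers hang together: roughly 1 TB per SATA disk, roughly 300 GB per SAS disk, and about 1 GB/second per box.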

The 11g database has been enhanced to take advantage of the Exadata storage. A new component called iDB (Intelligent Database protocol) runs in the database kernel and either maps database operations to the Exadata processors or runs the query locally. This protocol allows the Exadata processors to perform functions like table scans, compression, and OLAP functions. The iDB protocol is built on top of RDSv3 (Reliable Datagram Sockets) and runs over InfiniBand. All of these protocols are open standards and publicly defined.

There are two important points about this architecture. First, when a SELECT…WHERE is issued against a table, a traditional storage device returns the whole table, and the database engine evaluates the WHERE clause and drops the rows that are not relevant. The Exadata storage evaluates the WHERE clause at the disk and returns only the rows that match. The second way of reducing the data transferred is to return only the columns requested and not the whole row. With a traditional storage device the full rows are returned and the columns that were not requested are dropped by the database. With the Exadata storage, the storage processor does this filtering itself and sends only the requested information across the data channel.
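To make the difference concrete, here is a small toy model of the two approaches. It is a sketch of the idea only, not of how the Exadata software is implemented: the table, the column names, and the functions `traditional_scan` and `offloaded_scan` are all made up for illustration, and "values shipped" simply counts how many column values cross the wire between storage and database.

```python
from typing import Callable

Row = dict  # one table row: column name -> value


def count_values(rows: list) -> int:
    """Number of column values in a batch of rows (a stand-in for bytes)."""
    return sum(len(row) for row in rows)


def traditional_scan(table: list, columns: list, predicate: Callable):
    """Storage ships every row and column; the database filters afterwards."""
    shipped = [dict(row) for row in table]
    result = [{c: row[c] for c in columns} for row in shipped if predicate(row)]
    return result, count_values(shipped)


def offloaded_scan(table: list, columns: list, predicate: Callable):
    """Storage applies the WHERE clause and column list before shipping."""
    shipped = [{c: row[c] for c in columns} for row in table if predicate(row)]
    return shipped, count_values(shipped)


# Hypothetical table, standing in for:
#   SELECT id, amount FROM orders WHERE amount > 100
orders = [
    {"id": 1, "region": "TX", "amount": 50, "note": "a"},
    {"id": 2, "region": "TX", "amount": 150, "note": "b"},
    {"id": 3, "region": "CA", "amount": 500, "note": "c"},
    {"id": 4, "region": "NY", "amount": 20, "note": "d"},
]
wanted = ["id", "amount"]
big_orders = lambda row: row["amount"] > 100

trad_rows, trad_shipped = traditional_scan(orders, wanted, big_orders)
off_rows, off_shipped = offloaded_scan(orders, wanted, big_orders)

print(trad_rows == off_rows)      # same answer either way
print(trad_shipped, off_shipped)  # values moved across the wire
```

Both functions return the same rows. The only thing that changes is how much data has to travel before the answer is known, which is the whole point on a data warehouse scan.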

Another new feature of the Exadata storage node is the ability to manage I/O resources. In traditional database management, you have been able to allocate CPU and memory utilization based on jobs or processes. You can allocate 30% of your processing to finance and 70% of your processing to payroll. The problem with this model is that a finance job could swamp the I/O resources with a simple request even though it does not use more than 30% of the processor. With the Exadata storage you can allocate resources on the I/O channel as well. With this model you get 30% of your I/O allocated to finance as well as 30% of your database engine.
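The 30/70 split can be pictured with a few lines of code. This is a simplified sketch under my own assumptions, not the actual resource manager: the function `allocate_io`, the bandwidth figure, and the demand numbers are hypothetical, and each group is simply capped at its share of the channel. Whether the real software hands idle bandwidth to other groups is not something this post covers.

```python
def allocate_io(total_mb_s: float, shares: dict, demand: dict) -> dict:
    """Cap each group's I/O at its percentage share of the channel.

    shares: group -> relative share (for example 30 and 70)
    demand: group -> bandwidth the group is asking for, in MB/s
    """
    total_share = sum(shares.values())
    granted = {}
    for group, share in shares.items():
        limit = total_mb_s * share / total_share  # the group's ceiling
        granted[group] = min(demand.get(group, 0.0), limit)
    return granted


# Hypothetical numbers: a 1000 MB/s channel, finance asking for far
# more than its 30% share, payroll asking for less than its 70%.
shares = {"finance": 30, "payroll": 70}
demand = {"finance": 900.0, "payroll": 400.0}

granted = allocate_io(1000.0, shares, demand)
print(granted)
```

In this example the finance job is held to 300 MB/s no matter how large its request is, so payroll still gets the 400 MB/s it asked for.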

One of the impacts that this might have on business is that a single user can be given 10% of a machine and 10% of the I/O bandwidth. This user can be given the ability to run ad-hoc queries against the data warehouse without consuming more than 10% of all resources. This isn’t true with most business intelligence engines. We typically see problems with Cognos or Business Objects systems in production. Database resources can be restricted for a user or group so that they cannot overload the storage system with open-ended queries.

The sales model for this box is a little different. Oracle sells the product and HP delivers the hardware pre-configured to the customers. Oracle handles the first line of support and HP provides the 2nd tier along with break-fix and spares. This machine does not come with database licenses. This allows customers to use existing licenses and not have to purchase new licenses. The warranty for this machine is 3 years parts and labor.

For those of you who are graphical in nature, the technical overview is a good pictorial review of what I have been talking about.

My gut feeling… this is a new technology. It is a game changer. The big question is whether this will become mainstream for many companies or just be an Amazon and Google solution. If I were a hosting company and wanted to host Oracle, this would be the solution. I would choose this over virtualization any day. If I were a telco or vendor with a large number of customers that needed targeted marketing, it would be an easy choice. My biggest problem is that in Houston, this isn’t the case outside the oil and gas industry. It might be viable for health care. It might be viable for utility companies. It might be viable for service companies. Time will tell.
