11g data warehousing

internal training video for Data Warehousing and Business Intelligence
March 13, 2007

Willie Hardie
George Lumpkin
Hermann Beyer

The goal is to provide a single comprehensive platform for DW (data warehousing) and BI (business intelligence), with data quality and integration embedded in the platform.

New features for 11g

  • manageability
    • partition enhancement
    • tuning
    • optimizers
    • parallel query
    • ASM updates
    • data compression
  • VLDB
  • Performance with cache
  • Data loading
  • New SQL commands
  • OLAP
  • Data mining

Result cache – caches the results of queries, query blocks, or function calls. The cache is shared across statements and sessions on the server. It gives a significant speedup for read-only and read-mostly data.
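To make the idea concrete, here is a minimal Python sketch of a shared result cache. It is only an illustration of the concept, not Oracle's implementation: the ResultCache class, the sales data, and the query text are all hypothetical, and the sketch assumes that a write to an underlying table makes any dependent cached result stale, which is why the benefit is greatest for read-only and read-mostly data.

```python
from typing import Any, Callable, Dict, Set, Tuple


class ResultCache:
    """Toy server-wide result cache shared by all sessions (illustration only)."""

    def __init__(self) -> None:
        # query text -> (cached result, tables the result depends on)
        self._entries: Dict[str, Tuple[Any, Set[str]]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query: str, tables: Set[str],
                       compute: Callable[[], Any]) -> Any:
        """Return the cached result for `query`, computing it on a miss."""
        if query in self._entries:
            self.hits += 1
            return self._entries[query][0]
        self.misses += 1
        result = compute()
        self._entries[query] = (result, set(tables))
        return result

    def invalidate(self, table: str) -> None:
        """Drop every cached result that depends on a modified table."""
        stale = [q for q, (_, deps) in self._entries.items() if table in deps]
        for q in stale:
            del self._entries[q]


# Hypothetical data and query, used only to exercise the cache.
sales = [("east", 100), ("west", 250), ("east", 50)]


def total_by_region() -> Dict[str, int]:
    """Stand-in for an expensive aggregate query over the sales rows."""
    totals: Dict[str, int] = {}
    for region, amount in sales:
        totals[region] = totals.get(region, 0) + amount
    return totals


cache = ResultCache()
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

first = cache.get_or_compute(query, {"sales"}, total_by_region)   # miss: computed
second = cache.get_or_compute(query, {"sales"}, total_by_region)  # hit: reused

sales.append(("west", 25))   # a write to the underlying table...
cache.invalidate("sales")    # ...makes the cached result stale
third = cache.get_or_compute(query, {"sales"}, total_by_region)   # miss: recomputed
```

The second call is served from the cache without recomputing anything, while the write forces the third call to do the full work again. Data that changes often would spend most of its time in that third case.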

Information appliance initiative – preconfigured data warehouse solutions from HP, IBM, Sun, and Dell.

OLAP – make OLAP a standard feature of every data warehouse and allow any SQL application to leverage OLAP performance. Excellent query performance for ad-hoc queries. Optimized incremental refresh and aggregation. The new feature is cube-organized materialized views: fast incremental update and aggregation, with transparent access by SQL-based applications. Works with a range of tools (OBIEE, BO, Cognos, Hyperion, MicroStrategy).

Data Mining – simplified development and deployment of models. Supermodels, generalized linear models, multivariate linear regression, logistic regression.

iSCSI podcast series

I have been listening to a series of data center podcasts about computer technologies, and I will try to apply these talks to Oracle technology. iSCSI is a simple concept: it is a way of carrying SCSI storage traffic across a TCP/IP network. External storage is new to most IT staff. The typical way of attaching disks is through a cable running directly between your computer and your disk drive. iSCSI moves the disk onto a network without your having to pay for the expensive disk controllers and network interconnect cards typically used for SAN storage.

People who deploy a SAN typically reach 70-80% utilization of their storage, while people who use attached storage use about 50% of the space that they have purchased. SANs are perceived to be expensive to deploy. The highest costs are the fibre channel controller, the cabling, and the storage controller for the disks. iSCSI allows you to use GigE or standard network connections instead. iSCSI is a protocol that defines the communication interconnect. The implementation changes from OS to OS, but the way that it works is the same. It allows you to use your existing Ethernet network to connect to SAN storage. Placing iSCSI traffic directly on your general-purpose network is not recommended, because latency and collisions cause severe problems. Instead, reconfigure your network with a virtual LAN (VLAN) that isolates the iSCSI traffic between the servers and the storage.
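The utilization figures above translate directly into cost per terabyte actually used. The following sketch works through the arithmetic. The price of 1000 per raw terabyte is a made-up placeholder, and 0.75 is simply the midpoint of the 70-80% range, so treat the outputs as an illustration of the calculation rather than real pricing.

```python
def cost_per_used_tb(price_per_raw_tb: float, utilization: float) -> float:
    """Effective price of each terabyte that actually holds data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return price_per_raw_tb / utilization


def raw_tb_needed(data_tb: float, utilization: float) -> float:
    """Raw capacity to purchase in order to store `data_tb` of data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return data_tb / utilization


# Assumed values: only the two utilization levels come from the notes above.
ATTACHED_UTILIZATION = 0.50   # attached storage: about 50% used
SAN_UTILIZATION = 0.75        # SAN: midpoint of the 70-80% range
PRICE_PER_RAW_TB = 1000.0     # hypothetical price, same for both cases

attached_cost = cost_per_used_tb(PRICE_PER_RAW_TB, ATTACHED_UTILIZATION)  # 2000.0
san_cost = cost_per_used_tb(PRICE_PER_RAW_TB, SAN_UTILIZATION)            # about 1333.33

# How much more a raw SAN terabyte can cost before the SAN stops being
# cheaper per terabyte used.
break_even_price_ratio = SAN_UTILIZATION / ATTACHED_UTILIZATION           # 1.5

print(f"attached: {attached_cost:.2f} per used TB")
print(f"SAN:      {san_cost:.2f} per used TB")
print(f"SAN breaks even at {break_even_price_ratio:.1f}x the raw price")
```

Under these assumptions, networked storage can cost up to 1.5 times as much per raw terabyte and still match attached storage on cost per terabyte used. Removing the fibre channel hardware from the bill, as iSCSI does, makes it easier to stay under that ratio.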

Storage throughput and latency are critical to the performance of a database. You might want to deploy fibre channel in a production or test environment but use iSCSI for development and QA. You typically need 10 GigE to make iSCSI work at the same speed as attached storage.
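A rough calculation shows why link speed matters so much here. The sketch below computes only the theoretical ceiling of an Ethernet link, ignoring TCP/IP and iSCSI protocol overhead, so real throughput will be lower. The 500 GB table size is a hypothetical example, and the function names are mine.

```python
def max_throughput_mb_per_s(link_gbit_per_s: float) -> float:
    """Upper bound on throughput in decimal MB/s, ignoring protocol overhead."""
    bits_per_second = link_gbit_per_s * 1_000_000_000
    return bits_per_second / 8 / 1_000_000


def full_scan_seconds(data_gb: float, link_gbit_per_s: float) -> float:
    """Best-case time to read `data_gb` (decimal GB) across the link."""
    data_mb = data_gb * 1000
    return data_mb / max_throughput_mb_per_s(link_gbit_per_s)


TABLE_GB = 500  # hypothetical table size for the example

for name, gbit in (("1 GigE", 1), ("10 GigE", 10)):
    rate = max_throughput_mb_per_s(gbit)
    seconds = full_scan_seconds(TABLE_GB, gbit)
    print(f"{name}: at most {rate:.0f} MB/s, {seconds:.0f} s to scan {TABLE_GB} GB")
```

A single GigE link tops out at 125 MB/s, so a full scan of the example table takes more than an hour at best. 10 GigE raises the ceiling to 1250 MB/s and brings the same scan under seven minutes.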

An example of how to use iSCSI with Oracle can be found at http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html
This writeup by Jeffrey Hunter appears to be the reference that everyone cites in their blogs or web sites.