Automating processes

Recently I have been working on my AWS Architect certification. Rather than just grinding through the training material and practice exams I thought I should actually build something and journal my process. I have done this internally for Commvault but wanted to do an external blog as well so that if I left Commvault I would have a copy of my notes.

The first step in building something is documenting the project including the goals, objectives, and components that will be needed. Initially I interviewed the person that I am building this project for and drew everything on paper (I know, old school). From there I transcribed it into an AWS architecture diagram using LucidChart.

The goal of the project is to take a zip file that is an aggregation of ECGs performed on student athletes and upload it into S3. Once the zip file is uploaded it kicks off a process that unzips the files and copies them to another folder inside the bucket. From here these files are copies to Dropbox and a notification is sent to one or more email boxes or text messages. From this notification a Cardiologist interprets the results and responds to the email or text. Once the response is received the interpreted files are transferred from Dropbox back into S3 and sorted according to the school that was screened. Students that were marked as low risk are stamped with a low risk label. Students that were marked needing a follow up or high risk are placed into a different folder for manual processing and a notification is sent via email and/or text requesting for manual intervention.

The first step in the process is taking a zip file that was uploaded into S3 and processing it. Fortunately we have the ability to launch processes when a file is uploaded into S3 with the Lambda functions. A good place to start learning about this is https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html where the tutorial talks about how to create a Lambda function and tie it to changes to a specific bucket. This also required creating an IAM role that allows a Lambda function to read and write an S3 bucket.

Step 1) Create IAM Role. This id done by going to the IAM console and selecting Roles at the left of the screen. We want to click on the Create role blue button near the middle of the screen.



We select the Lambda function and click Next:Permissions at the bottom of the screen.

To make things easy we select AmazonS3FullAccess. What we want is the ability to read and write objects in an S3 bucket. We need to type in S3 in the Filter policies then select the box next to AmazonS3FullAccess.

We skip the tags role and go to Review. Here we enter a Role name. We will call it gbgh-processing. Once we enter this information we click Create role at the bottom right of the screen.

At this point we have a role that allows our Lambda function to access S3 objects and manipulate them.

Step 2) The next step is to create a bucket that we will upload files into. This is done by going to the S3 console and creating a bucket. The bucket must be in the same zone that we create our Lambda function. In our example we will create a bucket called gbgh-test and put it in the US East region.

We click on the Create bucket blue button at the top left of the screen. The bucket name needs to be unique and we want the US East region. We don’t want to copy settings from other buckets and will configure the options on our own. We will be using gbgh-test as the bucket name.

We will go with the default options and clear all of the permissions as shown below. We want to open up public access for our bucket because a variety of people long term will be uploading files into the bucket. We can control access through other mechanisms at a later date.

From here we click Next and Create Bucket on the next screen.

We should see our bucket ready and available in the S3 bucket list.

Step 3) create a Lambda function that receives S3 object changes so that we can process the files and do something with it. To do this we go to the Lambda function console.

Notice that we have a few functions already defined. Some are Java. Some are Node.js. Some are Python. We will be creating a Python 2.7 binary so that we can use the boto3 library. This pre-defined library allows us to quickly and easily manipulate objects in a bucket and call other AWS services like email and queue services. To start we click on the orange Create function button at the top right of the screen.

We will Author from scratch our code since we have some simple code that we can work from. We will call the function gbgh-test and select Python 2.7 as the runtime. We will select the gbgh-processing role to give our function access to the S3 bucket that we created.

When we click on the Create function it drops us into the Designer tool for editing and testing our Lambda function. The first thing that we want to do is add an S3 trigger to link S3 objects to our function.

The trigger is found under the Designer – Add triggers list. Scroll down and click on S3. It will show that Configuration is required and we will need to scroll down to configure the trigger. From here we will select the bucket and Event type. We are looking for create events which indicates that someone uploaded a file for us to process. We select the gbgh-test bucket and stick with the All object create events.

After clicking on Add we nativate to the gbgh-test icon to see the Function code where we can edit the function and test code that we want. The default handler does not do what we want so we need to replace this code.

For our function we are going to start simple. We will handle a create event, pull the bucket name from the event handler, pull the file name that was uploaded, and print all of the items in the bucket for testing. The code starts with an import of the boto3 library. We create a handle into S3 with the boto3.resource(‘s3’) library call. The lambda_handler is launched when a file is created in our S3 bucket. From here we print some diagnostics that the handler was called then walk the object list in the bucket and print the object names. From here we return and terminate the Lambda function. Sample code for this can be found at lambda_function_s3_upload.js

The next step is to create a test event to simulate a file upload. We do this by clicking on the Select a test event pulldown and selecting Configure test events.

For the test event we need to simulate an ObjectCreated:Put call or an https PUT that causes a file to be uploaded to our S3 bucket. We need to define the arn for our bucket and bucket name as well as an object name (gbgh.zip) that we need to actually create in our gbgh-test bucket. We define the event name as gbghTest and paste in our simulated event call. The code for this can be found at testS3Put.json

The data at the bottom of the screen are just curly brackets to finish out the record definition. From here we click on Create (I had to scroll down to see it) to create our test function. At this point we can select gbghTest and click on the Test button at the top right of the screen. When we first tried this it failed with an error. We had to add two lines to import the json and urllib functions since we call them to process the keys or object names. Once we make these changes we see that the status came back Succeeded and we get an empty listing of gbgh-test if we scroll down in the Execution results screen.

We can upload our gbgh.zip file into our gbgh-test bucket and it should appear at the bottom of the Execution results screen.

To summarize, we have a Lambda function that gets launched when we upload a file into an S3 bucket. Currently the function just prints a directory of the bucket by listing all of the objects that it contains. We had to create an IAM policy so that our function can interact with S3. In the next blog post we will do some processing of this zip file and send some notifications once the zip file is processed.

preparing a desktop for PaaS

Before we can start looking at the different options of a database, we need to get a desktop ready to do database development. It sounds a little strange that we need to download software to get access to a cloud database. We could do everything from the command line but it is much simpler and easier if we can do this using desktop tools. The software that we are going to download and install are

  • To access cloud storage
    • Mozilla Firefox
    • RestClient extension for Firefox
    • Google Chrome
    • Postman extension for Chrome
    • CloudBerry for OpenStack
  • To access files in our instance
    • Putty
    • Filezilla
    • Cygwin
  • To access our database instance
    • SQL Developer
    • Microsoft Visual C++ libraries

The eventual target that we are looking to get to is

To do this we need to go to the Oracle Cloud Marketplace and look for the Windows 2012 Server instance. What we want to do is provision a Windows instance and use it as a remote desktop for connecting to the cloud. We could do this on our desktop but the benefit of using a Windows Server is that we can create more users and use this instance for a hands on workshop. We don’t need to have anyone load any virtual machines, fight the differences between Mac and Windows, or wait for the binaries to download and install. We can do most of this on a virtual machine in the cloud and just add and delete users for workshops. To provision the Windows server, we go to the cloud marketplace, select Infrastructure, and Compute. We can the search for Windows and get a bootable image to use as our foundation.



Once we agree to the legal terms we can select an instance to provision this into. The way it works is that we copy a bootable image into a cloud instance. We can then create compute instances from this bootable image and customize it to our liking. Once we agree to the terms the marketplace connects to the Oracle Cloud and uses your authentication credentials to connect to the instance. From this is gets a list of instances associated with this account, checks to see if you have agreed to terms of marketplace use for this instance by setting your profile settings for the instance. Once the bootable image is ready, a splash screen is presented stating that you are ready to provision a compute instance.




The screen dumps you into a compute creation wizard that walks you through the compute provisioning. Rather than going through that interface we decided to start from scratch and log into the instance and provision a compute engine from scratch. We first select the boot image from our private images, select the shape to boot, define the instance name, configure ssh connectivity as well as set the Admininstrator password (not shown). Once we get the confirmation screen it takes a few minutes to create the boot disk then boot the compute instance on this newly formatted disk.









We can check the progress by looking at the storage and compute instance. When everything is done we should see a public ip address for our instance. If we don’t see our instance it is either still building or we should see an error in the history. Unfortunately, the history is hidden and a failed provisioning is now shown unless you look for it by expanding the history.


Before we can connect to our instance with remote desktop, we need to define a security list to allow for rdp, associate this rule with our instance, and define the security rule for rdp and associate it with the security list and instance.





Once we have rdp enabled to our instance, we look up the public ip address and connect as the Administrator user with the password that we passed in with a json header at the bottom of the creation screen (not shown). When we log in we see the server tools splash screen pop up.


We want to create a secondary user, give this user admin rights as well as rights to remote desktop connect to the server. We might want to add more users not as admins but with remote desktop rights for hands on labs. We can add and delete users using this method and it refreshes the workshop for the next class.





At this point we can create a staging directory and install the software that we listed above. The only product that causes a problem with the install is the SQL Developer because it requires a Microsoft package that is not installed by default. We need to download the library and all of the packages that we downloaded are ready to install. I did not go through customization of the desktop or downloading the public and private keys used for the workshop. These are obvious steps using filezilla from a shared network storage on a server in the cloud. We downloaded Firefox and Chrome primarily because Internet Explorer does not support REST Api protocols and we will need a way to create and list storage containers. We could have skipped this installation and done everything through CloudBerry but we can do everything similarly on a Mac (no need for putty or cygwin). With Firefox you need to install the REST Client api extension and Chrome requires the Postman Extension.










In summary, we created a compute Windows 2012 Server instance in the Oracle Compute IaaS. We added a new user as a backup to our Administrator user. We enabled remote desktop and configured a Mac to connect to this service remotely. We then downloaded a set of binaries to our Windows desktop to allow us to manage and manipulate storage containers and database instances. We also downloaded some utilities to help us use command line tools to access our database and customize our instances. We technically could do all of this with a Windows desktop, Internet Explorer, and SQL Developer. We went to the extra steps so that we can do the same from a Mac or Windows desktop using the same tools.

database options

Before we dive into features and functions of database as a service, we need to look at the options that you have with the Oracle Database. We have discussed the differences between Standard Edition and Enterprise Edition but we really have not talked about the database options. When we select a database in the Oracle Cloud we are given the choice of Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Today we are going to dive into the different Editions and talk about the options that you get with each option. It is important to note that all of the options are extra cost options that are licensed on a per processor or per user basis. If you go with Amazon RDS, EC2, or Azure Compute you need to purchase these options to match your processor deployment.

One of the standard slides that I use to explain the differences in the editions is shown below.

The options are cumulative when you look at them. The Enterprise Edition, for example, comes with Transparent Data Encryption (TDE). TDE is also included in the High Performance and Extreme Performance Editions. We are going to pull the pricing for all of these options from the Technology Price List. Below is a list of the options.

  • Enterprise Edition
    • Transparent Data Encryption
  • High Performance Edition
    • Diagnostics
    • Tuning
    • Partitioning
    • Advanced Compression
    • Advanced Security
    • Data Guard
    • Label Security
    • Multitenant
    • Audit Vault
    • Database Vault
    • Real Application Testing
    • OLAP
    • Spatial and Graphics
  • Extreme Performance Edition
    • Active Data Guard
    • In Memory
    • Real Application Clusters (RAC)
    • RAC One

Transparent Data Encryption

TDE is a subset of the Advanced Security option. TDE stops would-be attackers from bypassing the database and reading sensitive information from storage by enforcing data-at-rest encryption in the database layer. Data is stored in the table extents encrypted and read into the database encrypted. The Oracle Wallet is needed to read the data back and perform operations on the data. Advanced Security and Security Inside Out are blogs to dive deeper into TDE features, functions, and tutorials. There is also a Community Security Discussion Forum. The Advanced Security option is priced at $300 per named user or $15,000 per processor. If we assume a four year amortization the cost of this option is $587.50 per month per processor. The database license is $1,860 per month per processor. This says that a dual core system on Amazon EC2, RDS, or Azure Compute running the Oracle database will cost you the cost of the server plus $2,448 per month. If we go with a t2.large on Amazon EC2 (2 vCPUs and 8 GB of RAM) and 128 GB of disk our charge is $128 per month. If we bump this up to an r3.large (2 vCPU, 15 GB of RAM) the price goes up to $173 per month. The cost will be $2,620 per month which compares to Enterprise Edition at $3,000 per month per processor for PaaS/DBaaS. We could also run this in Oracle IaaS Compute at $150 per month (2 vCPUs, 30 GB of RAM) to compare apples to apples. It is strongly recommended that any data that you put in the cloud be encrypted. Security is good in the cloud but encryption of data in storage is much better. When you replicate data or backup data it is copied in the format that it is stored in. If your data is clear text, your backups could be clear text thus exposing you to potential loss of data. Encrypting the data at rest is storage is a baseline for running database in the cloud.

Diagnostics

Diagnostics is a subset of the Database Management Packs that allows you to look into the database and figure out things like lock contention, what is holding up a wait queue, and what resources are being consumed by processes inside the database. Historic views into the automated workload repository (AWR) reports are available with this option. You can get spot options but not historical views and comparative analytics on AWR information. Some of the tools are free like compression advisor and partitioning advisor while others are part of the diagnostics pack. Diagnostics are licensed at $150 per named user or $7,500 per processor. This correlates to $294 per processor per month. Unfortunately, you can’t purchase Enterprise Edition DBaaS and add this but need to go with IaaS Compute and add this to the bring your own database license. The only way to get this feature is to go with the High Performance Edition. The binary that is installed on the cloud service specifically labels the database as Enterprise Edition, High Performance Edition, or Extreme Performance Edition. All of the features listed from here and below are prohibited from running on the Enterprise Edition when provisioned into the Oracle DBaaS. If you just want Diagnostics Pack on Enterprise Edition it does not make economic sense to purchase High Performance Edition at $4,000 per month per processor when you can do this on IaaS at $2,914 (the $2,620 from above plus $294).

Tuning

Tuning is also a subset of the Database Management Packs that allows you to look into sql queries, table layouts, and overall performance issues. Options like the SQL Tuning Advisor and Automatic SQL Tuning are part of this option. Tuning pack is $100 per named user or $5,000 per processor. This comes in at $196 per processor per month if purchased separately. A Tuning Whitepaper details some of the features and functions of the tuning pack if you want to learn more.

Partitioning

Partitioning is a way of improving performance of your database and backup by splitting how data is stored and read.
Partitioning is powerful functionality that allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity. Oracle provides a comprehensive range of partitioning schemes to address every business requirement. The key improvement is to reduce the amount of data that you are reading into memory on a query. For example, if you are looking for financial summary data for the last quarter, issuing a query into eight years of financial data should not need to read in 32 quarters of data but only data from the last quarter. If we partition the data on a monthly basis we only read in three partitions rather than all 32. Partitioning also allows us to compress older data to consume less storage while at rest. When we backup the database we don’t need to copy the older partitions that don’t change, only backup the partitions that have updated since our last backup. Partitioning is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. The three most purchased database options are diagnostics, tuning, and partitioning. The combined cost of these three options is $940 per processor per month. When we compare the $4,000 per processor per month of DBaaS to IaaS with these three options we are at parity.

Advanced Compression

Advanced Compression is a feature that allows you to compress data at rest (and in memory) so that it consumes less resources. Oracle Advanced Compression provides a comprehensive set of compression capabilities to help improve performance and reduce storage costs. It allows organizations to reduce their overall database storage footprint by enabling compression for all types of data: relational (table), unstructured (file), network, Data Guard Redo and backup data. Cost comparisons for this feature are directly comparable to storage costs. Advanced compression is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Typical compression ratios are 3x to 10x compressions. This means that 1 TB of data will take up 600 GB or 100 GB at these compression ratios. Lower compression rates are recommended for data that lightly changes and high compression for data that will not change. The penalty for compression comes in when you update data that is compressed. The data must be uncompressed, the new data inserted, and recompressed.

Advanced Security

Advanced Security allows you to secure and encrypt data in the database. Advanced Security provides two important preventive controls to protect sensitive data at the source including transparent database encryption and on-the-fly redaction of display data. TDE stops would-be attackers from bypassing the database and reading sensitive information directly from storage by enforcing data-at-rest encryption in the database layer. Data Redaction complements TDE by reducing the risk of unauthorized data exposure in applications, redacting sensitive data before it leaves the database. Advanced Security is priced at $300 per named user and $15,000 per processor. The monthly cost will be $587.50 per month per processor for this option. Data redaction is typically required for replicating production data to development and test. If you have credit card, social security numbers, home addresses, or drivers license information in your database, redaction is important to have to remain Sarbanes Oxly and PCI compliant.

Data Guard

Data Guard is a key foundation piece of Maximum Availability Architecture and does not cost any additional money. You get data replication between two databases at no additional cost and data can be replicated as physical or logical replication between the database instances. This feature ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restoration, and cluster techniques to provide a high level of data protection and data availability. It is important to note that Data Guard is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Label Security

Label Security has the ability to control access based on data classification and to enforce traditional multi-level security (MLS) policies for government and defense applications. Oracle Label Security
benefits commercial organizations attempting to address numerous access control
challenges including those associated with database and application consolidation,
privacy laws and regulatory compliance requirements. When a user requests data, the database looks at the user credentials and roles that they have access to and filters the results that the user sees from a query. Label Security is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Note that this is different than data redaction. With redaction, data is scrambled when data is copied. With Label Security, the data is not returned if the user does not have rights to read the data. An error is not returned from a query but a null value is returned if the user does not have rights to read a column. The biggest benefit to this option it does not require program changes to restrict access to data and present results to users. If, for example, we are going to show sales in a customer relationship program, we don’t need to change the code based on the user being a sales rep or sales manager. The sales manager can see all of the sales rep information to track how their team is performing. Each sales rep can see their data but not the other sales rep data. It is important to note that Label Security is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Multitenant

Multitenant or Pluggable Database allows you to consolidate instances onto one server and reduce your overall management cost. The many pluggable databases in a single multitenant container database share its memory and background processes. This enables consolidation of many more pluggable databases compared to the old architecture, offering similar benefits to schema-based consolidation but with none of the major application changes required by that approach. Backups are done at the parent layer. Users are provisioned at the pluggable layer. Features of the instance (RAC, DataGuard, etc) are inherent to the parent and adopted by the pluggable container. To take a test system from single instance to data guard replicated only requires unplugging the database from the single instance system and plugging it into a data guard system. The same is true for RAC and all other features. Multitenant is licensed at $350 per user or $17,500 per processor. This come in at $685 per processor per month. It is important to note that this option is not available on Amazon RDS. This option is specifically disabled and not allowed. You must run this on EC2 to use this functionality or on another cloud platform.

Audit Vault

Audit Vault and Database Firewall monitors Oracle and non-Oracle database traffic to detect and block threats, as well as improves compliance reporting by consolidating audit data from databases, operating systems, directories, and other sources. Audit vault is licensed at $6,000 per processor and is not available on a per user basis. This comes in at $235 per processor per month. This option typically requires a separate server for security reasons where logs and logging information is copied to prevent data to be manipulated on a single system and the auditing system.

Database Vault

Database Vault reduces the risk of insider and outsider threats and addresses common compliance requirements by preventing privileged users (DBA) from accessing sensitive application data,
preventing compromised privileged users accounts from being used to steal sensitive data or make unauthorized changes to databases and applications,
providing strong controls inside the database over who can do what and controls over when and how applications, data and databases can be accessed,
providing privilege analysis for all users and applications inside the database to help achieve least privilege model and make the databases and applications more secure. Database Vault is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that this option is not available on Amazon RDS. This option is specifically disabled and not allowed. You must run this on EC2 to use this functionality or on another cloud platform.

Real Application Testing

Real Application Testing helps you fully assess the effect of such system changes on real-world applications in test environments before deploying the change in production. Oracle Real Application Testing consists of two features, Database Replay and SQL Performance Analyzer. Together they enable enterprises to rapidly adopt new technologies that add value to the business while minimizing risk. Traces can be recorded for reads and writes and replayed on a test system. This makes the replay option perfect for development and testing instances. The product is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that not having the sys level access might or might not break this feature in Amazon RDS based on what you are trying to replay.

OLAP

Online Analytics Processing or OLAP is a multidimensional analytic engine embedded in Oracle Database 12c. Oracle OLAP cubes deliver sophisticated calculations using simple SQL queries – producing results with speed of thought response times. This outstanding query performance may be leveraged transparently when deploying OLAP cubes as materialized views – enhancing the performance of summary queries against detail relational tables. Because Oracle OLAP is embedded in Oracle Database 12c, it allows centralized management of data and business rules in a secure, scalable and enterprise-ready platform. OLAP is licensed at $460 per user or $23,000 per processor. This comes in at $901 per processor per month. This feature is good for BI Analytics packages and Data Warehouse systems.

Spatial and Graphics

Spatial and Graphics supports a full range of geospatial data and analytics for land management and GIS, mobile location services, sales territory management, transportation, LiDAR analysis and location-enabled Business Intelligence. The graph features include RDF graphs for applications ranging from semantic data integration to social network analysis to linked open data and network graphs used in transportation, utilities, energy and telcos and drive-time analysis for sales and marketing applications. This option is licensed at $350 per user or $17,500 per processor. This come in at $685 per processor per month. It is important to note that this option is not supported in Amazon RDS. You must select EC2 or another cloud service to get this option.

All of the above options are bundled into the High Performance Edition. If we add up all of the options we get a total of

  • Transparent Data Encryption – $587.50 per month
  • Diagnostics – $294 per month
  • Tuning – $196 per month
  • Partitioning – $450 per month
  • Advanced Compression – $450 per month
  • Advanced Security – $587.50 per month
  • Data Guard – bundled
  • Label Security – $450 per month
  • Multitenant – $685 per month
  • Audit Vault – $235 per month
  • Database Vault – $450 per month
  • Real Application Testing – $450 per month
  • OLAP – $901 per month
  • Spatial and Graphics – $685 per month

This roughly bubbles up to $5,833.50 per processor per month for the High Performance options. Oracle bundles all of this for an additional $1000 per processor per month. The Extreme Performance Edition options include Active Data Guard, In Memory, and RAC.

Active Data Guard

Active Data Guard has the same features and functions as Data Guard but allows the target database to be open for read/write and updates happen bidirectionally. Active Data Guard is licensed at $230 per user or $11,500 per processor. This come in at $450 per processor per month.

In Memory

In Memory optimizes both analytics and mixed workload OLTP, delivering outstanding performance for transactions while simultaneously supporting real-time analytics, business intelligence, and reports. Most DBAs optimize performance by creating indexes to find data quicker. This works if you know the questions ahead of time. If you don’t know the question it is difficult to tune for everything. In Memory allows you to create a row based copy of the data as well as a column based copy of the data for quick column sorts and searches. In Memory is licensed at $460 per user or $23,000 per processor. This come in at $901 per month per processor. The key advantage of this option is that it prevents you from purchasing a second database to do analytics and reporting on the same box as your transactional system.

Real Application Clusters (RAC)

RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide highly scalable and available database solutions for all your business applications. Oracle RAC is a key component of Oracle’s private cloud architecture. Oracle RAC support is included in the Oracle Database Standard Edition for higher levels of system uptime and a critical part of the MAA strategy. RAC is licensed at $460 per user or $23,000 per processor. This come in at $901 per month per processor. It is important to note that RAC is not supported in Amazon or Azure. The system requires shared storage between compute instances that neither platforms provide. The only option for this configuration is Oracle DBaaS/PaaS.

The options for Extreme performance come in at $2,252 per processor per month but Oracle only charges an extra $1000 with Extreme Edition.

In Summary, there are a ton of options for the database. You need to figure out what options you need and if you need more than a couple it is economically beneficial to go with High Performance. If you need RAC, Active Data Guard, or In Memory you must purchase the Extreme Performance Edition. It is also important to note that not all features are supported in Amazon RDS and you must either go with Oracle Database as a Service or build a system using IaaS. RAC is the only exception where it is only available with Oracle DBaaS. We will go into a couple of these features in upcoming days to look at the value, how to use, and what is required to make the functionality work with other cloud providers.

Oracle Database 12c SQL by Jason Price

Given that we have a database in Amazon RDS and Oracle PaaS we can go through some books from Oracle Press and see if anything breaks running through a book. Let’s start with something simple, Oracle Database 12c SQL by Jason Price, published by Oracle Press. This is an introductory book that goes through the basic data types, sql commands, and an introduction of XML at the end of the book. The material should be relatively straightforward and not have any issues or problems executing the sample code. The sample code can be downloaded from Oracle Press Books by searching for the book title and downloading the Chapter 1 sample code. This will give us a way to load a table with data and execute code against the table. We will use SQL Developer to execute the code from d:\workshops\sql books\SQL and see what works and what does not work.

To get started, we need to use our Amazon RDS instance and SQL Developer that we installed yesterday. We connect with the user oracle to port 1521 after opening up the port to anyone. From this connection we can execute sql code in the main part of the SQL Developer window and load the sample code to execute. We can test the connection with the following command

select sysdate from dual;


we can follow along the book and create the user store, load the schema into the database, and look at the examples throughout the book.




Everything worked on Amazon RDS. We were able to create users, grant them audit functionality, execute XML code, and generally do everything listed in the book. The audit did not report back as expected but this could have been a user error. According to the Amazon RDS Documentation audting should work. We might not have had something set properly to report back the right information.

In summary, the Amazon RDS is a good platform to learn how to program 12c SQL and the various user level commands. If you go through a book like Oracle Database 12c SQL everything should work. This or the Oracle PaaS equivalent make an excellent sandbox that you can use for a day or two and turn off minimizing your cost of experimenting.

resizing Amazon RDS Oracle EE

Yesterday we looked at connecting to DBaaS with the Oracle platform as a service option. We wanted to extend the database table size because we expect to grow the tablespace beyond current storage capacity. We are going to do the same thing today for Amazon RDS running an Oracle Enterprise Edition 12c instance. To start our journey, we need to create a database instance in Amazon. This is done by going to the Amazon AWS Console and click on RDS to launch a database instance. We then click on the Launch DB Instance button and click on the Oracle tab. We select Enterprise Edition then the Dev/Test for our example. We accept the defaults, define our database with ORACLE_SID=pri, username of oracle, no multi-AZ replication, and select the processor and memory size.




While the database is creating we need to change the default network configuration. By default port 1521 is open but open to an ip address range. We need to open this up to everyone so that we can connect to the database from our desktop instance. We are using a Windows 2012 instance in the Oracle IaaS cloud so the default mapping back to the desktop we used to create the database does not work. Note that since we do not have permission to connect via ssh connecting with a tunnel is not an option. We must open up port 1521 to the world and can not use a tunnel to connect these two instances. The only security option that we have is ip white listing, vpn, opening up port 1521 to the world. This is done by going into the security groups definition on the detail page of the instance. We change the default inbound rule from an ip address to anywhere.



We could have alternatively defined a security group that links the ip address of our IaaS instance as well as our desktop prior to creation of this database to keep security a little tighter. Once the database is finished creating we can connect to it. We get the connection string (DNS address) and open up SQL Developer. We create a new database connection using the sid of pri, username of oracle, and port 1521. Once we connect we can define the DBA view to allow us to manage parts of the database since we do not have access using Enterprise Manager.




It looks like the table space will autoextend into the available space, all we should have to do is extend the /rdsdata partition. This is done by modifying the RDS instance from the console. We change the storage from the 20 that we created to 40, turn on advanced monitoring (not necessary for this exercise), and check the apply immediately button. This reconfigures the database and extends the storage. Note the resize command that happens for us. This is a sysdba level command that is executed on our behalf since we do not have sys rights outside the console.





We can look at the new instance and see that the size has grown and we have more space to expand into.



We can see the changes from the monitoring console


In summary, we are able to easily scale storage in an Amazon RDS instance even though we do not have sys access to the system. We do need to use the AWS Console to make this happen and can not do this through Enterprise Manager because we can’t add the agent to the instance. It is important to note that some of the options that are available from the console and some are available with altered command line options that give you elevated admin privileges without giving you system access. Look at the new command structures and decide if forking your admin tools just to run on RDS is something worth doing or too much effort. These changes effectively lock you into running your Oracle database on Amazon RDS. For example to change the tablespace definition in Oracle you would typically type

alter database default tablespace users2;

but with Amazon RDS you would need to type

exec rdsadmin.rdsadmin_util.alter_default_tablespace('users2');

Is this a show stopper? It could be or it might be trivial. It could stop some pre-packaged applications from working in Amazon RDS and you are forced to go with EC2 and S3. The upgrade storage example that we just went through would be a manual process with the involvement of an operating system administrator thus incurring additional cost and time. Again, this blog is not about A is better than B. This blog is about showing things that are hidden and helping you decide which is better for you. Our recommendation is to play with Amazon RDS and Oracle PaaS and see which fits your needs best.

resizing database as a service with Oracle

Historically, what happens to a database months after deployment has always been an issue and problem. If we go out and purchase a computer and disk storage then deploy a database onto the server. If we oversize the hardware and storage, we wasted budget. If we undersize the hardware and storage we had to purchase a new computer or new storage and get an operating system expert to reconfigure everything on the new server and get a database administrator to reconfigure the database installation to run on the new server or new storage. For example, if we purchased a 1 TB disk drive and allocated it all to /u02 the database had a ton of space to grow into. We put the DATA area there and put the RECO area into /u03. Our database service suddenly grows wildly and we have a record number of transactions and increase the offerings in our product catalog and our tablespace suddenly grows to over 800 GB. Disk performance starts to suffer and we want to grow our 1 TB to 2 TB. To do this we have to shut down our database, shut down the operating system, attach the new disk, format and mount it as /u05, copy the data from /u02 to /u05, remount /u05 as /u02, and reboot the system. We could have backed up the database from /u02 and reformatted /u02 and /u05 as a logical volume to allow us to dynamically grow the disk and allow us to purchase a 1 TB for our /u05 disk rather than a 2 TB disk and reduce our cost. We successfully grew our tablespace by purchasing more hardware, involving an operating system admin, and our database administrator. We were only down for a day or half day while we copied all of our data and modified the disk layout.

Disk vendors attacked this problem early by offering network or fiber attached storage rather that direct attached storage. They allow you to add disks dynamically keeping you from having to go out and purchase new disks. You can attach your disk as a logical unit number and add spindles as desired. This now requires you to get a storage admin involved to update your storage layout and grow your logical unit space from 1 TB to 2 TB. You then need to get your operating system admin to grow the file system that is on your /u02 logical unit mount to allow your database admin to grow the tablespace beyond the 1 TB boundary. Yes, this solves the problem of having to bring down the server, touch the hardware, add new cables and spindles to the computer. It allows data centers to be remote and configurations to be done dynamically with remote management tools. It also addresses the issue of disk failures much easier and quicker by pushing the problem to the storage admin to monitor and fix single disk issues. It solves a problem but there are better ways today to address this issue.

With infrastructure as a service we hide these issues by treating storage in the cloud as dynamic storage. With Amazon we can provision our database in EC2 and storage in S3. If we need to grow our S3, we allocate more storage to our bucket and grow the file system in EC2. The database admin then needs to go in and grow the tablespace to fill the new storage area. We got rid of the need for a storage admin, reduced our storage cost, and eliminated a step in our process. We still need an operating system admin to grow the file system and a database admin to grow the tablespace. The same is true if we use Azure compute or Oracle IaaS.

Let’s go through how to attach and grow storage to a generic compute instance. We have a CentOS image running in IaaS on the Oracle Cloud. We can see that the instance has 9 GB allocated to it as the root operating system. We would like to add a 20 GB disk then grow the disk to 40 GB as a second test. At first we notice that our instance is provisioned and we the 9 GB disk labeled CentOS7 allocated to our instance as /dev/xvdb. We then create a root partition /dev/xvdb1, provision an operating system onto it using the xfs file system, and mount it as the root filesystem.


To add a 20 GB disk, we go into the Compute management screen, and create a new storage volume. This is easy because we just create a new volume and allocate 20 GB to it.



Given that this disk is relatively small, we don’t have to wait long and can then attach it to our CentOS7 instance by clicking on the hamburger menu to the right of our new 20 GB disk and attaching it to our CentOS7 instance.



It is important to note that we did not need to reboot the instance but suddenly the disk appears as /dev/xvdc. We can then partition the disk with fdisk, create a file system with mkfs, and mount the disk by creating a new /u02 mount point and mounting /dev/xvdc1 on /u02.



The real exercise here is to grow this 20 GB mounted disk to 40 GB. We can go into the Volume storage and Update the storage to a larger size. This is simple and does not require a reboot or much work. We go to the Storage console, Update the disk, grow it to 40 GB, and go back to the operating system and notice that our 20 GB disk is now 40 GB. We can create a new partition /dev/xvdc2 and allocate it to our storage.




Note that we selected poorly when we made our file system selection. We selected to lay out an ext3 file system onto our /dev/xvdc1 partition. We can’t grow the ext3 filesystem. We should have selected ext4. We did this on purpose to prove a point. The file selection is critical and if you make the wrong choice there is no turning back. The only way to correct this is to get a backup of our /u02 mount and restore it onto the ext4 newly formatted partition. We also made a second wrong choice of laying the file system directly on the raw partition. We really should have created a logical partition from this one disk and put the file system on the logical partition. This would allow us to take our new /dev/xvdc2, create a new physical partition, add the physical partition to our logical partition, and grow the ext4 file system. Again, we did this on purpose to prove a point. You need to plan on expansion when you first lay out a system. To solve this problem we need to unmount the /u02 disk, delete the /dev/xvdc1 and /dev/xvdc2 partitions, create a physical partition with logical volume manager, create a logical partition, and lay an ext4 file system onto this new volume. We then restore our data from the backup and can simply grow the partition much easier in the future. We are not going to go through these steps because the exercise is to show you that it is much easier with platform as a service and not how to do it on infrastructure as a service.

If we look at a database as a service disk layout we notice that we have /dev/xvdc1 as /u01 which represents the ORACLE_HOME, /dev/mapper/dataVolGroup-lvol0 as /u02 which represents the tablespace area for the database, /dev/mapper/fraVolGroup-lvol0 which represents the fast recovery area (where RMAN dumps backups), and /dev/mapper/redoVolGroup-lvol0 which represents the redo log area (where DataGuard dumps the transactions logs). The file systems are logical volumes and created by default for us. The file systems are ext4 which can be seen by looking at the /etc/fstab file. If we need to grow the /u02 partition we can do this by using the scale up option for the database. We can add 20 GB and extend the data partition or the fra partition. We also have the option of attaching the storage as /u05 and manually growing partitions as desired. It is important to note that scaling up the database does require a reboot and restart of the database. When we try to scale up this database instance we get a warning that there is a Java service that depends upon the database and it must be stopped before we can add the storage desired.





In summary, we can use IaaS to host a database. It does get rid of the need for a storage administrator. It does not get rid of the need for an operating system administrator. We still have to know the file system and operating system commands. If we use PaaS to host a database, we can add storage as a database administrator and not need to mess with the logical volume or file system commands. We can grow the file system and add table extents quickly and easily. If we undersize our storage, correcting for this mistake is much easier than it was years ago. We don’t need to overpurchase storage anymore because we can allocate it on demand and pay for the storage as we use it. We can easily remove one of the headaches that has been an issue for years and no longer need to triple our storage estimates and go with realistic estimates and control budget better and easier.

SQL Developer connection to DBaaS

Today we are going to connect to our database using SQL Developer. We could connect using sqlplus with a remote command but instead we are going to use a graphical tool to connect to our database in the cloud. It is important to note that this is the same tool that is used to connect to our on premise database. We can execute sql commands, look at the status of the database, clone pluggable databases from one service to another, and generally manipulate and manage the database with command line features of wizards.

SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. SQL Developer offers complete end-to-end development of your PL/SQL applications, a worksheet for running queries and scripts, a DBA console for managing the database, a reports interface, a complete data modeling solution, and a migration platform for moving your 3rd party databases to Oracle. There are a few books that have been written about this product.

as well as blogs

I suggest looking at the following

We are not going to dive deep into SQL Developer but rather introduce a couple of concepts for monitoring our database in the cloud. We are running version 4.1.3 on a Windows desktop. We actually are cheating a little bit and running it on a Windows 2012 Server that is provisioned into IaaS in the Oracle Cloud. It makes a good scratch space for demos and development hands on labs. When we connect we can connect to the public ip address of our database on port 1521 or we can create an ssh tunnel and connect to localhost on port 1521. We will first connect via an ssh tunnel. To start, we need to log into our database service and figure out what the ip address is for the system we provisioned. For our system we notice that the ip address is 129.152.150.120.



We are going to first connect with ip tunneling through putty. We launch putty and enter the ip address, the ssh keys, and open up port 1521 as a tunnel. We open a connection and all connections to port 1521 on localhost will be connected to our cloud service at the ip address specified. Note that this solution works if we have one database that we are connecting to. If we have two database instances in the cloud we will need to map a different port number on localhost to port 1521 or open up the ports to the internet which we will talk about later. We need to keep this shell active and open but we can iconify the window.


In SQL Developer we can now create a new connection to our database. This is done by clicking on the green plus sign in the top right of the screen. This opens a dialog window to define the connection to the database. We will call this connection prs12cHP which is the name of our service in the cloud. We are going to connect as sys so we need to select the advanced connection to connect as sysdba. It is important to note that you can not do this with Amazon RDS if you provision an Oracle database in the Amazon PaaS. Amazon does not allow you to login as sys or system and does not give you sysdba privileges. If you want sysdba access you will need to deploy Oracle into Amazon EC2 to get access. Once we define our connection to localhost, port 1521, sys as sysdba, and an OID of ORCL we can test our interface and accept the connection once it is successful. Note that we can execute commands in the right window and look at things like what version of the database we are running. In this example we are running the High Performance Edition so we can use diag and tuning extensions from SQL Developer.




There is a new DBA feature in the latest release of SQL Developer. We can launch a navigation menu to add our cloud database by going to the View … DBA option at the top of the screen. This give us another green plus sign so that we can add the database and expose typical management views adn functions. Two things that are of note here are a simple exposure to pluggable database as well as a clone option associated with this exposure.




We can do other things like look at backup jobs, look at table space allocation and location, look at users that are authorized and active. This is not a replacement for Enterprise Manager because it is looking at immediate and not historic data.

Now that we have connected through a tunnel, let’s look at another option. We can open up port 1521 on the database service and connect straight to the ip address. This method is not recommended because it opens up your database to all ip addresses on the internet if you are using a demo or evaluation account. You can whitelist ip addresses, vpn, or subnet limit the systems that it answers. This is done through the compute service management interface under the networking tab. We need to enable the dblistener for our database service. Once we do this we can connect SQL Developer to the database using the ip address of the database service. We might need to do this if we are connecting to multiple cloud servers and don’t want to create a tunnel for each of them.





In summary, we have connected to our database service using SQL Developer. This is the same tool that we use to connect to databases in our data center. We can connect the same way that we normally do via an ip address or tunnel to keep the server in the cloud a little more secure. We noted the differences between the Amazon RDS and Oracle DBaaS options and provided a workaround with EC2 or Azure Compute as an alternative. It is important to remember the differences between PaaS features and IaaS features when it comes time to calculating the cost of services. PaaS gives you expanded features like automated backup and size up/down which we will look at next week.

Using Enterprise Manager to manage cloud services

Yesterday we talked about the virtues of Enterprise Manager. To honest the type of monitoring tool is not important but the fact that you have one is. One of the virtues that VMWare touts of VSphere is that you can manage instances on your server as well as instances in VCloud. This is something worthy of playing with. The same tool for your on premise instances also managing your instances in the cloud has power. Unfortunately, VCloud allows you to allocate virtual machines and storage associated with it so you only have a IaaS option of compute only. You can’t allocate just storage. You can’t deploy a database server unless you have a database deployed that you want to clone. You need to start with an operating system and build from there. There are benefits of PaaS and SaaS that you will never see in the VCloud implementation.

Oracle Enterprise Manager provides the same universal management interface for on premise and in cloud services. Amazon falls short on this. First, they don’t have on premise instances so the tools that they have don’t monitor anything in your data center, only in their cloud. Microsoft has tools for monitoring services plugins for looking at Azure services. It is important to note that you need a gateway server in the Azure cloud to aggregate the data and ship the telemetry data back and report it in the monitoring tool. There is a good Blog detailing the cost if IaaS monitoring in Azure. The blog points out that the outbound data transfer for monitoring can cost up to $17/month/server so this is not something that comes for free.

Today we are going to look at using Enterprise Manager as a management tool for on premise systems, the Oracle Public Cloud, Amazon AWS, and Microsoft Azure. We are going to cheat a little and use a VirtualBox instance of Enterprise Manager 13c. We are not going to go through the installation process. The books and blogs that we referenced yesterday detail how to do this. Unfortunately, the VirtualBox instance is available from edelivery.oracle.com. We are not going to use this instance but are going to use an instance for demo purposes only available internal to Oracle. The key difference between the two systems is that the edelivery instance is 21 GB in size for download and expands to provide an OEM 13c instance for testing while the internal system (retriever.us.oracle.com) has a 12c and 11g database installed and is 39.5 GB (expanded to almost 90 GB when uncompressed). Given the size of the instance I really can’t provide external access to this instance. You can recreate this by downloading the edelivery system, installing an 11g database instance, installing a 12c database instance, and configuring OEM to include data from those instances to replicate the screen shots that we are including.



If we look at the details on the virtual box instance we notice that we need at least 2 cores and 10 GB of memory to run this instance. The system is unusable at 8 GB of RAM. We really should bump this up to 12 GB of RAM but given that it is for demo purposes and for training it is ok if it runs a little slow. If we were running this in production it is recommended to grow this to 4 cores and 16 GB of memory and also recommended that you not use a downloaded VirtualBox instance for production but install from scratch.

The key things that we are going to do are walk through what it takes to add a monitoring agent onto the service that we are trying to monitor and manage. If we look at the architecture of Enterprise Manager we notice that there are three key components; the Oracle Management Repository (OMR), the Oracle Management Service (OMS), and the Oracle Management Agent (OMA). The OMR is basically a database that keeps a history of all telemetry actions as well as reports and analytics for the systems being monitored. The OMS is the heart of Enterprise Manager and runs on a WebLogic server. The code is written in Java and presents the primary user interface to the administrators as well as being the gateway between the OMR and the agents or OMAs. The agents are installed on the target systems and collect operating system data, database data, weblogic data, and all other log data to ship back to the OMR for analysis by the users.

It is important to note at this point that most PaaS and SaaS providers do not allow you to install an Enterprise Manager Agent or any other management agent on their instances. They want to manage the services for you and force you to use their tools to manage their instance. SalesForce, for example, only gives you access to your customer relationship data. You can export your contact lists to an csv file to backup your data but you can’t correlate the contact list to the documents that you have shared with these users. Amazon RDS does not provide a file system access, system access to the database, or access to the operating system so that you can install the management agent. You must use their tools to monitor services provided on their sites. Unfortunately, this inhibits you from looking at important things like workload repository reports or sql tuning guides to see if something is running slow or waiting on a lock. Your only choice is to deploy the desired PaaS or SaaS as a manual or bundled install on IaaS forcing you to manually manage things like backups and patching on your own.

The first thing that we need to do in Enterprise Manager is to log in and click on the Setup button on the top right. We need to define named credentials since we are going to connect to the cloud service using public and private ssh keys. We need to follow the Security pull down to Named Credentials.

We click on the Create icon in the top left and add credentials with public and private keys. If we don’t have an ssh key to access the service we can generate an ssh key using ssh-keygen which generates a public and private key and upload the key using the SSH Access pull down in the hamburger menu. Once we upload the ssh key we can use ssh -i keyname.ppk opc@ip_address for our database server. We will use this keyname.ppk to connect with Enterprise Manager and have all telemetry traffic transferred via the ssh protocol.


Once we have the credentials valid in the cloud account we can create the ssh access through Enterprise Manager. To do this we to to Setup at the top right, Security, Named Credentials. We then click on the Create button in the middle left to start entering data about the credentials. The name in the the screen shot below failed because it begins with a number so we switched it to ssh2017 since 2017ssh failed the naming convention. We are trying to use host access via ssh which is done with pull down menu definitions. The system defaults to a host access but we need to change from host to global which does not tie our credentials to one ip address. We upload our public and private key as well as associate this with the opc user since that user has sudo rights. We can verify the credentials by looking at the bottom of the list. This should allow us to access our cloud host via ssh and deploy an agent to our cloud target.

Note that we created two credentials because we had a step fail later. We created credentials for the opc user and for the oracle user. The opc credentials are called ssh2017 as shown in the screen shots. The oracle credentials are called oracle2017 and are not shown. The same steps are used just the username is changed as well as the name of the credentials.






If we want to install the management agent onto our instance we need to know the ip address of the service that we are going to monitor as well as an account that can sudo to root or run elevated admin services. We go to the Enterprise Manager splash screen, login, select the Setup button in the top right and drill down to Add Target and Add Target Manually. This takes us to the Add Target screen where we can Install Agent on Host. To get rid of the warnings, we added our cloud target ip address to the /etc/hosts file and used a fully qualified and short name associated with the ip address. We probably did not add the right external dns name but it works with Enterprise Manager. When we add the host we use the fully qualified host name. We can find this by logging into the cloud target and looking at the /etc/hosts file on that server. This gives us the local ip address and a fully qualified host name. Once we have this we can enter a directory to upload the agent software to. We had to create an agent directory under the /u01/app/oracle directory. We select the oracle2017 credentials (the screen shots use ssh2017 but this generates an error later) we defined in the previous step and start uploading the agent software and configuring the host as a target.





Note that we could have entered the ip address rather than going through adding the ip address to /etc/hosts. We would have received a warning with the ip address.

When we first tried this we got an error during the initialization phase that opc did not own the /u01/app/oracle directory and had to create an agent directory and change ownership. Fortunately, we could easily resubmit and enter a new directory without having to reenter all of the other information. The deployment takes a while because Enterprise Manager needs to upload the agent binaries, extract, and install them. The process is updated with status so that you can see the progress and restart when errors happen. When we changed the ownership, the installation failed at a later step stating the opc did not have permission to add the agent to the inventory. We corrected this by installing as oracle and setting the /u01/app/oracle/agent directory to be owned by oracle.

When we commit the ip address or host name as well as the ssh credentials, we can track progress as the management server deploys the agent. We get to a point where we note that the oracle user does not have ssh capabilities and we will need to run some stuff manually from the opc account.



At this point we should have an enterprise manager connection to a cloud host. To get this working from my VirtualBox behind my AT&T Uverse wireless router I first had to configure a route on my broadband connection and set the ip address of the Enterprise Manager VirtualBox image to a static ip address. This allows the cloud instance to talk back to the OMS and store data in the OMR.



The next step is to discover the database instances. This is done by going through a guided discovery on the host that we just provisioned. It took a few minutes to sync up with the OMS but we could verify this with the emctl status agent command on the target host. We add the target manually using the guided discovery and select database services to look for on the target.




At this point we should have a database, listener, and host connected to our single pane of management glass. We should see a local database (em12c) and a cloud based database (prs12cHP). We can look at the host characteristics as well as dive into sql monitoring, database performance, and database management like backup and restore options or adding users to the repository. We could add a Java Cloud Service as well as link these two systems together and trace a web page request down to a sql read and look at what the longest latency component is. We can figure out if the network, java memory allocation, or databse disk is causing the slowest response. We can also look at sql tuning recommendations to get suggestions on changing our sql code or execution plans using the arw report and sql tuning utilities in Enterprise Manager.

In summary, we can connect to an on premise server as well as a cloud server. We can’t connect to an Amazon RDS instance because we don’t get file system level access to push a client to or a root user to change the agent permissions. We do get this with IaaS on Oracle, Compute servers on Azure, and EC2 on Amazon. We also get this with PaaS on Oracle and potentially event Force.com from SalesForce. No one give you this ability with SaaS. It is assumed that you will take the SaaS solution as is and not need to look under the covers. Having a single pane of glass for monitoring and provisioning services is important. The tool should do more than tell you how full a disk is or how much of a cpu is loaded or available. It should dive into the application and let you look at where bottlenecks are and help troubleshoot issues. We could spend weeks diving into Enterprise Manager and the different management packs but we are on a journey to look at PaaS options from Amazon, Microsoft, and Oracle.

Managing servers and instances in the cloud

Managing servers and instances has been an ongoing issue since the introduction of the first computer. Recently with the advent of virtualization the idea of a management console to control what processors are running what services and what storage is allocated to what operating system has gained popularity. Many people are familiar with VMWare VSphere where you get a view of processors. We get a view of a server and can see virtual images deployed on this server. We can see how well the resources (memory, cpu, and disk) are being utilized. We can allocate more or less resources since this is a dynamic allocation and make sure that we are not over allocating resources and wasting them or under allocating them and causing applications to run slower.

In this example we can see that we have two processors, 2 GB of memory, and just under 300 GB of disk on this computer. We have five virtual machines running on this computer and can dive into each operating system and look at what operating system is installed and how the limited resources are allocated and utilized. What we can’t see is what applications are installed and how the applications are running. For example, is the Windows Home Server 2011 running an Apache Web Server and how many hits did the web server get in the past four days? Monitoring tools beg the question of what are you monitoring. If you are managing limited resources and making sure that you have not over or under allocated services, tools like VSphere are excellent tools. Unfortunately, you will need other tools to dive into another tool. EMC, for example, has a storage manager that lets you look not only at a logical unit level but a controller and disk level. It understands VMWare and lets you look at how disks are related to virtualization engines and how they are consuming resources.

Again, this is a very good tool to look at how well a disk is performing, how well data is laid out across spindles, and how well your data network is being transmitted between disk and server. We can see hot spots. We can see disks that are over and under utilized. We can manage a scarce resource and make sure that it is properly utilized.

When we talk about monitoring we need to shift our thought process. Yes, it is important to manage compute, memory, and storage resources but it is also important to realize that these resources are commodities. If we run low, we get more. If we use too much we are wasting resources. We should be able to automate allocation of resources and size up or size down resources without manual monitoring. What we are really interested in is how well is our company running. If we are a university we might be interested in the latency of delivering online video classes. We might be interested in how many classes are being added to a student schedule during registration. If we are a ticket retailer we might be interested in how many tickets were requested and paid for on a minute by minute basis. Note that we are not talking about how well a disk drive is allocated or if we have enough processors allocated to a virtual machine, we are talking in term of business terms. We are looking at tying revenue generating services back to computer resources and trying to figure out what is causing a problem. In the online video classroom example, we might have our processors allocated properly, storage tuned to the last IOP, and memory allocated to buffer data and reduce disk reads. If we are on the same network as the athletic department and our basketball team made it to the elite eight during March madness and the athletic department live streams the game on the same network as our classroom servers our classes will be offline due to demand to watch the basketball game. Tools from EMC and VMware will show that everything is working fine and life is good. Meanwhile the help desk is getting calls from students off campus that can’t access their assignments during midterms and their Thursday class is not available. What we need is a monitoring system that can look at systems and incorporate more than just processor and disk. What we need is a tool that can look at systems and services and not just resources. We would like to look at the video distribution system and be able to dive into the disk, network, or processor and see what the bottleneck is and fix it quickly.

Oracle released a tool years ago called Enterprise Manager. The tool started out as a database monitoring tool that allowed you to dive into sql calls and figure out why it was taking longer than necessary. With acquisitions of companies like BEA and Sun Microsystems the tool expanded to look at how Java was performing inside a WebLogic server and how disk drives were performing that were serving up requests for the database and WebLogic server. Acquisitions of companies like JD Edwards and PeopleSoft drove the monitoring tools in the opposite direction and screens showing how many purchase orders were being processed on an hourly basis were suddenly available. You could look at what was the bottleneck in closing your books for the end of month reconciliation. Was it a manual process waiting on a report to drop into a directory or was it a sql statement that was taking minutes rather than seconds to complete? You could start looking at a process like purchase orders and dive into a database to see if a table was reaching storage limits as well as figure out that someone recently patched the database which caused an index to not look at a new column that was created and searches are now going against this column so select statements are doing a full table scan rather than using an index to report answers quicker. Adding more storage in this case will be a waste of time. Yes, we are running out of storage on a table but the real issue is we need to re-index the database or execute a new sql execution plan. Below is a screen shot of how well a database is performing with links to look at all the sub-components of the database.

Books have been written on Enterprise Manager. We are not going to cover everything in this blog to make you an expert on the subject.

There are also a number of blogs related to Enterprise Manager

This is a partial list of blogs returned by a Google search. I am sure I missed a few. Note that the list of books and blogs is not a short list. There are classes offered by Oracle University that you can take virtually or in a classroom (both cost money).

The way that Oracle Enterprise Manager is paid for is simple. The base system is free and you pay for the options that you want to use. Unfortunately, the Technology Price Guide is not very clear as to what is and is not Enterprise Manager and what is an option on the database. For example, on page 7, most of the management packs are listed. If you want diagnostics for the database you will need to license your database at $7,500 per processor and not Enterprise Manager. You can license at $150 per named user but the licensing metrics for your database need to match the licensing for you management pack. You could have a two processor license for production and a 25 named user license for development and testing so you will need to blend these licenses into Enterprise Manager with the management packs. Diagnostics is specifically confusing because you enable or disable this feature in Enterprise Manager and not in the database. The telemetry data is being collected for the database but the reporting on the results of the analysis is not being done in the database. You could turn on the reporting in Enterprise Manager without involving the DBA thus incurring an additional license fee that you had not paid for. There is no license key or email that is sent to Oracle saying that you enabled the license it is a simple checkbox in Enterprise Manager that says turn on diagnostic reporting. In recent versions a warning screen pops up telling you that this is not a free feature. In OEM 10g the feature was turned on by default and you had to turn it off. This has changed in recent releases. If you try to turn this feature on when connecting to an Enterprise Edition in the Oracle Public Cloud you will get a feature not available message. You need to go with High Performance or Extreme Performance edition of the database to get the diagnostics enabled.

There are also management packs for Oracle Applications and the pricing for these products can be found in the Oracle Applications Price List. You need to search for the word “packs” to find the price of the management packs in this list. You can get a list of all the management packs from the Oracle Tech Network page for Enterprise Manager

It is important to note that the Enterprise Manager that runs in your data center monitoring your servers and Oracle hardware and software products is the same tool that you can use to monitor and manage PaaS and IaaS resources in the Oracle Public Cloud. You can connect to the instance in the cloud using ssh and read the telemetry from the cloud instance as if it were installed on one of your servers. You can use extensions to the latest version of Enterprise Manager, 13c, to clone a pluggable database instance from your on site installation to a cloud instance.

You can also setup reporting and self service requests to have end users ask for a new service to be provisioned either on site or in the cloud. Below is a screen shot of how to do this for a database. We could do something similar for a WebLogic server, an Apache Web server, a PeopleSoft instance for dev/test, or any layer of the Oracle stack.

In summary, selection of a management tool is important. Tools are good to understand and properly use. At some point you need to step back and ask what is the questions that I need answers to. Am I diving too deep on trying to optimize something that is not that worth deep analysis? Could I automate this and not have to monitor it at all? If I run out of processing power does it make sense to automatically scale up the number of processors? Should I scale out by spinning up more web servers? Do I need to re-architect my network topology to isolate disk traffic from client traffic? If I generate a report who will consume the results? Is the report for someone in IT? Purchasing? The process owner? Is it a technology or financial report? Products like Enterprise Manager allow you to generate all of these reports using different management extensions. My suggestion is to look at some of the introductory videos on the Oracle Tech Network to get an introduction to the problem that you are trying to solve then figure out how much it will cost to measure what is important to you.

database alternatives

One of the key questions that I get asked on a regular basis is to justify the cost of some product. Why not use freeware? Why not put things together and use free stuff? When I worked at Texas A&M and Rice University we first looked at public domain software. We heavily used the Apache web server, Tomcat, MySQL, Postgress, Linux, and BSD. These applications worked up to a point. Yes, you can spin up one Apache web server on one server. Yes, you can have one Apache web server listen on multiple IP addresses and host multiple web servers. The issue typically is not how many web servers can you handle but how many clients can you answer. Easily 90% of the web servers could handle the load that it saw on a regular basis. We spent 80% of our time on the 10% that could not handle the load. Not all of the web servers could handle the functionality. For example, a student registration system needs to keep a shopping cart of classes selected and you need to level up to an Apache Tomcat server to persistently keep this data and database connections live. If you use a web server you need to store all transactions in the database, all of the classes selected, and all of the fees associated with the class. Every interaction with the web server causes multiple connections with the database server. Doing this drives the number of processors needed by the database thus driving up the cost of the hardware and software license.

If we use an application server that can handle caching of data, we can keep a list of available classes on the application server and not only have to go back to the database server for transactions. When a student selects a class, it takes it out of inventory and puts it in their class schedule for the next year. The same is true for on-line shopping, purchasing tickets to a play or airline, drafting for a fantasy football team. Years ago ESPN ran a March Madness contest on-line. They presented your selections with an Apache web server and every team selection required an interaction with their database on the back end. The system operated miserably and it took hours to select all rounds to fill out your bracket. They updated the server with Javascript and a Tomcat server and allowed you to fill out all of round one in your browser. Once you finished the first round you submitted your selections and were presented with a round two based on your first round selections. They later put this on WebLogic and put all of the round selections in Java code on the WebLogic server. The single interaction with the database became submission of your complete bracket. They went from thousands of interactions with a database to a single interaction per submission.

We can have similar architecture discussions at the database layer as well. If I am looking at a simple table lookup, why pay for a robust database like Oracle 12c? Why not use something like Azure Table Storage Services and do a simple select statement from a file store. Why not put this in a free version of Oracle in Apex on the web and define a REST api to pull the data based on a simple or potentially more complex select statement. Again, 90% of the problems can be solved with simple solutions. Simple table lookups like translating a simple part name to a price can be done with Excel, MySQL, APEX, JSON processing, or REST apis. The difficulty comes up with the remaining 10%. How do I correlate multiple tables together to figure out the price of an item based on cost of inventory, cost of shipping, electrical costs, compensation costs for contractors and sales people, and other factors that determine profitability and pricing. How do I do a shortest routing algorithm for a trucking system based on traffic, customer orders, inventory in a warehouse, the size of a truck, and the salary of the driver and loading dock personnel. For things like this you need a more complex database that can handle multiple table joins, spatial data, and pulling in road conditions and traffic patterns from external sources. Products like IBM DB2, Oracle Database, and Microsoft SQL Server can address some of these issues.

We also need to look at recovery and restoration time. When a Postgress server crashes, how long does it take to recover the database and get it back online? Can I fail over to a secondary parallel server because downtime is lost revenue or lost sales. If you go to HomeDepot to order plumbing parts and their site goes down, how long does it take to go to the Ace or Lowes web site and order the same part and have it delivered by the same delivery truck to your home or office? Keeping inventory, order entry, and web services up becomes more than just answering a query. It becomes a mission critical service that can not go down for more than a few seconds. Services like Data Guard, Golden Gate, and Real Application Clustering are required to keep services up and active. MySQL, MongoDB, Amazon Aurora, and other new entry level database technologies can handle simple requests but take minutes/hours to recover information for a database. Failing over through storage to another site is typically not an answer in this case. It takes minutes/hours to recover and restart a moderate database of 20 TB or larger. First the data replication needs to finish then the database needs to be booted at a secondary site and it needs to maintain consistency in the data as it comes back up. The application server then needs to connect to the new service and recommit requests that came in during and since the system failure. As this is happening, customers are opening a new browser tab and going to your competition to find the same part on another site.

In summary, it takes more than just getting a bigger and faster application server or database. Moving the services to the cloud isn’t necessarily the answer. You need to make sure that you move the two components together the majority of the time. Look at your application and ask where do you spend more of your time? It is tuning sql statements? Is it writing new queries to answer business questions? Is it optimizing your disk layout to get tables to the database faster? Take a step back and ask why is the database pounding the disk so hard. Can I cache this data in the database by adding a little more memory to the disk controller or database server? Can I cache the data at the application server by adding more memory there and keep from asking the database for the same information over and over again? In the next few days we are going to look at database options and database monitoring. We are going to look at some of these tools and refer back to the bigger picture. Yes, we can tune the storage to deliver all of the bits at the highest rate possible. Our question will not be how to do this but should we be doing this. Would something like an Exadata or an in-memory option allow us to transfer less data across the storage network and get us answers faster? Would adding memory somewhere allow us to buffer more data and reduce the database requests which reduces the amount of data needed from the disk.