Customizing Win 10 desktop for vSphere and Terraform

In a previous blog we talked about installing Terraform on Windows 10. In this blog we are going to dive a little deeper and get a vSphere provider configured and ready to use from our Windows 10 desktop. To get started we need a way to get into our vSphere server. The easiest way is to log into the web console and get the information from there.

The more difficult way, which allows for better automation, is to do everything from the command line. Unfortunately, the default PowerShell version on Windows is not supported by the Command Line Module from VMware, and to run PowerCLI we need to upgrade to PowerShell 6 or higher. At the time of this writing PowerShell 7.0.3 was the latest version available. This binary can be downloaded and installed by following the documentation on the Microsoft website and pulling the binary from the official Microsoft github.com location.

The PowerShell 7 install is relatively simple and takes a minute or two.

Once PowerShell 7 is installed we need to install PowerCLI by using an Install-Module command. The format of the command is

Install-Module -Name VMware.PowerCLI

The installation is relatively simple and takes a minute or two to download the code and extract. Once extracted we can connect to the vSphere server.

When it comes to connecting to the server we can have it ask us for the username and password or set these values as variables. In the following video we set the variables $user and $server as well as the $pwd (not shown) then connect to the server using these variables. When we first connect the connection fails because the SSL certificate on our server is self-signed and not trusted. To avoid this we need to execute the following two commands to get a valid connection:

Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

Connect-VIServer -Server $server -User $user -Password $pwd

From here we can get the DataCenter, Folder structure of the VMs and Templates, as well as the Datastores for this installation.

Getting the parameters that we will need to populate a parameters.tfvars file can be done with the following PowerCLI commands

  • var.datacenter: Get-Datacenter
  • var.datastore: Get-Datastore -Name <name>
  • var.template_folder: Get-Folder -Name "Templates and vCenter"
  • var.terraform_folder: Get-Folder -Name "Terraform"
  • var.templates: Get-Template -Location $var.template_folder
  • var.terraform_vms: Get-VM -Location $var.terraform_folder

From here we have the base level data that we need to populate a parameters.tfvars file and define our datacenter, host, folder structure, datastores, and templates. These are typically relatively static values that don’t change much. At some point we might want to pull in a list of our ISO files to use for initializing raw operating systems. Most companies don’t start with an ISO file but rather a partially configured server that has connections into an LDAP or Active Directory structure as well as the normal applications and security/firewall configurations needed for most applications.
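For reference, a parameters.tfvars file built from those values might look like the following sketch; the datacenter and folder names come from the lab used in these posts, and the datastore name is a placeholder.

datacenter       = "Home-lab"
datastore        = "datastore1"
template_folder  = "Templates and vCenter"
terraform_folder = "Terraform"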

To summarize, we have configured our Windows 10 terraform desktop so that we can use a browser to pull parameters from a vSphere server as well as script and automate pulling this data using the PowerCLI module that runs under PowerShell 6 or 7. We should have access to all of our key data from our vSphere and ESXi server and can populate and create a set of terraform files using variables, data declarations, and the resources that we want to create and manage. With this blog we have built the foundation to manage a vSphere or ESXi instance from an HTML browser, a PowerShell command line, or from terraform. The eventual goal is to have terraform do all of the heavy lifting and to avoid entering data like username and password into configuration files so that we can use github for version control of our configuration and management files.

Terraform variables vs data

In our last blog post we looked at data vs resources with Terraform and talked about static vs dynamic characteristics of data when compared to resources. In this blog we are going to look at using variables to declare structures rather than using data declarations. We will also cover a third option, Local Values, and where they might be useful. It is important to note that there is no right or wrong answer with the use of locals, variables, or data since they effectively perform the same functions and none of them destroy structures as resources do when you execute the destroy option with terraform.

First, let’s look at Local Values. Declaring a local value allows you to insert a relatively static label into a variable stream. Locals are typically used for structures like tags or common_tags that are repeated throughout a configuration. They are an easy way to declare something like a version or the group that manages and maintains the resource in question.

locals {
  service_name = "forum"
  owner        = "Community Team"
}
locals {
  # Common tags to be assigned to all resources
  common_tags = {
    Service = local.service_name
    Owner   = local.owner
  }
}
resource "aws_instance" "example" {
  # ...

  tags = local.common_tags
}

Note that the initial declaration is a name associated with a string. The second declaration aggregates references to these values into a common_tags map using the local.<name> reference. This map can then be accessed with the local.common_tags reference in the main code without having to replicate the service or owner tag information. Unfortunately, defining associations in a locals declaration does not allow for values to be passed in from the command line as is done with variables.

Input Variables allow you to define a string relationship similar to locals but also allow you to pass in values from other files or the command line. Input variables serve as parameters for a module and allow for customization and differentiation between two environments. For vSphere, for example, given that you cannot have two vSphere providers or datacenters in the same code, defining a datacenter for development and one for production can be done with variables that reference a common code base in another directory.

variable "image_id" {
  type = string
}

variable "availability_zone_names" {
  type    = list(string)
  default = ["us-west-1a"]
}

variable "docker_ports" {
  type = list(object({
    internal = number
    external = number
    protocol = string
  }))
  default = [
    {
      internal = 8300
      external = 8300
      protocol = "tcp"
    }
  ]
}

A variable definition has an identifier associated with it and typically a type that can be a string, number, or boolean; these can be combined into more complex structures like lists, sets, objects, or tuples. Variable values are typically assigned in a .tfvars file rather than a .tf file or can be passed in with the -var="label=value" command line parameter. Alternately, variables can be set as environment variables from the command line (using the TF_VAR_ prefix) and the terraform command line understands how to read these values. Typically user credentials like username and password or public and private keys are defined in an environment variable rather than in a file. For a vSphere provider you can define the following environment variables and hide the connection details to a server:

  • VSPHERE_USER (var.user)
  • VSPHERE_PASSWORD (var.password)
  • VSPHERE_SERVER (var.vsphere_server)
  • VSPHERE_ALLOW_UNVERIFIED_SSL (var.allow_unverified_ssl)

Defining any of these environment variables gets the values passed into the terraform configuration without having to define them in a .tf or .tfvars file or pass them in with the -var command line option. The var.<name> references shown above are the constructs used to reference each of the parameters in the terraform control files. The three parameters required for a connection to a vSphere server are the user, password, and vsphere_server values, with allow_unverified_ssl as an optional parameter.
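For the provider block below to reference these values as Terraform variables, the variables themselves also need to be declared. A minimal sketch of the declarations, with no defaults so that the values are supplied at run time with -var flags or TF_VAR_-prefixed environment variables, would look like:

variable "vsphere_user" {
  # supplied at run time with -var or the TF_VAR_vsphere_user environment variable
  type = string
}

variable "vsphere_password" {
  # no default so the password never lands in version control
  type = string
}

variable "vsphere_server" {
  type = string
}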

provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server

  # If you have a self-signed cert
  allow_unverified_ssl = true
}

Typically this is all the code that is needed to connect to a vSphere server. You could define the user, password, and vsphere_server as locals and reference them as local.user, but that implies that one of your .tf or .tfvars files contains a password definition, which becomes a security issue with file management and version control. Putting the user and password in an environment variable allows for dynamic changing of roles and credentials from the VMware side of the house without having to change your .tf or .tfvars files. Having the vsphere_server defined with environment variables allows management of development, production, and disaster recovery using a common foundation file and not having to define a main.tf for each environment.

We could have just as easily defined our user, password, and server with a Data Source definition rather than a variable definition. A data declaration is similar to a local declaration but can be more complex than a simple string assignment.

data "aws_ami" "example" {
  most_recent = true

  owners = ["self"]
  tags = {
    Name   = "app-server"
    Tested = "true"
  }
}

Declaring a username and password with a data definition is not the most secure and safe way of defining this data. Using a variable declaration and an environment variable keeps this information out of source code control and reduces security concerns. Defining a datastore or a template with a data declaration makes more sense given that structures like datacenters, datastores, folder structures, and templates hopefully do not change significantly over time. Templates might change based on new operating system releases, but managing this change with a new declaration can be a good thing.
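As a sketch of that pattern for vSphere, assuming a data.vsphere_datacenter.dc declaration like the ones shown later in this post and hypothetical datastore and template names, the declarations would look like:

data "vsphere_datastore" "datastore" {
  # existing datastore that terraform should look up, not create
  name          = "datastore1"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "template" {
  # existing template used as the clone source for new virtual machines
  name          = "ubuntu-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}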

Hopefully, this post helps understand the key difference between variable and data declarations. Both have a purpose. Both can be used. There are some technical reasons to use one over the other. There are some security concerns where one might be a better selection. The real answer is to look at how your organization uses the different constructs and have a meaningful conversation on why one is used and why another is used instead. This is one of the grey areas where there is not one way of solving the problem.

Terraform data vs resources

In previous blog posts we have talked about Installing Terraform on Windows, Installing Terraform on Ubuntu, Providers part 1 and Providers part 2. In this blog we will look at differentiating between data sources and resources. Note that we are looking at terraform constructs differently from what is presented in the HashiCorp Terraform Associate Certification and not following the suggested study guide format but diving into what is required to build something from scratch.

The assumptions that we are going to make in this blog are, first, that we have a terraform binary installed on Windows or Ubuntu and, second, that our target system is going to be vSphere, initially either for development or for in-house production systems.

Given that we talked about the vSphere provider in an earlier blog posting we won’t go into a discussion on adding and configuring providers other than to say that vSphere is a tricky configuration when it comes to managing multiple servers. If we were managing multiple Amazon or Azure regions or zones we would define one provider for one zone and another provider with a different alias for the second zone. The vSphere provider does not support multiple datacenters or aliases for providers so managing multiple hosts with terraform must be done with one vSphere server controlling multiple hosts or multiple folders and directories managing different vSphere servers. Both of these strategies work well. For this example we will look at multiple servers under one vSphere and a single terraform configuration to manage all of these hosts.

Before we get started on data vs resource definitions it is important to note that the Terraform Associate certification focuses on Cloud Engineering and not necessarily on-premises servers. The concepts are generic enough but for this blog we will cover vSphere as our example target.

Scrolling down to the outline of the exam note that “Create and differentiate resource and data configuration” is one of the topics in the Read, generate, and modify configuration section.

If we look at the documentation for the Terraform CLI for Data Sources it notes that data sources allow data to be fetched or computed for use elsewhere in Terraform configuration. Each provider may offer data sources alongside its set of resource types. What this means is that we can declare things that we know exist and should not change. For vSphere we know that there is a datacenter as well as datastore. The datacenter is the root of the vSphere environment. Under the datacenter is a host and a datastore. If we correlate this back to our vSphere console we can pull this data from the user interface.

From our lab environment we log into the vSphere Client and go to the Summary screen of the root vSphere instance.

Note that we have 4 hosts displayed on the left side, 4 hosts listed in the center as well as 23 virtual machines. We do have a number of templates and virtual machines listed under our server 10.0.0.92 with two of the virtual machines currently running.

For our Terraform vSphere provider we need to first get the name of the Datacenter associated with this installation. We can do this by clicking on the Datacenters tab on the right side of the screen.

For this example we see that “Home-lab” is the Datacenter defined for our instance. In Terraform we would define this with a data element rather than a resource.

data "vsphere_datacenter" "datacenter" {
  name = "Home-lab"
}

The key reason that we use data rather than resource is that, with a resource definition, the terraform command will try to create the datacenter when an apply is run and destroy it when a destroy is run. We don’t want to destroy our datacenter but want to retain the definition to use as needed to access hosts and virtual machines.

From the Resources definition in the Terraform CLI documentation, a resource describes one or more infrastructure objects, and running the apply command creates, updates, or destroys the objects to force them to match the configuration in the configuration files. If your configuration gets changed then terraform tries to delete and reconfigure your datacenter. This is not the behavior that we want. What we want is to define everything that is static and should not change with the data declaration and things that are dynamic with the resource declaration. The elements that typically don’t change for vSphere are:

  • Datacenter
  • Host
  • Network
  • Datastore

Things that change occasionally but are still good candidates for data declarations are:

  • Folders
  • Templates
  • Tags

Things that typically change and need a resource definition are:

  • Virtual Machines
  • VMFS Disks
  • Roles
  • Users

To reference a data element you start with data., followed by the type and then the alias name. In the following example we define a host that is associated with a compute cluster that is inside a datacenter. We refer to the datacenter as data.vsphere_datacenter.dc and the cluster as data.vsphere_compute_cluster.c1, where “dc” and “c1” are aliases to identify the data definitions. We use the id parameter in the data construct to uniquely identify the resource.

data "vsphere_datacenter" "dc" {
  name = "TfDatacenter"
}

data "vsphere_compute_cluster" "c1" {
  name          = "DC0_C0"
  datacenter_id = data.vsphere_datacenter.dc.id
}

resource "vsphere_host" "h1" {
  hostname = "10.10.10.1"
  username = "root"
  password = "password"
  license  = "00000-00000-00000-00000i-00000"
  cluster  = data.vsphere_compute_cluster.c1.id
}

In the above example we are talking about the host “10.10.10.1” located in the compute cluster “DC0_C0” that lives in the TfDatacenter. In our Home-lab datacenter we can define our host 10.0.0.92 as a member of the Home-lab datacenter and provision virtual machines onto this host.
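A virtual machine, on the other hand, is something we do want terraform to create and destroy, so it gets a resource block. The sketch below clones a new virtual machine from a template and assumes hypothetical data declarations for the resource pool, datastore, network, and template along the lines of the ones shown later in this post:

resource "vsphere_virtual_machine" "vm" {
  name             = "terraform-test"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.datastore.id

  num_cpus = 2
  memory   = 4096
  guest_id = data.vsphere_virtual_machine.template.guest_id

  network_interface {
    network_id = data.vsphere_network.network.id
  }

  disk {
    label = "disk0"
    size  = data.vsphere_virtual_machine.template.disks.0.size
  }

  clone {
    # clone from the template looked up with a data declaration
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}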

We would also define our vsphere_datastore, identified by “name”, which lives in the data.vsphere_datacenter.dc. To view the datastores from the vSphere Client, click on the Datastores tab or the disk icon to get a list of potential datastores.

Note that some of the datastores are listed as inaccessible. These datastores are on servers that are currently turned off; the accessible ones are connected to our primary host and vSphere server, which is the only one powered on for demonstration purposes.

In summary, data declarations should be used for static and permanent structures. Resource declarations should be used for things that can be created or considered transient or malleable. For a production datacenter that I managed we defined folders, virtual machine templates, and datastores with the data declaration.

data "vsphere_datacenter" "dc" {
  name = "QM Lab"
}

data "vsphere_resource_pool" "pool" {
  name          = "TintonFalls/Resources"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_host" "host" {
  name          = "172.19.21.54"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "DS6" {
  name          = "DS6"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "EQL1-Raid5" {
  name          = "EQL1-Raid5"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "EQL2-Raid6" {
  name          = "EQL2-Raid6"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4-maglib1" {
  name          = "ADV-ESX4 maglib1(7.2K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4-maglib2" {
  name          = "ADV-ESX4 maglib2(7.2K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4storage1" {
  name          = "ADV-ESX4 storage1(15K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4storage2" {
  name          = "ADV-ESX4 storage2(15K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4storage3" {
  name          = "ADV-ESX4 storage3(15K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4storage4" {
  name          = "ADV-ESX4 storage4(15K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ADV-ESX4storage5" {
  name          = "ADV-ESX4 storage5(15K)"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "corenfsshare" {
  name          = "corenfsshare"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "W2k8-sp18-template" {
  name          = "W2k8-sp18-template"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "Windows2016-template" {
  name          = "2016_template"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "W2k16-Aug-2020-template" {
  name          = "W2k16-Aug-2020-template"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "centos-base" {
  name          = "centos-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "rhel-base" {
  name          = "rhel-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "suse-sap-base" {
  name          = "suse-sap-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "te-kubernetes-base" {
  name          = "te-kubernetes-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "ubuntu-base" {
  name          = "ubuntu-base"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "ubuntu-docker-desktop" {
  name          = "ubuntu-docker-desktop"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "Windows-10-template" {
  name          = "Windows-10-template"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_folder" "parent" {
  path = "${data.vsphere_datacenter.dc.name}/vm/Technical Enablement"
}

data "vsphere_folder" "Apps" {
  path = "${data.vsphere_folder.parent.path}/Apps"
}

data "vsphere_folder" "testing" {
  path = "${data.vsphere_folder.Apps.path}/testing"
}

data "vsphere_folder" "sp22" {
  path = "${data.vsphere_folder.Apps.path}/sp22"
}

data "vsphere_folder" "sp21" {
  path = "${data.vsphere_folder.Apps.path}/sp21"
}

data "vsphere_folder" "sp20" {
  path = "${data.vsphere_folder.Apps.path}/sp20"
}

Terraform Providers

One of the foundational components of automation is being able to speak in the language of your target. With AWS, for example, CloudFormation is a good tool to define what a deployment in AWS should look like and ensures conformity to the design definition. The main problem is that CloudFormation only works on AWS and does not work on other deployment platforms. Terraform, on the other hand, performs the same automation from a configuration definition and creates the desired components on a variety of platforms. The mechanism used to perform this function is the inclusion of a provider definition. If you think in terms of Java or C programming, a provider is a set of library functions that can be called, and including a provider definition is similar to an include statement that pulls in a library header.

There are several good blogs that compare and contrast Terraform vs CloudFormation in more depth.

If you look at the definition of a provider from HashiCorp on their Providers page it defines a provider as a way to expose the API interface of the backend system as well as tasks that might be needed like random number generation utilities to generate names. The Terraform Registry includes a list of providers and systems that Terraform can interface with. Checking the public cloud box provides us with a list of various cloud hosting targets that we will focus on in later blogs.

For the purpose of this blog we will dive into the VMware vSphere provider to get an understanding of how to call it, what happens when you call it, and what constructs are needed when you call it. In a previous blog we compared the vSphere provider to the AWS provider on a very high level to talk about the format differences between providers. In this blog we will dive deeper into the vSphere provider to help understand how to deploy it in a development, production, and disaster recovery scenario.

Selecting the vsphere provider and clicking on the USE PROVIDER button at the top right shows that you can call the provider with either the required_providers or provider block structures. We will use the simplest example by calling only

provider "vsphere" { }
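For reference, the required_providers form, sketched here and pinned to the 1.12.0 provider version used later in this post, would look like the block below; the rest of this post sticks with the simpler provider block.

terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "1.12.0"
    }
  }
}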

Looking at the documentation there are a variety of optional and required parameters that are needed inside the provider block’s curly brackets.

The parameter options that we need for the provider definition include (taken straight from the hashicorp page):

  • user – (Required) This is the username for vSphere API operations. Can also be specified with the VSPHERE_USER environment variable.
  • password – (Required) This is the password for vSphere API operations. Can also be specified with the VSPHERE_PASSWORD environment variable.
  • vsphere_server – (Required) This is the vCenter server name for vSphere API operations. Can also be specified with the VSPHERE_SERVER environment variable.
  • allow_unverified_ssl – (Optional) Boolean that can be set to true to disable SSL certificate verification. This should be used with care as it could allow an attacker to intercept your auth token. If omitted, default value is false. Can also be specified with the VSPHERE_ALLOW_UNVERIFIED_SSL environment variable.
  • vim_keep_alive – (Optional) Keep alive interval in minutes for the VIM session. Standard session timeout in vSphere is 30 minutes. This defaults to 10 minutes to ensure that operations that take longer than 30 minutes without API interaction do not result in a session timeout. Can also be specified with the VSPHERE_VIM_KEEP_ALIVE environment variable.

For security’s sake it is recommended to keep user and password information in a different file from the definition or have it as environment variables in the shell to pass into terraform. In this example we will create two files, variables.tf and main.tf, to simply call the provider definition and look at the constructs that are created by terraform.

The main.tf file looks like

provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
  version        = "1.12.0"

  allow_unverified_ssl = true
}

Note the use of var.<something> to pull in the definition of an externally defined variable. This could be done with a second file or with environment variables. For a variables.tf file we could enter

variable "vsphere_user" {
  type    = string
  default = "administrator@vsphere.local"
}

variable "vsphere_password" {
  type    = string
  default = "NotTheRIghtPassword"
}

variable "vsphere_server" {
  type    = string
  default = "10.0.0.72"
}

In the variables.tf file we define three string values and include a default value to pre-define what each variable should be set to. If we open up a PowerShell window (or Terminal on Linux) we can see that there are only the variables.tf and main.tf files in the directory.

Note that we are using PowerShell 7 as the command line interface so that we can test out the connection to our vSphere server using PowerCLI commands to verify variable definitions.

If we type

terraform init

The hashicorp/vsphere provider data is pulled from the web and placed in the .terraform subdirectory.

Looking at the .terraform directory it contains a grouping of libraries that we can call from our main.tf definition file.

If we use the tree command we can see the nested structure and note that there is a selections.json file at the plugins level and a terraform-provider-vsphere_v1.12.0_x4.exe in the windows_amd64 subdirectory.

What the init command did was find out what platform we are running on and pulled down the appropriate binary to translate terraform modules and resource calls into API calls into vSphere. For our example we will make API calls into our vSphere server located at 10.0.0.72 as administrator@vsphere.local with the given password. The selections.json file contains a hash value that is used to test the binary integrity of the terraform-provider-vsphere_v1.12.0_x4.exe and download a new version if needed next time the init command is issued.

At this point we can call the

terraform plan

command to test our main.tf and variables.tf configurations. Everything should work because the syntax is simple so far.

Note that we don’t have a state file defined yet. This should happen when we type

terraform apply

Once we execute this command we get a terraform.tfstate file locally that contains the state information of the current server. Given that we have not made any resource definitions, data declarations, or module calls we don’t have any need to connect to the server. The tfstate file generated is relatively simple.

{
  "version": 4,
  "terraform_version": "0.13.3",
  "serial": 1,
  "lineage": "f35a4048-4cee-63e0-86b2-e699165efbe5",
  "outputs": {},
  "resources": []
}

If we include something simple like a datacenter definition, the connection will fail if the password is wrong.

Putting in the right password but the wrong datacenter will return a different error.

To get the right datacenter we can go to the vSphere html5 user interface or use the Connect-VIserver command to look for the datacenter name.

In this example we should use the Home-Datacenter as the Datacenter name.
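The corrected data declaration, shown here as a sketch that matches the state output below, looks like:

data "vsphere_datacenter" "dc" {
  name = "Home-Datacenter"
}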

It is important to note that the tfstate file changes with the successful apply and the resources section now contains valid data about our server.

{
  "version": 4,
  "terraform_version": "0.13.3",
  "serial": 2,
  "lineage": "f35a4048-4cee-63e0-86b2-e699165efbe5",
  "outputs": {},
  "resources": [
    {
      "mode": "data",
      "type": "vsphere_datacenter",
      "name": "dc",
      "provider": "provider[\"registry.terraform.io/hashicorp/vsphere\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "id": "datacenter-3",
            "name": "Home-Datacenter"
          }
        }
      ]
    }
  ]
}

In summary, we have looked at how to find various providers to use with terraform, how to call a sample provider and what constructs are created when the init, plan, and apply functions are used with the local terraform binary. Fortunately, none of this changes if you are using Windows, Linux, or any other operating system. The provider directory under the .terraform tree contains the binary to translate from local API calls to API calls on the target system. This is a simple example but gives a good overview of what a good and bad connection into a vSphere server looks like and how to troubleshoot the connection. This construct should also work for a direct connection into an ESXi server without having to spin up a vSphere management instance.

HashiConf – October 2020 Conference

HashiCorp is holding their annual users conference online this year and I will be attending virtually to learn what is new and being announced around Terraform. The conference is a two-day event running Oct 14th through Oct 15th, with two days of workshops on the 12th and 13th. This blog will cover part of the full schedule since not all of the presentations are Terraform centric.

HashiConf Digital Opening Keynote

The introduction keynote was interesting with conference shots from the presenter’s homes. The number of attendees (12K) and new number of employees (1K) were interesting numbers. The rest was mostly marketing information about HashiCorp. Some interesting facts: 1K enterprise customers, 6K new users/month, growing with cloud partners and technology partners. Certification program – http://hashicorp.com/certification. Learning program – http://learn.hashicorp.com

Unlocking the Cloud Operating Model: Provisioning

Vault as a Security Platform & Future Direction

Vault is the security layer on top of Terraform and allows storage of security material and secrets for Kubernetes and other platforms in a secure manner. The bulk of downloads last year was a combination of Vault in conjunction with Kubernetes. The discussion continued with a banking customer that used Vault to store API keys, certificates, as well as username/passwords. Vault also allows for automation of key rotation and allows X.509 certificates to be dynamically assigned and consumed.

Options for running Vault – traditional way of download and run as well as SaaS in the HashiCorp Cloud Platform. New announcement of Vault on AWS as a service.

Consul complements Vault, allowing for network infrastructure automation that includes service discovery as well as access rights, authorization, and connection health. Consul can reconfigure and change on-premises equipment like Cisco switches and cloud network configurations like load balancers, network security rules, and firewalls. New announcements included Consul on AWS as a service as well as Consul 1.9 with significant enhancements for Kubernetes.

Human Authentication and Authorization is another layer that can cause problems or issues with system configuration and automation. Traditional products like Active Directory or LDAP for on-premises or Okta or AzureAD for cloud credentials can be leveraged to provide auth and authz resources. The trick is how to leverage these trusted sources into servers and services. Traditionally this was done with SSH keys or VPN credentials with secure network and known IP addresses or hostnames. With dynamic services and hosts this connection becomes difficult. Leveraging services like Okta or AzureAD and role based access for users or services is a better way of solving this problem. Credentials can be dynamically assigned to role and rotated as needed. The back end servers and services can verify these credentials with the auth service to verify authorization for the user or role for access. HashiCorp Boundary provides the linkages to make this work.

Boundary establishes a pluggable identity provider into an authentication source to verify user identities. A second set of pluggable connectors ties into an authorization source and integrates with HashiCorp Vault to access services with stored secrets, allowing secrets to be rotated and dynamic.

Vault as a Security Platform and Future Direction

Vault centrally stores secrets for infrastructure

Vault can centrally store usernames and passwords, public and private keys, as well as other dynamic or secure credentials. In the image above a web server pulls the database credentials from Vault rather than storing them in code or config files, and the webserver can use these dynamic credentials to connect to a database. This workflow can easily change and have the webserver request credentials from Vault, with Vault connecting to the database to generate a short-lived auth token which is then passed back to Vault and then to the web server.

Building a Self-service vending machine to streamline multi-AWS account strategy

The presentation was from Eventbrite describing how they use Terraform and the HashiStack to manage AWS and a multi-account AWS solution. Multiple AWS accounts are needed to isolate different domains and solutions. Security can be controlled across all accounts through automation. The AWS Terraform Landing Zone (TLZ) quickly became a solution. This product was introduced a year ago as a joint project between HashiCorp and Amazon.

The majority of the conversation was business justification for a multi-AWS account management requirement and how AWS Control Tower would not work. From the discussion and chat it appears that TLZ is still in beta and could potentially make things easier.

Terraform in Regulated Financial Services

Customer presentation from Deutsche Boerse Group discussing Terraform deployment into AWS, Azure, and GCP for a fully automated electronic trading application. Terraform and Packer form the foundation for building and managing systems. Infrastructure as Code (IaC) helps with regulation reporting and guidelines in the financial industry. Terraform helps define uniform policies and procedures. Code is designed and split into product zones that represent different applications or functions.

Under the terraform directory is a split of dev, test, prod, and etc directories with product lists under each one.

Note that there are a few structures that are common across all modules and others that are specific to a product or class of service. Network controls are managed through a central network definition. Customizations can be made to note changes that vary from the company policies and procedures.

A standard module for a hub can be defined for services like monitoring and network.

This results in a core module that is secure and compliant across environments.

Packer is layered on top of this to harden the operating system and provision customizations into each virtual machine. Ansible configures the machine and can deploy straight to a cloud provider through a private marketplace or personal template.

Terraform Consistent Development and Deployment

This presentation reviewed what Comcast has done with Terraform. The primary goals are consistency and accuracy. Having everyone run the same configuration and secrets helps reduce complexity. A secondary goal is to have dev, test, and prod configurations the same in different regions and locations.

Bootstrap is done from a Git repository then managed with cloud storage backend

State is stored and referenced from a common backend.

Use a makefile with targets to run the proper terraform command with the proper environment variables. This allows you to integrate state, Vault, and secrets on all desktops and in the CI/CD tool.

There are two levels of variables: one set that is specific to a platform and a second set of global variables. It is easy to set defaults and override them when needed. The difficulty is comparing two environments to see changes and differences.

With this module you end up with a vars folder and tfvars file unique to different environments. The Makefile pulls in the right value and ingests the desired tfvars file.
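As an illustration of that pattern (the folder and file names here are hypothetical), each Makefile target ultimately wraps a command along the lines of:

terraform apply -var-file=vars/dev.tfvars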

Remainder of presentations

The remainder of the presentations were Vault or Consul presentations. I primarily wanted to focus on Terraform deployments and presentations in this blog. More tomorrow given that day 2 is more Terraform focused.

Supporting multiple providers

One of the key uses of Terraform is to deploy development and production systems. Terraform can be used to manage what is deployed, manage resources, and restrict resources available to an instance. In our last blog entry we looked at the vSphere provider and at some of the key parameters that are needed to deploy solutions into this virtual environment.

In a perfect world we should be able to develop definitions to deploy development systems to a small or older system, deploy production to a more expensive and powerful vSphere cluster, and a disaster recovery copy to make sure that we can failover to an alternate datacenter in times of emergency. We should then be able to take the data for this provider and move it to Amazon AWS or Microsoft Azure or Google GCP by just changing the provider. Unfortunately, this is not a perfect world and there are a ton of reasons that this won’t work.

If we look at the documentation for the AWS provider we note that we don’t need a username and password or IP address but rather a public and private key to connect to an AWS service, and these parameters can be provided through environment variables. We can also define multiple providers, give each an alias, and deploy services into different accounts, regions, and zones based on the terraform provider definition.

A typical aws provider main.tf file looks like…

provider "aws" {
  version = "> 2"
  profile = "default"
  region  = var.dev_location
  alias   = "dev"
}

provider "aws" {
  version = "> 2"
  profile = "default"
  region  = "us-west-1"
  alias   = "prod"
}

allowing you to deploy resources into "aws.dev" or "aws.prod" with a variable.tf file containing nothing or the following:

variable "createdby" {
  type    = string
  default = "TechEnablement"
}

variable "environment" {
  type    = string
  default = "TechEnablement"
}

variable "dev_location" {
  type    = string
  default = "us-east-1"
}

With this variable.tf definition you need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or define shared_credentials_file in a terraform configuration file to point to the location of a key file. On Windows this is typically "%USERPROFILE%\.aws\credentials". The format of the credentials file looks like

[default]

aws_access_key_id=AWSSAMPLE7EXAMPLE

aws_secret_access_key=long/keywith/numbers4&letters
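To place a resource with one of the aliased providers, the resource selects it with the provider meta-argument. A sketch using placeholder values looks like:

resource "aws_instance" "example" {
  # deployed through the provider aliased as "dev"
  provider      = aws.dev
  ami           = "ami-12345678" # placeholder AMI id
  instance_type = "t2.micro"

  tags = {
    createdby   = var.createdby
    environment = var.environment
  }
}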

Unfortunately the vSphere provider does not allow for an alias tag and use of different account credentials and vsphere host address. Rather than defining multiple providers you need to define different directories and different variable.tf and main.tf files for each of the environments. In our earlier example we would have a dev, prod, and dr folder under our main folder. Each folder would have terraform configuration files to define what each environment would look like and resources available.

A typical multi-environment tree would look different from our initial single tree deployment, with a dev, prod, and dr folder each containing the same main.tf files but different variable.tf definitions. Each folder would also have its own terraform.tfstate file given that there are different environment variables and states on different servers.
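A sketch of that layout, with illustrative folder names, looks like:

terraform/
├── dev/
│   ├── main.tf
│   ├── variable.tf
│   └── terraform.tfstate
├── prod/
│   ├── main.tf
│   ├── variable.tf
│   └── terraform.tfstate
└── dr/
    ├── main.tf
    ├── variable.tf
    └── terraform.tfstate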

If you try to define multiple vsphere providers in one file, terraform returns an error.

Given the differences between the two provider types, it begs the question of whether to change the aws provider layout to the same folder-per-environment format as the vsphere provider and have three different folders that deploy different environments to different servers. This would work, but having everything in one file reduces complexity and potential errors compared to keeping multiple copies in multiple folders. Editing one copy does not guarantee changes to the other directories, and there might be subtle differences between the different environments, like datastore names or locations as well as network definitions, that are unique to each environment.

In summary, there are multiple ways of solving the same problem. The ultimate solution is to write a generic provider that can deploy services into vSphere, Hyper-V, Nutanix, other on-premises virtual machine hosts, AWS, Azure, Google GCP, and other cloud virtual machine hosts. Given that there is no generic provider that works across all or even multiple environments you have to decide how to deploy multiple terraform configuration files to multiple target locations without doubling or tripling your work and code that needs support and maintenance. My recommendation is to go with different folders for different environments and have different variable.tf and main.tf files in each folder.

Installing Terraform on Ubuntu

Welcome to an ongoing series of Terraform tips, tricks, and tutorials. On this journey we are going to look at what it takes to use Terraform to manage resources running in VMware, Azure, AWS, and Google cloud. We will look at the differences between running Terraform on Linux and Windows and show examples of both. The assumption is that you know what Terraform is and just need to know how to do things with it. In a previous blog, we discussed how to install Terraform on Windows. In this blog we will look at installing Terraform on an Ubuntu 18 server.

For these examples we will use a VMware generic deployed instance so that we can go back to the same system and build upon the previous posting of how to do something. For this example we install a generic Ubuntu 18.04.5 desktop instance. We could just as easily have done this from a server instance and done everything from the command line using wget to get the Terraform binary.

Rather than using the wget command and having to figure out which version of Terraform to download we use the Firefox browser and go to http://terraform.io to download the binary.

If you forget the Terraform website you can easily do a search for the term terraform download ubuntu which returns a variety of tutorials and the HashiCorp Terraform site. Scrolling down on the site we see a variety of operating systems that are supported for the Terraform platform. Select the Linux 64-bit from the list to download.

Once the download is finished we need to unzip the binary from the zip file. Prior to unzipping the file we need to install the unzip package. This is done on Ubuntu with the apt-get command

sudo apt-get update

sudo apt-get install wget unzip

The update command makes sure that all patches and updates are installed. The install command makes sure that the wget (which is not necessary for this example) and unzip are installed. Once the unzip command is installed execute the unzip command to extract the terraform binary.

unzip terraform_0.13.4_linux_amd64.zip

The askubuntu website has a good cookbook on how to perform this installation and testing of the binaries.

The last step to getting Terraform installed on Ubuntu is to place the terraform binary in the path of the current user. Rather than placing this in a user specific bin directory it is best practice to put binaries like this in /usr/local/bin to be used by automation scripts and other users on this system. We can either copy or move the binary to this location using the sudo command to write to a root protected directory.

sudo mv terraform /usr/local/bin

Once the binary is relocated we can test the terraform binary by typing

terraform version

terraform

These commands not only test the binary but test that the binary is in the proper path to be executed.

At this point we have a terraform development platform that can be used to provision systems and services on a wide variety of cloud and virtualization platforms. To see this process in action, watch a video capture of this procedure.

In summary, installation of Terraform on Ubuntu is relatively simple. The three minute video shows everything required from start to finish to get a Terraform platform configured to be used from a terminal.


Installing Terraform on Windows

Welcome to an ongoing series of Terraform tips, tricks, and tutorials. On this journey we are going to look at what it takes to use Terraform to manage resources running in VMware, Azure, AWS, and Google cloud. We will look at the differences between running Terraform on Windows and Linux and show examples of both. The assumption is that you know what Terraform is and just need to know how to do things with it.

Let’s get started by installing the Terraform binary on a Windows 10 desktop system. For these examples we will use a VMware generic deployed instance so that we can go back to the same system and build upon the previous posting of how to do something. The instance that we will be using for our demonstration is a Windows 10 Pro desktop that was fresh installed from an iso file.

On Windows, the easiest way to download the Terraform binary is to open a web browser and go to http://terraform.io to get the latest version. Alternatively, you could use curl or wget from PowerShell, but this gets more complex than navigating to a web location.

If you forget the web address a simple web search using the phrase “terraform install windows” will help find the latest page as well as a few tutorials on how to perform the necessary steps. Unfortunately, the Terraform web page focuses on how to install the binaries onto a Linux or Mac OS platform and not Windows.

To download the zip file on Windows, scroll down to the Windows icon and select the 64-bit link. The 64-bit version is selected because that is the version of the operating system that is installed.

The next step is to extract the terraform.exe file from the zip file. When the file is finished downloading, open a file browser and right click on the zip file name. Select Extract all… to create a subfolder and the terraform.exe binary.

We could say that we are finished at this point but you would have to reference the path of this binary in every execution. What would make the installation much easier is to modify the %PATH% variable to include the terraform_0.13.4_windows_amd64 folder in the path. To do this, open the environment variable settings from the start menu by clicking on the Start button (or key) and typing control panel or environment variable. From here you can modify the path for the user or system wide based on what you select.

In this example we change the local path rather than the path for all users. To change the path, click on the path listing, then click Edit. Select the location of the terraform.exe folder and add it to the path.

To test the installation, open a Command Console or PowerShell and type

terraform version

terraform

At this point we have a terraform development platform that can be used to provision systems and services on a wide variety of cloud and virtualization platforms. To see this process in action, watch a video capture of this procedure.

In summary, installation of Terraform on Windows 10 is relatively simple. The three minute video shows everything required from start to finish to get a Terraform platform configured to be used with PowerShell.


Automating processes

Recently I have been working on my AWS Architect certification. Rather than just grinding through the training material and practice exams I thought I should actually build something and journal my process. I have done this internally for Commvault but wanted to do an external blog as well so that if I left Commvault I would have a copy of my notes.

The first step in building something is documenting the project including the goals, objectives, and components that will be needed. Initially I interviewed the person that I am building this project for and drew everything on paper (I know, old school). From there I transcribed it into an AWS architecture diagram using LucidChart.

The goal of the project is to take a zip file that is an aggregation of ECGs performed on student athletes and upload it into S3. Once the zip file is uploaded it kicks off a process that unzips the files and copies them to another folder inside the bucket. From here these files are copied to Dropbox and a notification is sent to one or more email boxes or text messages. From this notification a cardiologist interprets the results and responds to the email or text. Once the response is received the interpreted files are transferred from Dropbox back into S3 and sorted according to the school that was screened. Students that were marked as low risk are stamped with a low risk label. Students that were marked as needing a follow up or high risk are placed into a different folder for manual processing and a notification is sent via email and/or text requesting manual intervention.

The first step in the process is taking a zip file that was uploaded into S3 and processing it. Fortunately we have the ability to launch processes when a file is uploaded into S3 with the Lambda functions. A good place to start learning about this is https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html where the tutorial talks about how to create a Lambda function and tie it to changes to a specific bucket. This also required creating an IAM role that allows a Lambda function to read and write an S3 bucket.

Step 1) Create IAM Role. This is done by going to the IAM console and selecting Roles at the left of the screen. We want to click on the Create role blue button near the middle of the screen.



We select the Lambda function and click Next:Permissions at the bottom of the screen.

To make things easy we select AmazonS3FullAccess. What we want is the ability to read and write objects in an S3 bucket. We need to type in S3 in the Filter policies then select the box next to AmazonS3FullAccess.

We skip the tags role and go to Review. Here we enter a Role name. We will call it gbgh-processing. Once we enter this information we click Create role at the bottom right of the screen.

At this point we have a role that allows our Lambda function to access S3 objects and manipulate them.

Step 2) The next step is to create a bucket that we will upload files into. This is done by going to the S3 console and creating a bucket. The bucket must be in the same region where we create our Lambda function. In our example we will create a bucket called gbgh-test and put it in the US East region.

We click on the Create bucket blue button at the top left of the screen. The bucket name needs to be unique and we want the US East region. We don’t want to copy settings from other buckets and will configure the options on our own. We will be using gbgh-test as the bucket name.

We will go with the default options and clear all of the permissions as shown below. We want to open up public access for our bucket because a variety of people long term will be uploading files into the bucket. We can control access through other mechanisms at a later date.

From here we click Next and Create Bucket on the next screen.

We should see our bucket ready and available in the S3 bucket list.

Step 3) Create a Lambda function that receives S3 object change events so that we can process the files and do something with them. To do this we go to the Lambda function console.

Notice that we have a few functions already defined. Some are Java. Some are Node.js. Some are Python. We will be creating a Python 2.7 binary so that we can use the boto3 library. This pre-defined library allows us to quickly and easily manipulate objects in a bucket and call other AWS services like email and queue services. To start we click on the orange Create function button at the top right of the screen.

We will Author from scratch our code since we have some simple code that we can work from. We will call the function gbgh-test and select Python 2.7 as the runtime. We will select the gbgh-processing role to give our function access to the S3 bucket that we created.

When we click on the Create function it drops us into the Designer tool for editing and testing our Lambda function. The first thing that we want to do is add an S3 trigger to link S3 objects to our function.

The trigger is found under the Designer – Add triggers list. Scroll down and click on S3. It will show that Configuration is required and we will need to scroll down to configure the trigger. From here we will select the bucket and Event type. We are looking for create events which indicates that someone uploaded a file for us to process. We select the gbgh-test bucket and stick with the All object create events.

After clicking on Add we navigate to the gbgh-test icon to see the Function code where we can edit the function and test the code that we want. The default handler does not do what we want so we need to replace this code.

For our function we are going to start simple. We will handle a create event, pull the bucket name from the event handler, pull the file name that was uploaded, and print all of the items in the bucket for testing. The code starts with an import of the boto3 library. We create a handle into S3 with the boto3.resource('s3') library call. The lambda_handler is launched when a file is created in our S3 bucket. From here we print some diagnostics that the handler was called then walk the object list in the bucket and print the object names. From here we return and terminate the Lambda function. Sample code for this can be found at lambda_function_s3_upload.js

The next step is to create a test event to simulate a file upload. We do this by clicking on the Select a test event pulldown and selecting Configure test events.

For the test event we need to simulate an ObjectCreated:Put call or an https PUT that causes a file to be uploaded to our S3 bucket. We need to define the arn for our bucket and bucket name as well as an object name (gbgh.zip) that we need to actually create in our gbgh-test bucket. We define the event name as gbghTest and paste in our simulated event call. The code for this can be found at testS3Put.json

The data at the bottom of the screen are just curly brackets to finish out the record definition. From here we click on Create (I had to scroll down to see it) to create our test function. At this point we can select gbghTest and click on the Test button at the top right of the screen. When we first tried this it failed with an error. We had to add two lines to import the json and urllib functions since we call them to process the keys or object names. Once we make these changes we see that the status came back Succeeded and we get an empty listing of gbgh-test if we scroll down in the Execution results screen.

We can upload our gbgh.zip file into our gbgh-test bucket and it should appear at the bottom of the Execution results screen.

To summarize, we have a Lambda function that gets launched when we upload a file into an S3 bucket. Currently the function just prints a directory of the bucket by listing all of the objects that it contains. We had to create an IAM policy so that our function can interact with S3. In the next blog post we will do some processing of this zip file and send some notifications once the zip file is processed.

database options

Before we dive into features and functions of database as a service, we need to look at the options that you have with the Oracle Database. We have discussed the differences between Standard Edition and Enterprise Edition but we really have not talked about the database options. When we select a database in the Oracle Cloud we are given the choice of Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Today we are going to dive into the different Editions and talk about the options that you get with each option. It is important to note that all of the options are extra cost options that are licensed on a per processor or per user basis. If you go with Amazon RDS, EC2, or Azure Compute you need to purchase these options to match your processor deployment.

One of the standard slides that I use to explain the differences in the editions is shown below.

The options are cumulative when you look at them. The Enterprise Edition, for example, comes with Transparent Data Encryption (TDE). TDE is also included in the High Performance and Extreme Performance Editions. We are going to pull the pricing for all of these options from the Technology Price List. Below is a list of the options.

  • Enterprise Edition
    • Transparent Data Encryption
  • High Performance Edition
    • Diagnostics
    • Tuning
    • Partitioning
    • Advanced Compression
    • Advanced Security
    • Data Guard
    • Label Security
    • Multitenant
    • Audit Vault
    • Database Vault
    • Real Application Testing
    • OLAP
    • Spatial and Graphics
  • Extreme Performance Edition
    • Active Data Guard
    • In Memory
    • Real Application Clusters (RAC)
    • RAC One

Transparent Data Encryption

TDE is a subset of the Advanced Security option. TDE stops would-be attackers from bypassing the database and reading sensitive information from storage by enforcing data-at-rest encryption in the database layer. Data is stored encrypted in the table extents, and the Oracle Wallet is needed to decrypt the data and perform operations on it. The Advanced Security and Security Inside Out blogs dive deeper into TDE features, functions, and tutorials, and there is also a Community Security Discussion Forum. The Advanced Security option is priced at $300 per named user or $15,000 per processor. If we assume a four year amortization, the cost of this option is $587.50 per month per processor. The database license is $1,860 per month per processor. This means that a dual-core system on Amazon EC2, RDS, or Azure Compute running the Oracle database will cost you the price of the server plus $2,448 per month. If we go with a t2.large on Amazon EC2 (2 vCPUs and 8 GB of RAM) and 128 GB of disk our charge is $128 per month. If we bump this up to an r3.large (2 vCPUs, 15 GB of RAM) the price goes up to $173 per month. The total comes to about $2,620 per month, which compares to Enterprise Edition at $3,000 per month per processor for PaaS/DBaaS. We could also run this in Oracle IaaS Compute at $150 per month (2 vCPUs, 30 GB of RAM) to compare apples to apples. It is strongly recommended that any data that you put in the cloud be encrypted. Security is good in the cloud, but encryption of data in storage is much better. When you replicate or back up data it is copied in the format that it is stored in; if your data is clear text, your backups could be clear text, exposing you to potential loss of data. Encrypting data at rest in storage is a baseline for running a database in the cloud.
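
The monthly figures used throughout this post appear to come from a simple formula: amortize the per-processor list price over four years (48 months) and include an annual support fee of 22 percent of the license for those four years. The 22 percent support uplift is my assumption, but it reproduces every number quoted here, including the $587.50 for Advanced Security and the $1,860 for the Enterprise Edition license:

def monthly_cost(list_price, years=4, support_rate=0.22):
    # License amortized over `years` plus annual support for the same period
    total = list_price + list_price * support_rate * years
    return total / (years * 12)

print(monthly_cost(15000))   # Advanced Security / TDE -> 587.50
print(monthly_cost(47500))   # Enterprise Edition license (assumed list price) -> ~1,860
print(monthly_cost(11500))   # Partitioning, Label Security, etc. -> ~450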

Diagnostics

Diagnostics is a subset of the Database Management Packs that allows you to look into the database and figure out things like lock contention, what is holding up a wait queue, and what resources are being consumed by processes inside the database. Historic views into the Automatic Workload Repository (AWR) reports are available with this option; without it you get spot snapshots but not historical views or comparative analytics on AWR information. Some of the tools, like the compression advisor and partitioning advisor, are free, while others are part of the Diagnostics Pack. Diagnostics is licensed at $150 per named user or $7,500 per processor. This correlates to $294 per processor per month. Unfortunately, you can't purchase Enterprise Edition DBaaS and add this option; you need to go with IaaS Compute and add it to a bring-your-own-database license, because the only way to get this feature in DBaaS is to go with the High Performance Edition. The binary that is installed on the cloud service specifically labels the database as Enterprise Edition, High Performance Edition, or Extreme Performance Edition, and all of the features listed from here on are prohibited from running on the Enterprise Edition when provisioned into the Oracle DBaaS. If you just want the Diagnostics Pack on Enterprise Edition, it does not make economic sense to purchase High Performance Edition at $4,000 per month per processor when you can do this on IaaS at $2,914 (the $2,620 from above plus $294).

Tuning

Tuning is also a subset of the Database Management Packs and allows you to look into SQL queries, table layouts, and overall performance issues. Options like the SQL Tuning Advisor and Automatic SQL Tuning are part of this pack. The Tuning Pack is licensed at $100 per named user or $5,000 per processor, which comes in at $196 per processor per month if purchased separately. A Tuning Whitepaper details some of the features and functions of the tuning pack if you want to learn more.

Partitioning

Partitioning is a way of improving the performance of your database and backups by splitting how data is stored and read. Partitioning is powerful functionality that allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity. Oracle provides a comprehensive range of partitioning schemes to address every business requirement. The key improvement is to reduce the amount of data that you read into memory on a query. For example, if you are looking for financial summary data for the last quarter, a query against eight years of financial data should not need to read in all 32 quarters, only the data from the last quarter. If we partition the data on a monthly basis we only read in the three partitions for that quarter rather than the full eight years. Partitioning also allows us to compress older data so it consumes less storage while at rest, and when we back up the database we don't need to copy the older partitions that don't change, only the partitions that have been updated since our last backup. Partitioning is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. The three most purchased database options are Diagnostics, Tuning, and Partitioning, and their combined cost is $940 per processor per month. When we compare the $4,000 per processor per month of DBaaS to IaaS with these three options we are roughly at parity.

Advanced Compression

Advanced Compression is a feature that allows you to compress data at rest (and in memory) so that it consumes fewer resources. Oracle Advanced Compression provides a comprehensive set of compression capabilities to help improve performance and reduce storage costs. It allows organizations to reduce their overall database storage footprint by enabling compression for all types of data: relational (table), unstructured (file), network, Data Guard redo, and backup data. The cost of this feature is best weighed against your storage costs. Advanced Compression is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Typical compression ratios are 3x to 10x, which means that 1 TB of data will take up roughly 330 GB at the low end or 100 GB at the high end. Lower compression levels are recommended for data that changes occasionally and higher compression for data that will not change. The penalty for compression comes in when you update data that is compressed: the data must be uncompressed, the new data inserted, and the result recompressed.

Advanced Security

Advanced Security allows you to secure and encrypt data in the database. It provides two important preventive controls to protect sensitive data at the source: transparent data encryption and on-the-fly redaction of displayed data. TDE stops would-be attackers from bypassing the database and reading sensitive information directly from storage by enforcing data-at-rest encryption in the database layer. Data Redaction complements TDE by reducing the risk of unauthorized data exposure in applications, redacting sensitive data before it leaves the database. Advanced Security is priced at $300 per named user or $15,000 per processor. The monthly cost will be $587.50 per month per processor for this option. Data redaction is typically required when replicating production data to development and test. If you have credit card numbers, social security numbers, home addresses, or driver's license information in your database, redaction is important for remaining Sarbanes-Oxley and PCI compliant.

Data Guard

Data Guard is a key foundation piece of the Maximum Availability Architecture (MAA) and does not cost any additional money. You get replication between two databases at no additional cost, and the replication between instances can be physical or logical. This feature ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restoration, and cluster techniques to provide a high level of data protection and data availability. It is important to note that Data Guard is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Label Security

Label Security has the ability to control access based on data classification and to enforce traditional multi-level security (MLS) policies for government and defense applications. Oracle Label Security also benefits commercial organizations attempting to address numerous access control challenges, including those associated with database and application consolidation, privacy laws, and regulatory compliance requirements. When a user requests data, the database looks at the user's credentials and roles and filters the results that the user sees from a query. Label Security is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Note that this is different than data redaction: with redaction, data is scrambled when it is copied; with Label Security, the data is simply not returned if the user does not have rights to read it. A query does not return an error but a null value if the user does not have rights to read a column. The biggest benefit of this option is that it does not require program changes to restrict access to data and present results to users. If, for example, we are going to show sales in a customer relationship program, we don't need to change the code based on the user being a sales rep or a sales manager. The sales manager can see all of the sales rep information to track how the team is performing, while each sales rep can see their own data but not the other reps' data. It is important to note that Label Security is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Multitenant

Multitenant, or Pluggable Database, allows you to consolidate instances onto one server and reduce your overall management cost. The many pluggable databases in a single multitenant container database share its memory and background processes. This enables consolidation of many more pluggable databases compared to the old architecture, offering similar benefits to schema-based consolidation but with none of the major application changes required by that approach. Backups are done at the container layer while users are provisioned at the pluggable layer. Features of the container (RAC, Data Guard, etc.) are properties of the parent and adopted by the pluggable databases. To take a test system from a single instance to Data Guard replication only requires unplugging the database from the single-instance container and plugging it into a Data Guard-enabled container; the same is true for RAC and all other features. Multitenant is licensed at $350 per named user or $17,500 per processor. This comes in at $685 per processor per month. It is important to note that this option is not available on Amazon RDS; it is specifically disabled. You must run this on EC2 or another cloud platform to use this functionality.

Audit Vault

Audit Vault and Database Firewall monitors Oracle and non-Oracle database traffic to detect and block threats, and improves compliance reporting by consolidating audit data from databases, operating systems, directories, and other sources. Audit Vault is licensed at $6,000 per processor and is not available on a per-user basis. This comes in at $235 per processor per month. This option typically requires a separate server for security reasons: logs and audit information are copied off the monitored system so that the data cannot be manipulated on both the source system and the auditing system at the same time.

Database Vault

Database Vault reduces the risk of insider and outsider threats and addresses common compliance requirements by preventing privileged users (DBAs) from accessing sensitive application data, preventing compromised privileged user accounts from being used to steal sensitive data or make unauthorized changes to databases and applications, providing strong controls inside the database over who can do what and over when and how applications, data, and databases can be accessed, and providing privilege analysis for all users and applications inside the database to help achieve a least-privilege model and make the databases and applications more secure. Database Vault is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that this option is not available on Amazon RDS; it is specifically disabled. You must run this on EC2 or another cloud platform to use this functionality.

Real Application Testing

Real Application Testing helps you fully assess the effect of system changes on real-world applications in test environments before deploying the change in production. Oracle Real Application Testing consists of two features, Database Replay and SQL Performance Analyzer, which together enable enterprises to rapidly adopt new technologies that add value to the business while minimizing risk. Traces can be recorded for reads and writes and replayed on a test system, which makes the replay option perfect for development and testing instances. The product is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that the lack of SYS-level access in Amazon RDS may or may not break this feature, depending on what you are trying to replay.

OLAP

Online Analytical Processing, or OLAP, is a multidimensional analytic engine embedded in Oracle Database 12c. Oracle OLAP cubes deliver sophisticated calculations using simple SQL queries, producing results with speed-of-thought response times. This query performance can be leveraged transparently when deploying OLAP cubes as materialized views, enhancing the performance of summary queries against detail relational tables. Because Oracle OLAP is embedded in Oracle Database 12c, it allows centralized management of data and business rules in a secure, scalable, and enterprise-ready platform. OLAP is licensed at $460 per named user or $23,000 per processor. This comes in at $901 per processor per month. This feature is a good fit for BI analytics packages and data warehouse systems.

Spatial and Graphics

Spatial and Graphics supports a full range of geospatial data and analytics for land management and GIS, mobile location services, sales territory management, transportation, LiDAR analysis, and location-enabled Business Intelligence. The graph features include RDF graphs for applications ranging from semantic data integration to social network analysis to linked open data, as well as network graphs used in transportation, utilities, energy, and telcos, and drive-time analysis for sales and marketing applications. This option is licensed at $350 per named user or $17,500 per processor. This comes in at $685 per processor per month. It is important to note that this option is not supported in Amazon RDS; you must select EC2 or another cloud service to get it.

All of the above options are bundled into the High Performance Edition. If we add up all of the options we get a total of

  • Transparent Data Encryption – $587.50 per month
  • Diagnostics – $294 per month
  • Tuning – $196 per month
  • Partitioning – $450 per month
  • Advanced Compression – $450 per month
  • Advanced Security – $587.50 per month
  • Data Guard – bundled
  • Label Security – $450 per month
  • Multitenant – $685 per month
  • Audit Vault – $235 per month
  • Database Vault – $450 per month
  • Real Application Testing – $450 per month
  • OLAP – $901 per month
  • Spatial and Graphics – $685 per month

This adds up to roughly $5,833.50 per processor per month for the High Performance options, counting Transparent Data Encryption only once since it is already included in the Advanced Security figure. Oracle bundles all of this for an additional $1,000 per processor per month. The Extreme Performance Edition options add Active Data Guard, In Memory, and RAC on top of that.
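
As a sanity check on that total, here is the same arithmetic in a few lines (TDE is not listed separately because it is already covered by the Advanced Security figure):

high_performance_options = {
    "Advanced Security (includes TDE)": 587.50,
    "Diagnostics": 294,
    "Tuning": 196,
    "Partitioning": 450,
    "Advanced Compression": 450,
    "Data Guard": 0,           # bundled at no extra cost
    "Label Security": 450,
    "Multitenant": 685,
    "Audit Vault": 235,
    "Database Vault": 450,
    "Real Application Testing": 450,
    "OLAP": 901,
    "Spatial and Graphics": 685,
}

print(sum(high_performance_options.values()))   # 5833.5 per processor per month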

Active Data Guard

Active Data Guard has the same features and functions as Data Guard but additionally allows the standby database to be open for read-only queries and reporting while replication continues. Active Data Guard is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month.

In Memory

In Memory optimizes both analytics and mixed-workload OLTP, delivering outstanding performance for transactions while simultaneously supporting real-time analytics, business intelligence, and reports. Most DBAs optimize performance by creating indexes to find data quicker, which works if you know the questions ahead of time; if you don't know the questions, it is difficult to tune for everything. In Memory keeps a row-based copy of the data as well as a column-based copy for quick column sorts and searches. In Memory is licensed at $460 per named user or $23,000 per processor. This comes in at $901 per month per processor. The key advantage of this option is that it saves you from purchasing a second database for analytics and reporting, since both workloads can run on the same box as your transactional system.

Real Application Clusters (RAC)

RAC is a cluster database with a shared-cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide highly scalable and available database solutions for all your business applications. Oracle RAC is a key component of Oracle's private cloud architecture and a critical part of the MAA strategy; RAC support is also included in Oracle Database Standard Edition for higher levels of system uptime. RAC is licensed at $460 per named user or $23,000 per processor. This comes in at $901 per month per processor. It is important to note that RAC is not supported in Amazon or Azure: the system requires shared storage between compute instances, which neither platform provides. The only option for this configuration in the cloud is Oracle DBaaS/PaaS.

The Extreme Performance options add up to $2,252 per processor per month ($450 for Active Data Guard plus $901 each for In Memory and RAC), but Oracle only charges an extra $1,000 per processor per month for the Extreme Performance Edition.

In summary, there are a ton of options for the database. You need to figure out which options you need, and if you need more than a couple it is economically beneficial to go with High Performance Edition. If you need RAC, Active Data Guard, or In Memory you must purchase the Extreme Performance Edition. It is also important to note that not all features are supported in Amazon RDS, so you must either go with Oracle Database as a Service or build a system using IaaS; RAC is the exception in that it is only available with Oracle DBaaS. We will go into a few of these features in the coming days to look at their value, how to use them, and what is required to make the functionality work with other cloud providers.