aws provider vs vsphere provider

In a previous post we talked about the vsphere provider and what is needed to define a connection to create a virtual machine. In this blog we will start to look at what is needed to setup a similar environment to do the same thing in AWS EC2. Think of it as a design challenge. Your client comes to you and says “I want a LAMP or WAMP stack or Tomcat Server that I can play with. I want one local as well as one in the cloud. Can you make that happen?”. You look around and find out that they do have a vSphere server and figure out how to log into it and create a Linux instance to build a LAMP stack and a Windows instance to create a WAMP stack then want to repeat this same configuration in AWS, Azure, and/or Google GCP. Simple, right?

If you remember, to create a vSphere provider declaration in Terraform you basically need a username, password, and IP address of the vSphere server.

provider “vsphere” {
user = var.vsphere_user
password = var.vsphere_password
vsphere_server = var.vsphere_server
version = “1.12.0”

allow_unverified_ssl = true
}

The allow_unverified_ssl is to get around that most vSphere installations in a lab don’t have a certified certificate but a self-signed certificate and the version is to help us keep control of syntax changes in our IaC definitions that will soon follow.

The assumptions that you are making when connecting to a vSphere server when you create a virtual machine are

  1. Networking is setup for you. You can connect to a pre-defined network interface from vSphere but you really can’t change your network configuration beyond what is defined in your vSphere instance.
  2. Firewalls, subnets, and routing is all defined by a network administrator and you really don’t have control over the configuration inside Terraform unless you manage your switches and routers from Terraform as well. The network is what it is and you can’t really change it. To change routing rules and blocked or open ports on a network typically requires reconfiguration of a switch or network device.
  3. Disks, memory, and CPUs are limited by server configurations. In my home lab, for example, I have two 24 core servers with 48 GB of RAM on one system and 72 GB of RAM on the other. One system has just under 4 TB of disk while the other has just over 600 GB of disk available.
  4. Your CPU selection is limited to what is in your lab or datacenter. You might have more than just an x86 processer here and there but the assumption is that everything is x86 based and not Sparc or PowerPC. There might be an ARM processor as an option but not many datacenters have access to this unless they are developing for single board computers or robotics projects. There might be more advanced processors like a GPU or Nvidia graphics accelerated processor but again, these are rare in most small to midsize datacenters.

Declaring a vsphere provider gives you access to all of these assumptions. If you declare an aws or azure provider these assumptions are not true anymore. You have to define your network. You can define your subnet and firewall configurations. You have access to almost unlimited CPU, memory, and disk combinations. You have access to more than just an x86 processor and you have access to multiple datacenters that span the globe rather than just a single cluster of computers that are inside your datacenter.

The key difference between declaring a vsphere provider and an aws provider is that you can declare multiple aws providers and use multiple credentials as well as different regions.

provider “aws” {
version = “> 2”
profile = “default”
region = “us-east-1”
alias = “aws”
}

Note we don’t connect to a server. We don’t have a username or password. We do define a version and have three different parameters that we pass in. So the big question becomes how do we connect and authenticate? Where is this done if not in the provider connection? We could have gotten by with just provider “aws” {} and that would have worked as well.

To authenticate using the Hashicorp aws provider declaration you need to

  • declare the access_key and secret_key in the declaration (not advised)
  • declare the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY_ID as environment variables
  • or point to a configuration file with the shared_credentials_file declaration or AWS_SHARED_CREDENTIALS_FILE environment variable leveraging the profile declaration or PROFILE environment variable.
  • automatic loading of the ~/.aws/credentials or ~/.aws/config files

The drawback to using the environment variables is that you can only have one login into your aws console but can connect to multiple regions with the same credentials. If you have multiple accounts you need to declare the access_key and secret_key or the more preferred method of using the shared_credentials_file declaration.

For the aws provider, all parameters are optional. The provider is flexible enough to make some assumptions and connect to AWS based on environment variables and optional parameters defined. If something is defined with a parameter it is used over the environment variable. If you define both a key and a shared_credentials_file, Terraform will throw an error. If you have environment variables define and a ~/.aws/credentials file, the environment variables will be used first.

If we dive a little deeper into our vsphere variables.tf file we note that we need to run a script or manually generate the declarations for vsphere_datacenter, vsphere_host, and vsphere_resource_pool prior to defining a virtual machine. With the aws provider we only need to define the region to define all of these elements. Unfortunately, we also need to define the networking connections, subnet definitions, and potential firewall exceptions to be able to access our new virtual machine. It would be nice if we could take our simple vsphere virtual machine definition defined in our vsphere main.tf file and translate it directly into an aws_instance declaration. Unfortunately, there is very little that we can translate from one environment to the other.

The aws provider and aws_instance declaration does not allow us to clone an existing instance. We need to go outside of Terraform and create an AMI to use as a reference for aws_instance creation. We don’t pick a datacenter and resource_pool but select a region to run our instance. We don’t need to define a datastore to host the virtual machine disks but we do need to define the disk type and if it is a high speed (higher cost) or spinning disk (lower cost) to host the operating system or data.

We can’t really take our existing code and run it through a scrubber and spit out aws ready code unfortunately. We need to know how to find a LAMP, WAMP, and Tomcat AMI and reference it. We need to know the network configurations and configure connections to another server like a database or load balancer front end. We also need to know what region to deploy this into and if we can run these services using low cost options like spot instances or can shut off the running instance during times of the day to save money given that a cloud instance charges by the minute or hour and a vsphere instance is just consuming resources that you have already paid for.

One of the nice things about an aws provider declaration is that you can define multiple providers in the same file which generated an error for a vsphere provider. You can reference different regions using an alias. In the declaration shown above we would reference the provider with

provider = aws.aws

If we wanted to declare that the east was our production site and the west was our dev site we could use the declaration

provider “aws” {
version = “> 2”
profile = “default”
region = “us-east-1”
alias = “east”
}
provider “aws” {
version = “> 2”
profile = “default”
region = “us-west-1”
alias = “west”
}

If we add a declaration of a network component (aws_vpc) we can populate our state file and see that the changes were pushed to our aws account.

We get the .terraform tree populated for our Windows desktop environment as well as the terraform.tfstate created. Looking at our AWS VPC console we see that Prod-1 was created in US-East-1 (and we could verify that Dev-1 was created in US-West-1 if we wanted).

Note that the CIDR block was correctly defined as 10.0.0.0/16 as desired. If we run the terraform destroy command to clean up this vpc will be destroyed since it was created and is controlled by our terraform declaration.

Looking at our terraform state file we can see that we did create two VPC instances in AWS and the VPC ID should correspond to what we see in the AWS console.

In summary, using Terraform to provision and manage resources in Amazon AWS is somewhat easier and somewhat harder than provisioning resources in a vSphere environment. Unfortunately, you can’t take a variables.tf or main.tf declaration from vSphere and massage it to become a AWS definition. The code needs to be rewritten and created using different questions and parameters. You don’t need to get down to the SCSI target level with AWS but you do need to define the network connection and where and how the resource will be declared with a finer resolution. You can’t clone an existing machine inside of Terraform but you can do it leveraging private AMI declarations in AWS similar to the way that templates are created in vSphere. Overall an AWS managed state with Terraform is easy to start and allows you to create a similar environment to an on-premises environment as long as you understand the differences and cost implications between the two. Note that the aws provider declaration is much simpler and cleaner than the vsphere provider. Less is needed to define the foundation but more is needed as far as networking and how to create a virtual instance with AMIs rather than cloning.

The variables.tf and terraform.state files are available on github to review.

vsphere_virtual_machine creation

In a previous blog we looked at how to identify an existing vSphere virtual machine and add it as a data element so that it can be referenced. In this blog we will dive a little deeper and look at how to define a similar instance as a template then use that template to create a new virtual machine using the resource command.

It is important to note that we are talking about three different constructs within Terraform in the previous paragraph.

  • data declaration – defining an existing resource to reference it as an element. This element is considered to be static and can not be modified or destroyed but it does not exist, terraform will complain that the declaration failed since the element does not exist. More specifically, data vsphere_virtual_machine is the type for existing vms.
  • template declaration – this is more of a vSphere and not necessarily a Terraform definition. This defines how vSphere copies or replicates an existing instance to create a new one as a clone and not necessarily from scratch
  • resource declaration – defining a resource that you want to manage. You can create, modify, and destroy the resource as needed or desired with the proper commands. More specifically, resource vsphere_virtual_machine is the type for new or managed vms.

We earlier looked at how to generate the basic requirements to connect to a vSphere server and how to pull in the $TF_VAR_<variable> label to connect. With this we were able to define the vspher_server, vsphere_user, and vspher_password variable types using a script. If we use the PowerCLI module we can actually connect using this script using the format

Connect-VIServer -Server $TF_VAR_vsphere_server -User $TF_VAR_vsphere_user -Password $TF_VAR_vsphere_password

This is possible because if the values do not exist then they are assigned in the script file. From this we can fill in the following data

  • vsphere_datacenter from Get-Datacenter
  • vsphere_virtual_machine (templates) from Get-Template
  • vsphere_host from Get-Datacenter | Get-VMHost
  • vsphere_datastore from Get-Datastore

The vsphere_datacenter assignment is relatively simple

$connect = Connect-VIServer -Server $TF_VAR_vsphere_server -User $TF_VAR_vsphere_user -Password $TF_VAR_vsphere_password

$dc = Get-Datacenter
Write-Host ‘# vsphere_datacenter definition’
Write-Host ‘ ‘
Write-Host -Separator “” ‘data “vsphere_datacenter” “dc” {
name = “‘$dc.Name'”‘
‘}’
Write-Host ‘ ‘

This results in an output that looks like…

# vsphere_datacenter definition

data “vsphere_datacenter” “dc” {
name = “Home-lab”
}

This is the format that we want for our parameter.tf file. We can do something similar for the vm templates

Write-Host ‘# vsphere_virtual_machine (template) definition’
Write-Host ‘ ‘
$Template_Name = @()
$Template_Name = Get-Template

foreach ($item in $Template_Name) {
Write-Host -Separator “” ‘data “vsphere_virtual_machine” “‘$item'”‘ ‘ {
name = “‘$item'”‘
‘ datacenter_id = data.vsphere_datacenter.dc.id
}’
Write-Host ‘ ‘
}
Write-Host ‘ ‘

This results in the following output…

#vsphere_virtual_machine (template) definition

data “vsphere_virtual_machine” “win_10_template” {
name = “win_10_template”
datacenter_id = data.vsphere_datacenter.dc.id
}

data “vsphere_virtual_machine” “win-2019-template” {
name = “win-2019-template”
datacenter_id = data.vsphere_datacenter.dc.id
}

We can do similar actions for vsphere_host using

$Host_name = @()
$Host_name = Get-Datacenter | Get-VMHost

as well as vsphere_datastore using

$Datastore_name = @()
$Datastore_name = Get-Datastore

The resulting output is a terraform ready parameter file that represents the current state of our environment. The datacenter, host, and datastores should not change from run to run. We might define new templates so these might be added or removed but this script should be good for generating the basis of our existing infrastructure and give us the foundation to build a new vsphere_virtual_machine.

To create a vsphere_virtual_machine we need the following elements

  • name
  • resource_pool_id
  • disk
    • label
  • network_interface
    • network_id

These are the minimum requirements required by the documentation and will allows you to pass the terraform init but the apply will fail. Additional values that are needed

  • host_system_id – host to run the virtual machine on
  • guest_id – identifier for operating system type (windows, linux, etc)
  • disk.size – size of disk
  • clone.template_uuid – id of template to clone to create the instance.

The main.tf file that works to create our instance looks like

data “vsphere_virtual_machine” “test_minimal” {
name = “esxi6.7”
datacenter_id = data.vsphere_datacenter.dc.id
}

resource “vsphere_virtual_machine” “vm” {
name = “terraform-test”
resource_pool_id = data.vsphere_resource_pool.Resources-10_0_0_92.id
host_system_id = data.vsphere_host.Host-10_0_0_92.id
guest_id = “windows9_64Guest”
network_interface {
network_id = data.vsphere_network.VMNetwork.id
}
disk {
label = “Disk0”
size = 40
}
clone {
template_uuid = data.vsphere_virtual_machine.win_10_template.id
}
}

The Resources-10_0_0_92, Host-10_0_0_92, and win_10_template were all generated by our script and we pulled them from the variables.tf file after it was generated. The first vm “test_minimal” shows how to identify an existing virtual_machine. The second “vm” shows how to create a new virtual machine from a template.

The files of interest in the git repository are

  • connect.ps1 – script to generate variables.tf file
  • main.tf – terraform file to show example of how to declare virtual_machine using data and resource (aka create new from template)
  • variables.tf – file generated from connect.ps1 script after pointing to my lab servers

All of these files are located on https://github.com/patshuff/terraform-learning. In summary, we can generate our variables.tf file by executing a connext.ps1 script. This script generates the variables.tf file (test.yy initially but you can change that) and you can pull the server, resource_pool, templates, and datastore information from this config file. It typically only needs to be run once or when you create a new template if you want it automatically created. For my simple test system it took about 10 minutes to create the virtual machine and assign it a new IP address to show terraform that the clone worked. We could release earlier but we won’t get the IP address of the new virtual instance.

Terraform Providers

One of the foundational components of automation is being able to speak in the language of your target. With AWS, for example, CloudFormation is a good tool to define what a deployment in AWS should look like and ensures conformity to the design definition. The main problem is that CloudFormation only works on AWS and does not work on other deployment platforms. Terraform, on the other hand, performs the same automation from a configuration definition and creates the desired components onto a variety of platforms. The mechanism used to perform this function is the inclusion of a provider definition. If you think in terms of Java or C programming a provider is a set of library functions that can be called and including a provider definition is similar to a include statement to pull in a library header.

Some good blogs that compare and contrast Terraform vs CloudFormation include:

If you look at the definition of a provider from HashiCorp on their Providers page it defines a provider as a way to expose the API interface of the backend system as well as tasks that might be needed like random number generation utilities to generate names. The Terraform Registry includes a list of providers and systems that Terraform can interface with. Checking the public cloud box provides us with a list of various cloud hosting targets that we will focus on in later blogs.

For the purpose of this blog we will dive into the VMware vSphere provider to get an understanding of how to call it, what happens when you call it, and what constructs are needed when you call it. In a previous blog we compared the vSphere provider to the AWS provider on a very high level to talk about the format differences between providers. In this blog we will dive deeper into the vSphere provider to help understand how to deploy it in a development, production, and disaster recovery scenario.

Selecting the vsphere provider and clicking on the USE PROVIDER button at the top right it shows that you can call the provider with either the required_providers or provider command structures. We will use the simplest example by calling only

provider “vsphere” { }

Looking at the documentation there are a variety of optional and required parameters that are needed inside the curly brackets.

The parameter options that we need for the provider definition include (taken straight from the hashicorp page):

  • user – (Required) This is the username for vSphere API operations. Can also be specified with the VSPHERE_USER environment variable.
  • password – (Required) This is the password for vSphere API operations. Can also be specified with the VSPHERE_PASSWORD environment variable.
  • vsphere_server – (Required) This is the vCenter server name for vSphere API operations. Can also be specified with the VSPHERE_SERVER environment variable.
  • allow_unverified_ssl – (Optional) Boolean that can be set to true to disable SSL certificate verification. This should be used with care as it could allow an attacker to intercept your auth token. If omitted, default value is false. Can also be specified with the VSPHERE_ALLOW_UNVERIFIED_SSL environment variable.
  • vim_keep_alive – (Optional) Keep alive interval in minutes for the VIM session. Standard session timeout in vSphere is 30 minutes. This defaults to 10 minutes to ensure that operations that take a longer than 30 minutes without API interaction do not result in a session timeout. Can also be specified with the VSPHERE_VIM_KEEP_ALIVE environment variable.

For security sake it is recommended to hide user and password information in a different file from the definition or have it as environment variables in the shell to pass into terraform. In this example we will create two files, variables.tf and main.tf to simple call the provider definition and look at the constructs that are created by terraform.

The main.tf file looks like

provider “vsphere” {
user = var.vsphere_user
password = var.vsphere_password
vsphere_server = var.vsphere_server
version = “1.12.0”

allow_unverified_ssl = true
}

Note the use of var.<something> to pull in the definition of an externally defined variable. This could be done with a second file or with environment variables. For a variables.tf file we could enter

variable “vsphere_user” {
type = string
default = “administrator@vsphere.local”
}

variable “vsphere_password” {
type = string
default = “NotTheRIghtPassword”
}

variable “vsphere_server” {
type = string
default = “10.0.0.72”
}

In the variables.tf file we define three string values and include a default value to pre-define what the variable should be defined as. If we open up a PowerShell windows (or Terminal on Linux) we can see that there are only the variables.tf and main.tf files in the directory.

Note that we are using PowerShell 7 as the command line interface so that we can test out the connection to our vSphere server using PowerCLI commands to verify variable definitions.

If we type

terraform init

The hashicorp/vsphere provider data is pulled from the web and placed in the .terraform subdirectory.

Looking at the .terraform directory it contains a grouping of libraries that we can call from our main.tf definition file.

If we use the tree command we can see the nested structure and note that there is a selections.json file at the plugins and a terraform-provider-vsphere_v1.12.0_x4.exe at the windows_amd64 subdirectory

What the init command did was find out what platform we are running on and pulled down the appropriate binary to translate terraform modules and resource calls into API calls into vSphere. For our example we will make API calls into our vSphere server located at 10.0.0.72 as administrator@vsphere.local with the given password. The selections.json file contains a hash value that is used to test the binary integrity of the terraform-provider-vsphere_v1.12.0_x4.exe and download a new version if needed next time the init command is issued.

At this point we can call the

terraform plan

command to test our main.tf and variables.tf configurations. Everything should work because the syntax is simple so far.

Note that we don’t have a state file defined yet. This should happen when we type

terraform apply

Once we execute this command we get a terraform.tfstate file locally that contains the state information of the current server. Given that we have not made any resource definitions, data declarations, or module calls we don’t have any need to connect to the server. The tfstate file generated is relatively simple.

{
“version”: 4,
“terraform_version”: “0.13.3”,
“serial”: 1,
“lineage”: “f35a4048-4cee-63e0-86b2-e699165efbe5”,
“outputs”: {},
“resources”: []
}

If we included something simple like a datacenter definition the connection will fail with the wrong password.

Putting in the right password but the wrong datacenter will return a different value

To get the right datacenter we can go to the vSphere html5 user interface or use the Connect-VIserver command to look for the datacenter name.

In this example we should use the Home-Datacenter as the Datacenter name.

It is important to note that the tfstate file changes with the successful apply and the resources section now contains valid data about our server.

{
“version”: 4,
“terraform_version”: “0.13.3”,
“serial”: 2,
“lineage”: “f35a4048-4cee-63e0-86b2-e699165efbe5”,
“outputs”: {},
“resources”: [
{
“mode”: “data”,
“type”: “vsphere_datacenter”,
“name”: “dc”,
“provider”: “provider[\”registry.terraform.io/hashicorp/vsphere\”]”,
“instances”: [
{
“schema_version”: 0,
“attributes”: {
“id”: “datacenter-3”,
“name”: “Home-Datacenter”
}
}
]
}
]
}

In summary, we have looked at how to find various providers to use with terraform, how to call a sample provider and what constructs are created when the init, plan, and apply functions are used with the local terraform binary. Fortunately, none of this changes if you are using Windows, Linux, or any other operating system. The provider directory under the .terraform tree contains the binary to translate from local API calls to API calls on the target system. This is a simple example but gives a good overview of what a good and bad connection into a vSphere server looks like and how to troubleshoot the connection. This construct should also work for a direct connection into an ESXi server without having to spin up a vSphere management instance.

Supporting multiple providers

One of the key uses of Terraform is to deploy development and production systems. Terraform can be used to manage what is deployed, manage resources, and restrict resources available to an instance. In our last blog entry we looked at the vSphere provider and looked at some of they key parameters that are needed to deploy solutions into this virtual environment.

In a perfect world we should be able to develop definitions to deploy development systems to a small or older system, deploy production to a more expensive and powerful vSphere cluster, and a disaster recovery copy to make sure that we can failover to an alternate datacenter in times of emergency. We should then be able to take the data for this provider and move it to Amazon AWS or Microsoft Azure or Google GCP by just changing the provider. Unfortunately, this is not a perfect world and there are a ton of reasons that this won’t work.

If we look at the documentation for the AWS provider we note that we don’t need a username and password or IP address but rather need a public and private key to connect to an AWS serviced and these parameters can be provided by command line environment variables. We can also define multiple providers and give an alias for the multiple providers and deploy services into different accounts, regions, and zones based on the terraform provider definition.

A typical aws provider main.tf file looks like…

provider “aws” {
version = “> 2”
profile = “default”
region = var.dev_location
alias = “dev”
}

provider “aws” {
version = “> 2”
profile = “default”
region = “us-west-1”
alias = “prod”
}

allowing you to deploy resources into “aws.dev” or “aws.prod” with a variable.tf file containing nothing or

variable “createdby” {
type = string
default = “TechEnablement”
}

variable “environment” {
type = string
default = “TechEnablement”
}

variable “dev_location” {
type = string
default = “us-east-1”
}

With this variable.tf definition you need to define environment variables to define the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY or define shared_credentials_file in a terraform configuration file to point to the location of a key file. On Windows this is typically “%USERPROFILE%\.aws\credentials”. The format of the credentials file looks like

[default]

aws_access_key_id=AWSSAMPLE7EXAMPLE

aws_secret_access_key=long/keywith/numbers4&letters

Unfortunately the vSphere provider does not allow for an alias tag and use of different account credentials and vsphere host address. Rather than defining multiple providers you need to define different directories and different variable.tf and main.tf files for each of the environments. In our earlier example we would have a dev, prod, and dr folder under our main folder. Each folder would have terraform configuration files to define what each environment would look like and resources available.

A typical multi-environment tree would look different from out initial single tree deployment with a dev, prod, and dr folder each containing the same main.tf files but different variable.tf definitions. Each folder would have their own terraform.tfstate file as well given that there are different environment variables and states on different servers.

If you try to define multiple vsphere providers in one file you get the error

Given the differences between the two provider types it begs the question of changing the aws provider to the same file format as the vsphere provider and have three different folders that deploy different environments to different servers. This would work but having everything in one file reduces complexity and potential errors by having multiple copies in multiple folders. Editing one does not guarantee changes to the other directories and there might be subtle differences between the different environments, like datastore names or locations as well as network definitions, that are unique to each environment.

In summary, there are multiple ways of solving the same problem. The ultimate solution is to write a generic provider that can deploy services into vSphere, Hyper-V, Nutanix, other on-premises virtual machine hosts, AWS, Azure, Google GCP, and other cloud virtual machine hosts. Given that there is no generic provider that works across all or even multiple environments you have to decide how to deploy multiple terraform configuration files to multiple target locations without doubling or tripling your work and code that needs support and maintenance. My recommendation is to go with different folders for different environments and have different variable.tf and main.tf files in each folder.