Installing Apcera on AWS

This document describes how to use Terraform to provision AWS resources and then deploy the Apcera Platform on AWS.

AWS Cluster Description

The reference cluster installed on AWS as described here makes use of the following resources:

  • 14 EC2 instances
  • 1 ELB
  • 4 Elastic IPs
  • 6 Security Groups
  • 3 Availability Zones
  • 8 volumes (in addition to host root disks)
  • 3 subnets

AWS Installation Prerequisites

Before you install Apcera on AWS you must complete the following prerequisites.

  • Create IAM keys: To provision AWS resources, you will need to provide the access_key and secret_key for an authorized IAM user.
  • Select AWS region: To provision AWS resources using the default configuration, you will need to specify an aws_region with 3 availability zones (AZs) where you want to deploy the cluster.
  • Upload public SSH key to AWS: To access the Orchestrator host and other cluster hosts remotely, you will need to generate an SSH public/private key pair and upload the public key to AWS (see the example after this list).
  • Configure Google Auth: For cluster access via APC and the web console, you will need to create a Google Auth project and generate the keys for client_id, client_secret, and web_client_id.
  • Generate SSL certificate chain and key: HTTPS is recommended for production clusters.
  • Register a domain name: To deploy the cluster you will need a registered domain_name so that you can update DNS records with the address of the ELB for the HTTP routers and the IP address of the monitoring host.
  • Install Terraform: You will need Terraform version 0.7.4 or later to provision AWS resources.
  • Install Ruby: You will need Ruby to generate the cluster.conf file that is used to deploy a cluster.
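
For example, you can generate the key pair locally with ssh-keygen and then import the public key in the EC2 console (Key Pairs > Import Key Pair). The key file name below is just a placeholder:

    # Generate a 4096-bit RSA key pair; the private key stays on your workstation.
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/apcera-aws
    # Then upload the contents of ~/.ssh/apcera-aws.pub to AWS as an EC2 key pair.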

Create AWS Resources

  1. Verify Terraform version 0.7.4 or later.

    Run the terraform version command to verify that you are using Terraform version 0.7.4 or later.

    If necessary install Terraform version 0.7.4 or later.

    When you run Terraform commands as described below, a local state file (terraform.tfstate) is created that maintains a record of the resources created. For production clusters it is recommended that you store the state remotely.
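
    For example, with Terraform 0.7.x you can keep the state in an S3 bucket using terraform remote config (the bucket, key, and region below are placeholders; Terraform 0.9 and later use backend configuration blocks instead):

     terraform remote config -backend=s3 \
       -backend-config="bucket=my-terraform-state" \
       -backend-config="key=apcera-aws-mpd/terraform.tfstate" \
       -backend-config="region=us-west-2"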

  2. Download and unzip the AWS installation files.

    Get the installation files from Apcera Support.

    Unzip the file contents to a working directory, such as apcera-aws-mpd.

    Copy this directory to a known location, such as $HOME/apcera-aws-mpd.

    These files contain configuration information for a minimum production deployment (MPD) on AWS. As instructed below, you will update portions of main.tf, terraform.tfvars, and cluster.conf.erb to deploy your cluster.

  3. Load the Terraform modules.

    The terraform-module subdirectory includes the apcera/aws and apcera/aws/ami-copy modules which define the infrastructure for AWS. The main.tf file references these modules using local relative paths.

    Change to the working directory where you extracted the installation files and run the terraform get command. This command caches the modules used by this Terraform configuration in the working directory.
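
    For example, assuming you copied the files to $HOME/apcera-aws-mpd as suggested above:

     cd $HOME/apcera-aws-mpd
     terraform get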

    If you receive an error running terraform get, edit main.tf so that the source entry for each module points to the local path where you placed the modules. For example:

     module "apcera-aws" {
       source = "/Users/user/aws_example/terraform-module/apcera/aws"
       # ... (other module settings remain unchanged)
     }

     module "ami-copy" {
       source = "/Users/user/aws_example/terraform-module/apcera/aws/ami-copy/"
       # ... (other module settings remain unchanged)
     }

    Run terraform get and verify that the Terraform modules are loaded.

  4. Edit the terraform.tfvars file.

    Populate the following parameter values (a sample terraform.tfvars is shown after the table):

    Parameter Value Description
    key_name "SSH public key name that you uploaded to AWS" Enter the EC2 key pair name you specified when you uploaded your public SSH key to AWS.
    aws_region "your-preferred-aws-region" Enter the AWS Region, such as "us-west-2".
    az_primary "a" Primary subnet availability zone (AZ). You may need to adjust this if the AZ does not support the requested EC2 instance type.
    az_secondary "b" Secondary subnet AZ. You may need to adjust this if the AZ does not support the requested EC2 instance type.
    az_tertiary "c" Tertiary subnet AZ. You may need to adjust this if the AZ does not support the requested EC2 instance type.
    access_key "REDACTED" Enter your AWS IAM access key.
    secret_key "REDACTED" Enter your AWS IAM secret key.
    cluster_name "your-cluster-name" Enter a unique cluster name using alphanumeric characters.
    monitoring_db_master_password "EXAMPLE_PASSWORD" Enter a password for the monitoring DB.
    rds_postgres_db_master_password "EXAMPLE_PASSWORD" Enter a password for the component DB.
    gluster_per_AZ "0" Leave the default "0" unless you are using Gluster, in which case set it to 3.

    NOTE: Each password must be 8 characters or more and cannot contain the characters "@", "/", or double quotation marks (").
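
    For reference, a populated terraform.tfvars might look like the following (all values shown are placeholders; use your own key pair name, region, credentials, cluster name, and passwords):

     key_name = "my-apcera-key"
     aws_region = "us-west-2"
     az_primary = "a"
     az_secondary = "b"
     az_tertiary = "c"
     access_key = "REDACTED"
     secret_key = "REDACTED"
     cluster_name = "mycluster"
     monitoring_db_master_password = "EXAMPLE_PASSWORD"
     rds_postgres_db_master_password = "EXAMPLE_PASSWORD"
     gluster_per_AZ = "0"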

  5. Prevent a singleton from being built (optional).

    The aws_example Terraform files will build a singleton host that is not required or used. Optionally, you can update the TF files so that this host is not built.

    In terraform.tfvars, add the following line at the bottom.

     singleton-count = "0"
    

    In variables.tf, add the following line at the bottom.

     variable "singleton-count" {}
    

    In main.tf, in the module "apcera-aws" section, add the following line at the bottom.

     singleton-count = "${var.singleton-count}"
    
  6. Run the terraform plan command.

    This command displays the changes Terraform will attempt.

    Using the default configuration, you should see the result: Plan: 65 to add, 0 to change, 0 to destroy.
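
    For example, with the default configuration the plan summary should look like this:

     terraform plan
     ...
     Plan: 65 to add, 0 to change, 0 to destroy.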

  7. Run the terraform apply command.

    Use the terraform apply command to apply and run the changes. This command may take some time to complete.

    NOTE: If you receive an error, review the error message and troubleshoot accordingly. Some errors may only require that you run terraform apply again. Note that Terraform does not roll back resources it has already created. If you need to edit the Terraform files, repeat the plan and apply commands. Run terraform refresh if you need to update the resource state.

  8. Verify creation of AWS resources.

    When the terraform apply command completes successfully, run the following command to display the resources created:

     terraform output
    

    You should see output listing all of the AWS resources that Terraform created. You can also use terraform refresh. See Terraform commands for a complete list of commands.
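
    The exact output depends on your configuration, but it includes the named outputs that are referenced later in this guide, for example:

     terraform output
     auditlog-addresses = ...
     central-addresses = ...
     elb-address = ...
     instance-manager-addresses = ...
     monitoring-address = ...
     monitoring-public-address = ...
     nfs-address = ...
     orchestrator-public-address = ...
     tcp-router-address = ...
     tcp-router-public-address = ...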

    At this point you can log in to the AWS Console for your account. You should see the resources that were created, including several EC2 instances, volumes, elastic load balancers, and security groups.

Configure Cluster Deployment

Now that the AWS infrastructure is created, the next step is to configure the deployment.

  1. Edit the cluster.conf.erb file.

    The cluster.conf.erb file is used to generate the cluster.conf file. It contains the following sections:

    • provisioner: Specifies information related to the creation of the machines that will run within the cluster.
    • machines: Defines the various "zones" within the cluster, the machines that belong to each zone, and the roles within the cluster that are allowed to be assigned to those machines.
    • components: Specifies the desired number of each of the component types. Changes here will either find a new place to run components or scale the cluster down if the numbers are decreased.
    • chef: Configures the cluster and base domain names, the ID provider and users, SSL for HTTPS and cluster component monitoring.
  2. Verify the provisioner is generic.

    The provisioner specifies information related to the creation of the machines that will run within the cluster. The generic provisioner uses IP addresses to identify the infrastructure.

     provisioner {
       type: generic
     }
    
  3. Update the machines section if necessary.

    The machines section defines the various machine types within the cluster, the hosts that belong to each type, and the roles within the cluster that are allowed to be assigned to those machines.

    Refer to the configuration documentation if you want to change machines values.

    For example, you may want to comment out the entire Gluster block if you are not using Gluster. You may also want to comment out the IP Manager.

     machines: {
       auditlog: {
         # TERRAFORM OUTPUT: auditlog-addresses
         <%= capture_or_die('terraform output auditlog-addresses') %>
         suitable_tags: [ "auditlog-database" ]
       }
    
       central: {
         # TERRAFORM OUTPUT: central-addresses
         <%= capture_or_die('terraform output central-addresses') %>
         suitable_tags: [
           "component-database"
           "api-server"
           "job-manager"
           "router"
           "package-manager"
           "stagehand"
           "cluster-monitor"
           "health-manager"
           "metrics-manager"
           "nats-server"
           "events-server"
           "auth-server"
           "basic-auth-server"
           "google-auth-server"
           "app-auth-server"
           "kv-store"
           "vault"
         ]
       }
    
       instance_manager: {
         # TERRAFORM OUTPUT: instance-manager-addresses
         hosts: [
                 # TERRAFORM OUTPUT: instance-manager-addresses
                 <%= capture_or_die('terraform output instance-manager-addresses') %>,
                ]
         suitable_tags: [
           "instance-manager"
         ]
       }
    
       # Uncomment if using Gluster
       # gluster: {
         # TERRAFORM OUTPUT: gluster-addresses
         # <%= capture_or_die('terraform output gluster-addresses') %>
         # suitable_tags: [
         #  "gluster-server"
         # ]
       # }
    
       metricslogs: {
         # TERRAFORM OUTPUT: metricslogs-address
         <%= capture_or_die('terraform output metricslogs-address') %>
         suitable_tags: [
           "graphite-server"
           "redis-server"
         ]
       }
    
       # Uncomment if using IP Manager.
       # ip_manager: {
         # TERRAFORM OUTPUT: ip-manager-address
         # <%= capture_or_die('terraform output ip-manager-address') %>
         # suitable_tags: [
         #  "ip-manager"
         # ]
       # }
    
       # TCP Router is on a dedicated host so that it has its own public IP.
       tcp_router: {
         # TERRAFORM OUTPUT: tcp-router-address
         <%= capture_or_die('terraform output tcp-router-address') %>
         suitable_tags: [
           "tcp-router"
         ]
       }
    
       monitoring: {
         # TERRAFORM OUTPUT: monitoring-address
         <%= capture_or_die('terraform output monitoring-address') %>
         suitable_tags: [
           "monitoring"
         ]
       }
    
       # Default NFS singleton. Comment out if using the Gluster HA NFS Gateway.
       nfs: {
         # TERRAFORM OUTPUT: nfs-address
         <%= capture_or_die('terraform output nfs-address') %>
         suitable_tags: [ "nfs-server" ]
       }
     }
    
  4. Update the components counts if necessary.

    Refer to the configuration documentation if you want to change these values. The components section specifies the desired number of each of the component types. Changes here will either find a new place to run components or scale the cluster down if the numbers are decreased.

     components: {
               monitoring: 1
    
       component-database: 3
               api-server: 3
              job-manager: 3
                   router: 3
          package-manager: 3
           health-manager: 3
          metrics-manager: 3
              nats-server: 3
            events-server: 3
          cluster-monitor: 1
              auth-server: 3
        basic_auth_server: 3
       google_auth_server: 3
          app-auth-server: 3
                 kv-store: 3
                    vault: 3
    
        auditlog-database: 2
    
       # Uncomment if using Gluster.
         #  gluster-server: 3
    
         instance-manager: 3
    
               tcp-router: 1
       #       ip-manager: 1
          graphite-server: 1
             redis-server: 1
    
               nfs-server: 1
                stagehand: 1
     }
    
  5. Specify the cluster_name and base_domain.

    In the chef.continuum section provide a unique cluster_name and base_domain for which you have set up a DNS record. For example:

     chef: {
       "continuum": {
         "cluster_name": "example",
         "base_domain": "example.mycompany.com",
    
  6. Specify the Package Manager S3 endpoint.

    Change the s3_store.endpoint value (default is "endpoint": "s3.amazonaws.com") to point to the S3 endpoint for your region.

    For example, if your AWS region is us-west-1, the s3_store.endpoint is as follows:

     "endpoint": "s3-us-west-1.amazonaws.com",
    

    NOTE: If you are using the us-east-1 region you do not need to change this.

  7. Optionally, configure HTTPS.

    By default HTTPS is disabled (chef.continuum.router.https_port.ssl is disabled).

    If this is a production cluster, you will need to enable HTTPS by uncommenting this section and adding the SSL certificate chain and key.

    This is how the ssl entry should be formatted. Note that each closing parenthesis must be on its own line.

     chef: {
       "continuum": {
         "router": {
           "http_port": 8080,
           "https_port": 8181,
            "ssl": {
              "enable": true,
               "tlshosts": [
                 {
                   "server_names": [ "*.example.com" ],
                   "certificate_chain": (-----BEGIN CERTIFICATE-----
                     LONGSTRING
                     -----END CERTIFICATE-----
                     )
                   "private_key": (-----BEGIN RSA PRIVATE KEY-----
                     LONGSTRING
                     -----END RSA PRIVATE KEY-----
                     )
                },
              ]    # tlshosts
            }      # ssl
         },       # router
       }
    

    If you have an existing SSL cert and key, see configuring HTTPS for guidance on adding it to cluster.conf.erb.

    If necessary you can generate an SSL cert and key.

    By default the Terraform module assumes that you are using HTTPS. If you do not want to use HTTPS, after deployment you will need to update the ELB listener in the AWS console to use HTTP port 8080, as shown below.

    (Screenshot: ELB listener configured for HTTP on port 8080.)

    The installation instructions explain how to do this, so you do not need to do anything now to disable HTTPS; just be aware that HTTPS is the default.

  8. Add your public SSH key.

    In the chef.continuum.ssh section, add your SSH public key.

     chef: {
       "continuum": {
         "ssh": {
           "custom_keys":[
              # Name and contact info for this key here
             "ssh-rsa LONGSTRING"
           ]
         },
       }
    
  9. Configure cluster authentication.

    In the chef.continuum.auth_server section of cluster.conf.erb, configure the identity provider and users.

    By default a cluster uses Google Device auth which allows the defined gmail user(s) access to the cluster via APC. To access the cluster via the web console, you will need to also include an identity provider.

    The following configuration example uses the default Google Device auth and adds Basic Auth.

    Configure Google Device auth by adding your gmail address to the google.users section, replacing "your-gmail-address@gmail.com" with your actual address. You should also add this email address to the auth_server.admins section to give this user admin policy.

    Basic Auth is enabled by adding the "basic" section shown below, and including NAME@apcera.me in the admins section.

    If you want to enable Google Auth (see the Google Auth configuration documentation), provide the client_id, client_secret, and web_client_id for a Google Auth project.

     chef: {
       "continuum": {
         ...
         "auth_server": {
           "identity": {
             "default_provider": "basic",
    
             # Configuration for Google OAuth
             "google": {
               "enabled": false
    
               "users": [
                 "your-gmail-address@gmail.com",
               ],
               "client_id": "690542023564-abcdefghbqrgpnopqrstuvwxyz.apps.googleusercontent.com"
               "client_secret": "byS5RFQsKqXXXbbqENhczoD"
               "web_client_id": "690542023564-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com"
             },
             "basic": {
               "enabled": true,
               "users": [
                 {
                   "name": "admin",
                   "password": "PaSsWoRd!"
                 }
               ]
             }
           },
           "admins": [
             "your-gmail-address@gmail.com",
             "admin@apcera.me"
           ]
           "apcera_ops": []
         },
       },
     }
    

    Basic Auth is for demonstration and development purposes and is not supported for production clusters.

  10. Configure Monitoring.

    Enter passwords for the monitoring guest user (chef.apzabbix.guest.pass) and admin user (chef.apzabbix.admin.pass).

    Enter the cluster name and domain for the apzabbix.web_hostnames parameter.

    See Monitoring Your Cluster for guidance on configuring this section.

Deploy Apcera Platform to AWS

At this point you can now deploy the cluster.

  1. Generate the cluster.conf file.

    Run the following Ruby command to generate the cluster.conf file:

     erb cluster.conf.erb > cluster.conf
    

    This command uses the cluster.conf.erb file in the cluster directory to generate the cluster.conf file, which is used to deploy the cluster. If successful this command should exit silently.

    Verify that the generated cluster.conf file is output to your cluster directory. If you encounter an error, run the erb command again.

  2. SSH to Orchestrator as root.

    First, run the following command to get the Orchestrator IP address:

     terraform output orchestrator-public-address
    

    Using the SSH key you configured, SSH to the Orchestrator host.

     ssh -A root@52.71.173.49
    

    Type yes to confirm the remote connection.

    You should be connected, indicated by root@<cluster-name>-orchestrator:~#.

  3. Update the Orchestrator OS kernel and orchestrator-cli.

    Run the following command:

     apt-get update && apt-get dist-upgrade
    

    This command updates the Orchestrator host OS and also updates orchestrator-cli to the latest version.

  4. Reboot the Orchestrator host.

    This can be accomplished by running reboot.

    Run the uname -r command to see the current running kernel.

    Run orchestrator-cli version and verify that Orchestrator is updated.
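
    For example, after the host comes back up and you reconnect over SSH:

     uname -r
     orchestrator-cli version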

  5. Copy the SSH key to the orchestrator user.

     cd /etc/ssh/userauth
    
     cat root > orchestrator
    
     chown orchestrator: orchestrator
    
     chmod 600 orchestrator
    

    Delete the ubuntu SSH key.

     rm ubuntu
    

    Change permissions on /etc/ssh/userauth.

     chmod 755 /etc/ssh/userauth/
    

    Use ls -ld /etc/ssh/userauth/ to verify.

  6. Disable orchestrator user password.

    In the /etc/shadow file, you will replace the orchestrator user's encrypted password with *. First, view the current entry:

     grep orchestrator /etc/shadow
    

    The output should show the encrypted password for the orchestrator user. For example:

     orchestrator:$6$sf.w91gW$gS1QqmJtCvbx/UE.8yITZlnjOLPN1OYUvv92Fz5Hp3C1Iq08qk3K8cx4svg1q6Lsl5wMlGfFPsvqiS9eBA.N60:16876:0:99999:7:::
    

    Modify the orchestrator entry by replacing the encrypted password with * using your preferred text editor. For example: vi /etc/shadow.

    Use grep orchestrator /etc/shadow to verify that the orchestrator user has * in the password field. For example:

     orchestrator:*:16876:0:99999:7:::
    

    Type exit to log out of the SSH session.

  7. Log in as orchestrator with agent forwarding.

    Verify that you can log in as the orchestrator user (using your SSH key; enter the key's passphrase if prompted):

     ssh -A orchestrator@52.71.173.49
    

    You should be connected, indicated by orchestrator@ip-10.0.0.187:~$.

    Once verified, exit the session.

  8. Upload cluster.conf to Orchestrator.

    SCP cluster.conf to Orchestrator.

     scp cluster.conf orchestrator@52.71.173.49:
    

    If you see the message "Are you sure you want to continue connecting (yes/no)?", type yes to proceed.

  9. SSH into Orchestrator as the orchestrator user.

      ssh -A orchestrator@34.202.245.235
    

    Type ls and verify that cluster.conf is copied to the Orchestrator home directory.

  10. Initialize Orchestrator.

    Required for an initial deployment of a cluster:

    orchestrator-cli init
    
  11. Perform a dry run.

    orchestrator-cli deploy -c cluster.conf --update-latest-release --dry
    

    Performing a dry run verifies the syntax of cluster.conf. If the dry run is successful, a graph.png file is created. This is sufficient to verify the format of the configuration file.

    To view the deployment graph, exit the SSH session and run the following command to copy the file to your local machine:

    scp orchestrator@34.202.245.235:graph.png ~/aws_example
    
  12. Deploy the cluster.

    Deploy the latest promoted release of the Apcera Platform.

    orchestrator-cli deploy -c cluster.conf --update-latest-release
    

    Successful deployment is indicated by the message "Done with cluster updates."

  13. Troubleshoot deployment errors, if necessary.

    If the deployment fails, run the orchestrator-cli deploy command again.

    If deployment still fails, check the latest chef client log for the error(s) and debug as necessary.

    Run ls -ltr to list the log files in chronological order. The last file shown is the most recent one to check.

    Run less chef-client.log to scroll through the log. Within less, type / followed by a search term to search, and press n to jump to the next occurrence of the term. See also troubleshooting deployments.

  14. Reboot the cluster.

    Because there is a new kernel, a full cluster reboot is required.

    orchestrator-cli reboot -c cluster.conf
    

Complete Post Installation Tasks

To verify your deployment, complete the following post-installation tasks.

  1. Update DNS records.

    DNS records are required for the HTTP routers (via the ELB) and for the monitoring host, using the external (public) addresses of these hosts. You can use nslookup to verify the DNS entries you make (see the example after the following list).

    • base_domain: CNAME record with the address of the ELB (get this value using terraform output elb-address).
    • *.base_domain: CNAME record to base_domain (alias or pointer, such as *.cluster.example.com).
    • monitoring.cluster-name.domain.tld: A record pointing to the public IP address of the monitoring host (get the value using terraform output monitoring-public-address). Note that this value cannot be under the base_domain entry and should match what you entered in the apzabbix.webhostnames section of the cluster.conf.erb file. For example: monitoring.cluster.example.com.
    • tcp-router: Public IP address of the TCP router (get the value using terraform output tcp-router-public-address). This entry is optional.
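
    For example, once the records are in place you can verify them with nslookup (the host names below follow the examples in the list above):

     nslookup cluster.example.com
     nslookup monitoring.cluster.example.com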

    For example, here is how the DNS entries might look for an example AWS cluster (using AWS Route 53):

    (Screenshot: Route 53 record sets for an example cluster.)

  2. Verify and bootstrap the deployment.

    Log in to the web console and download and install APC.

    Target the cluster and log in using APC.
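
    For example (substitute your cluster's base_domain; see the APC documentation for the login options that match your identity provider):

     apc target https://cluster.example.com
     apc login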

  3. Install Apcera packages.

    Install Apcera packages that you want for your cluster.