Deployment Sizing Guidelines

This document provides minimum capacity sizing guidelines for running Apcera Platform Enterprise Edition in production.

Required cluster components

The following components are required to install the Apcera Platform Enterprise Edition on any supported provider or OS. See also minimum viable deployment.

Component | Description | Count | HA Considerations
api-server | Provides HTTP API endpoints for the cluster. | 1 or more | Scale horizontally by running multiple to handle a large number of concurrent client connections from APC, the web console, or custom clients.
auditlog-database | Stores audit logs in a PostgreSQL DB. | 1 or more | Scale horizontally by running multiple. See audit log HA.
auth-server | Cluster Security Server for policy and key storage and distribution. | 1 or more | Scale horizontally by running multiple.
cluster-monitor | Reports real-time cluster statistics. | 1 or more | Scale horizontally by running multiple.
component-database | Stores cluster artifacts in a PostgreSQL DB. | 1 or more | Scale horizontally by running multiple.
events-server | Streams life-cycle and resource usage events for cluster resources (a job, package, or route, for example) to subscribed clients. Also manages client event subscriptions and garbage-collects subscriptions for disconnected clients. | 1 or more | Scale horizontally by running multiple.
flex-auth-server | Central authority for authentication: basic-auth-server, google-auth-server, ldap-auth-server, keycloak-auth-server, app-auth-server (for App Token). | 1 or more | Scale horizontally by running multiple Flex Auth components.
graphite-server | Storage for cluster metrics. | 1 exactly | Singleton. See Graphite storage.
health-manager | Calculates and reports job health. | 1 or more | Scale horizontally by running multiple. See Health Manager HA.
instance-manager | Runtime environment for job instances. | 1 or more | Scale horizontally or vertically to run more job instances. In production, run 2 or more IMs.
job-manager | Manages jobs deployed to the cluster. | 1 or more | Scale horizontally by running multiple.
metrics-manager | Handles statsd traffic and reports cluster statistics over time. | 1 or more | Scale horizontally by running multiple.
nats-server | Message bus for component communications. | 1 or more | Scale horizontally by running multiple (clustered NATS).
orchestrator-server, orchestrator-database | Used to install cluster software, manage cluster deployments, collect component logs, and so on. Includes a PostgreSQL DB. | 1 exactly | Run on a VM host, keep cluster.conf under version control, and back up the DB regularly. See the Orchestrator documentation.
package-manager | Manages distribution of platform packages. | 1 or more | May run as a singleton in local mode. For HA, run multiple PMs backed by S3 or Gluster storage. See configuring package manager.
redis-server | Log buffer for storing job logs. | 1 exactly | Singleton.
router | HTTP router (NGINX) responsible for routing and load balancing inbound traffic. | 1 or more | Scale horizontally by running multiple to handle a high volume of inbound requests or if your network requires it. If you run multiple routers, front them with a separate load balancer, such as an ELB.
stagehand | Responsible for creating and updating system-provided jobs and resources. | 1 exactly | Not a runtime component.

Optional cluster components

The following components are optional for deploying the Apcera Platform Enterprise Edition in production. Although technically optional, several of these components are required for a minimum production deployment.

Component | Description | Count | HA Considerations
gluster-server | Provides HA NFS persistence. | 0 or 3 x N | Recommended for production clusters requiring persistence. The count is a multiple of 3 for replication (3 x N). See HA NFS persistence.
ip-manager | Provides static IP addressing for integrating with legacy systems that require a fixed IP. | 0 or 1 | Optional singleton.
monitoring | Zabbix server and database (PostgreSQL DB) for component monitoring. | 0 or 1 | Typically both components are installed on the same host. For HA, use RDS or install the DB in HA mode on a separate host. Use an external monitoring system to monitor the Zabbix server itself.
nfs-server | Provides the NFS persistence layer. | 0 or 1 | Optional singleton. For HA, use gluster-server x 3.
riak-node | Distributed S3-compliant package store. | 0, 3, or 5 | Use in production for HA package-manager storage when there is no other S3-compatible blob storage. The minimum number of Riak hosts is 3; the recommended number is 5. Riak is required for on-premises, non-AWS cluster deployments where HA package management is required.
cluster-object-storage | Gluster package storage backend. | 0 or 3 x N | Use in production for HA package-manager storage when there is no other S3-compatible storage.
splunk-search | Lets you search across Splunk-collected component and job logs. | 0 or 1 | Optional singleton.
splunk-indexer | Lets you index component and job logs for Splunk searches. | 0 or 1 | Optional singleton.
tcp-router | Handles TCP traffic into the cluster (NGINX). | 0 or 1 | You can run multiple tcp-routers, but doing so does not provide redundancy.

Minimum viable deployment

For a minimum viable deployment (MVD) of the Apcera Platform Enterprise Edition on any supported platform, you need 4 physical or virtual machines.

The MVD deploys only the required cluster components. It provides no redundancy and no component monitoring, and is not production grade. The MVD is a bare-minimum EE installation that serves as a baseline reference point.

MVD Resource requirements

The minimum machine resources required for an MVD on any supported platform are as follows:

Count | Machine Role | RAM | Disk | Components
1 | orchestrator | 2GB | 8GB | orchestrator-server, orchestrator-database
1 | central | 4GB | 20GB | router, api-server, flex-auth-server, nats-server, job-manager, health-manager, cluster-monitor, metrics-manager, auditlog-database, component-database, events-server, auth-server
1 | singleton | 4GB | 20GB | package-manager, redis-server, graphite-server, stagehand
1 | instance-manager | 8GB | 100GB | instance-manager
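
If you script your deployment planning, a quick sanity check can confirm that a proposed host layout covers every required component. The following is an illustrative Python sketch, not an Apcera tool; the component lists are transcribed from the tables above, and the orchestrator host is omitted because it is managed separately.

```python
# Illustrative sketch: verify an MVD layout against the required-components
# table. Component lists are transcribed from this document, not queried
# from a live cluster.
REQUIRED = {
    "api-server", "auditlog-database", "auth-server", "cluster-monitor",
    "component-database", "events-server", "flex-auth-server",
    "graphite-server", "health-manager", "instance-manager", "job-manager",
    "metrics-manager", "nats-server", "package-manager", "redis-server",
    "router", "stagehand",
}
SINGLETONS = {"graphite-server", "redis-server", "stagehand"}  # "1 exactly"

MVD_LAYOUT = {
    "central": ["router", "api-server", "flex-auth-server", "nats-server",
                "job-manager", "health-manager", "cluster-monitor",
                "metrics-manager", "auditlog-database", "component-database",
                "events-server", "auth-server"],
    "singleton": ["package-manager", "redis-server", "graphite-server",
                  "stagehand"],
    "instance-manager": ["instance-manager"],
}

placed = [c for components in MVD_LAYOUT.values() for c in components]
missing = REQUIRED - set(placed)
assert not missing, f"layout is missing required components: {missing}"
for component in SINGLETONS:
    assert placed.count(component) == 1, f"{component} must run exactly once"
print("MVD layout covers all required components")
```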

MVD Considerations

  • The auth-server is responsible for policy and security. One or more Flex Auth Server components are required for authentication.
  • Singleton package-manager clusters may use any package manager configuration. Clusters intending to deploy more than one package-manager in the future should deploy an HA Package Storage Backend.
  • Commonly scaled components are deployed to the central host in anticipation of future cluster growth.
  • The central host generally has low resource requirements. However, because this host runs several processes, allocate enough CPUs for your workloads so that the disk does not come under contention and the host can absorb fluctuating CPU demand. Note that the HTTP router may require high network throughput.
  • The runtime hosts (IMs) require the most resources. Additional CPU allows for more parallelism to handle CPU spikes, such as starting many jobs at the same time. The disk size ensures that as the cluster evolves and accumulates more packages, disks do not come under contention.
  • Each IM reserves approximately 50% of its partitioned disk space for package caching, instance logs, and job metadata. The rest is used to run container workloads (job instances), and is the amount reported when viewing cluster resources using apc or the web console. This is accounted for in these recommendations (see the sketch after this list).
  • The graphite-server component (and associated statsd-server) cannot run on the central host due to a port 80 conflict with the HTTP router. In production these components are deployed to a dedicated host.
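
As a rough illustration of the IM disk arithmetic above, the following Python sketch estimates the space available for job instances. The ~50% cache reservation comes from the preceding bullet; the 20GB host OS reservation applies only to bare-OS (Ubuntu) installations, as described under Ubuntu MPD Considerations below. These figures are approximations, not exact platform behavior.

```python
def usable_container_gb(disk_gb, os_reserved_gb=0.0, cache_fraction=0.5):
    """Estimate the disk available for job instances on one IM host.

    disk_gb:        raw disk allocated to the IM host
    os_reserved_gb: space reserved for the host OS (bare-OS installs only)
    cache_fraction: approximately 50% of the IM volume is reserved for
                    package caching, instance logs, and job metadata
    """
    im_volume_gb = disk_gb - os_reserved_gb
    return im_volume_gb * (1.0 - cache_fraction)

# MVD instance-manager host (100GB disk, platform-provided image):
print(usable_container_gb(100))                     # ~50GB for job instances

# Ubuntu bare-OS install (same disk, ~20GB reserved for the host OS):
print(usable_container_gb(100, os_reserved_gb=20))  # ~40GB for job instances
```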

Minimum production deployment

For a minimum production deployment (MPD) of the Apcera Platform Enterprise Edition on any supported platform, you need 8 physical or virtual machines.

MPD adds a redundant central host, a monitoring host, a dedicated host for logging and metrics, 2 IM hosts, and optional components typically required for production workloads.

MPD Considerations

  • Monitoring components are required in production. Omit any other optional component you do not need (tcp-router, nfs-server, ip-manager).
  • The auth-server is the Security Server component and is made redundant by running multiple on the central hosts.
  • Each flex-auth-server component is made redundant by running multiple on the central hosts.
  • Deploying more than one package-manager requires an HA Package Storage Backend. See package manager configuration.
  • The nfs-server is a singleton. For HA NFS persistence, install 3 x N gluster-server hosts for replication of NFS data.

MPD on AWS

The following table lists the minimum resource requirements for installing Apcera EE on AWS.

Count | Machine Role | Instance Type | Components
1 | orchestrator | t2.small | orchestrator-server, orchestrator-database
2 | central | m3.medium | router, api-server, flex-auth-server, nats-server, job-manager, package-manager, health-manager, cluster-monitor, metrics-manager, auditlog-database, component-database, events-server, auth-server
1 | singleton | m3.medium | tcp-router, nfs-server, ip-manager, stagehand
1 | logs-metrics | c4.large | redis-server, graphite-server
1 | monitoring | m3.medium | zabbix-server, zabbix-database
2 | instance-manager | r3.large | instance-manager

AWS MPD Considerations

  • To potentially reduce costs, you could split the monitoring host by deploying the zabbix-database to an RDS (Relational Database Service) instance using db.t2.small and the zabbix-server to a t2.small EC2 host.
  • The r3.xlarge EC2 instance type provides a good ECU (EC2 Compute Unit) allocation, a healthy amount of RAM, and sufficient disk storage. Although the m2.2xlarge type is considered legacy and is more expensive than the r3, you may use it if you prefer not to use SSD disks for the IMs.
  • The package-manager component runs on each central host. The HA storage backend is an AWS S3 bucket.
  • This deployment assumes that you will use an ELB in front of the multiple HTTP routers (see the sketch after this list).
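
For reference, a classic ELB fronting the HTTP routers can be created with a few boto3 calls. This is a minimal sketch, not part of the Apcera installation: the region, subnet, security group, and instance IDs are placeholders, and a real deployment will likely differ (for example, by terminating SSL at the ELB).

```python
# Minimal sketch: put a classic ELB in front of the two router hosts.
# All IDs below are placeholders; substitute values from your VPC.
import boto3

elb = boto3.client("elb", region_name="us-west-2")

elb.create_load_balancer(
    LoadBalancerName="apcera-http-routers",
    Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80,
                "InstanceProtocol": "HTTP", "InstancePort": 80}],
    Subnets=["subnet-0example"],
    SecurityGroups=["sg-0example"],
)

# Health-check the routers so a failed host is pulled from rotation.
elb.configure_health_check(
    LoadBalancerName="apcera-http-routers",
    HealthCheck={"Target": "TCP:80", "Interval": 30, "Timeout": 5,
                 "UnhealthyThreshold": 2, "HealthyThreshold": 2},
)

# Register the two central hosts that run the router component.
elb.register_instances_with_load_balancer(
    LoadBalancerName="apcera-http-routers",
    Instances=[{"InstanceId": "i-0router1"}, {"InstanceId": "i-0router2"}],
)
```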

MPD on OpenStack

The following table lists the minimum resource requirements for installing Apcera EE on OpenStack.

Count | Machine Role | CPU | RAM | Disk | Components
1 | orchestrator | 1 | 2GB | 8GB | orchestrator-server, orchestrator-database
2 | central | 2 | 4GB | 20GB | router, api-server, flex-auth-server, nats-server, job-manager, package-manager, health-manager, cluster-monitor, metrics-manager, auditlog-database, component-database, events-server, auth-server
1 | singleton | 2 | 4GB | 20GB | tcp-router, nfs-server, ip-manager, stagehand
1 | logs-metrics | 2 | 4GB | 50GB | redis-server, graphite-server
1 | monitoring | 2 | 4GB | 20GB | zabbix-server, zabbix-database
2 | instance-manager | 4 | 8GB | 100GB | instance-manager

MPD on Ubuntu

The following table lists the minimum resource requirements for installing Apcera EE on Ubuntu Server 14.04.

Count | Machine Role | CPU | RAM | Disk | Components
1 | orchestrator | 1 | 2GB | 8GB | orchestrator-server, orchestrator-database
2 | central | 2 | 4GB | 20GB | router, api-server, flex-auth-server, nats-server, job-manager, package-manager, health-manager, cluster-monitor, metrics-manager, auditlog-database, component-database, events-server, auth-server
1 | singleton | 2 | 4GB | 20GB | tcp-router, nfs-server, ip-manager, stagehand
1 | logs-metrics | 2 | 4GB | 50GB | redis-server, graphite-server
1 | monitoring | 2 | 4GB | 20GB | zabbix-server, zabbix-database
2 | instance-manager | 4 | 8GB | 100GB | instance-manager

Ubuntu MPD Considerations

  • Each IM host must be partitioned, reserving approximately 20GB for the host OS and the remainder for the IM volume. This is taken into account in the guidelines above. Note that this partitioning requirement is specific to bare-OS installations; all other platforms use images we provide.
  • Deploying more than one package-manager requires an HA Package Storage Backend. See package manager configuration.

MPD on vSphere

The following table lists the minimum resource requirements for installing Apcera EE on vSphere.

Count | Machine Role | CPU | RAM | Disk | Components
1 | orchestrator | 1 | 2GB | 8GB | orchestrator-server, orchestrator-database
2 | central | 2 | 4GB | 20GB | router, api-server, flex-auth-server, nats-server, job-manager, package-manager, health-manager, cluster-monitor, metrics-manager, auditlog-database, component-database, events-server, auth-server
1 | singleton | 2 | 4GB | 20GB | tcp-router, nfs-server, ip-manager, stagehand
1 | logs-metrics | 2 | 4GB | 50GB | redis-server, graphite-server
1 | monitoring | 2 | 4GB | 20GB | zabbix-server, zabbix-database
2 | instance-manager | 4 | 8GB | 100GB | instance-manager

Recommended production deployment

We provide only the minimum deployment requirements for going into production with Apcera EE. In practice, your production deployment will be based on your unique capacity planning estimates. You can work with Apcera technical staff to plan your production installation.

In general, a recommended production deployment (RPD) has the following characteristics:

  • Based on the MPD resource requirements for your chosen platform.
  • Uses HA for all possible components, including the package manager and NFS services.
  • Scales the central host to 3 or 4 nodes.
  • Deploys 3 or more Instance Manager hosts.
  • Typically has dedicated machines for the routers. (The HTTP router may require high network throughput; the TCP router may require its own IP address.)

Capacity planning

Each cluster machine host has finite capacity. To determine how much capacity you will need in production, factor in the capacity of each machine host, anticipated utilization, and desired level of fault tolerance.

General capacity planning questions:

1) What is your expected utilization?

2) What is your desired fault tolerance?

For example, in a cluster with 5 IM nodes, you can tolerate 1 machine failure if utilization is under 80%, or 2 machine failures if utilization is under 60%.
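
The arithmetic behind this example: workload consuming u x N IMs' worth of capacity can survive f IM failures only while u <= (N - f) / N. A minimal Python sketch:

```python
def max_utilization(total_ims, failures_tolerated):
    """Highest utilization at which the surviving IMs can still absorb
    all running workload after `failures_tolerated` IM hosts fail."""
    return (total_ims - failures_tolerated) / total_ims

print(max_utilization(5, 1))  # 0.8 -> stay under 80% to survive 1 failure
print(max_utilization(5, 2))  # 0.6 -> stay under 60% to survive 2 failures
```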