Apcera Architecture and Components
Apcera is a distributed system comprising several components that communicate using messaging. Each component process runs a policy engine to interpret and enforce policy and an agent for monitoring, while also performing a specific function within the system.
Apcera components are deployed to distinct operational roles or planes:
- Control Plane provides communication and distributes policy for the system.
- Routing Plane load balances traffic into the system.
- Monitoring Plane monitors system components.
- Management Plane commands and controls system components.
- Runtime Plane provides the execution environment for Linux container instances.
This diagram is an illustration of a generic Apcera deployment. Each production deployment will be unique. Refer to the sizing guidelines for specific details.
An Apcera component is a process on a node in a cluster that communicates with other components using messaging and is governed by the policy engine. Components act as managers to perform their functions and are not managed by other components.
The Control Plane provides communication and distributes policy for the system.
Apcera components communicate using the NATS messaging system. Components communicate in peer-to-peer fashion through the exchange of messages. The NATS messaging server resides in the Management Plane, and each component is a client.
Policy is a machine readable representation of the intended behavior of the cluster. Virtually all surface area of Apcera is governed by policy.
The policy language is a declarative scripting language. A policy comprises a realm and one or more rules. The realm defines the set of resource(s) to which a policy rule applies. A rule is a statement evaluated by the policy engine.
The Routing Plane load balances traffic into the system.
The HTTP Server receives inbound traffic from clients and routes them to the appropriate component or job using HTTP. Apcera implements a standard web server and adds several additional features to support the needs of the platform.
The TCP Server routes TCP traffic within the system for jobs that have enabled TCP routes.
The Monitoring Plane monitors system components. Apcera integrates with a third-party monitoring server and embeds a monitoring agent in each component.
The Management Plane commands and controls system components.
The Auth Server handles authentication for the platform. Apcera uses a token-based approach to authenticate users. The Auth Server integrates with various identity providers, including OAuth and LDAP. The Auth Server does not store credentials.
The API Server receives requests from clients (apc, web console, REST API) and converts them into requests within the cluster over NATS communicating with the relevant components.
A package is binary data, typically encapsulated in a tarball, and JSON metadata.
The Package Manager (PM) is responsible for managing assets associated with a package, including references to the location of assets within the packages. The PM connects to the persistent datastore filesystem to store and retrieve packages. Once retrieved, packages are cached locally on runtime hosts.
A job is the base unit in the system. A job is a workload definition that represents an inventory of packages, metadata, and associated policies. There are several types of jobs, including: Apps, capsules, Docker images, stagers, service gateways, and semantic pipelines.
The JM maintains the desired state of jobs within a cluster, and the set of packages that constitute a job. The JM stores job metadata and state within the database.
The Health Manager (HM) communicates with IMs to determine the number of live instances of each job and communicates this to the JM that can initiate start or stop requests to the runtime plane based on deltas from the intended state. The HM subscribes to these heartbeats and takes actions based on the intended/actual state disparities.
The Metrics Manager (MM) component tracks the utilization of resources (disk, cpu, ram, network) over time. The related Cluster Monitor component provides realtime cluster statistics to the web console.
The Runtime Plane comprises a single component: the Instance Manager (IM).
An instance is an instance of a Linux container. Each instance runs a job workload in isolation on an Instance Manager node. The IM is responsible for managing the lifecycle of container instances. Each instance is represented by a container that runs on an IM host. The IM performs CRUD plus monitoring operations on container instances.
In Apcera, a container instance is a running job within a single isolated execution context. Each container instance is instantiated with a unioned filesystem to layer only the dependencies required by the running processes in the job definition. Apcera container instances implement Linux cgroups and kernel namespacing to isolate the execution context of a running job instance.
Additional responsibilities of the IM include:
|Scheduling||IM receives container instance placement requests from the JM and calculates the taint value based on a number of scheduling criteria (tags, resource usage, locality). The taint value is used as response delay: after that delay IM responds to the placement request. It is up to the requester to follow up with the container instance creation request.|
|Health||The IM periodically sends out heartbeats for each instance. The Health Manager (HM) subscribes to these heartbeats and takes actions based on the intended/actual state disparities.|
|Metrics||The IM periodically sends out instance metrics (resource consumption, uptime etc.). The Metrics Manager (MM) subscribes to metrics and persists them in metrics storage.|
|Routes||Once instance is running, IM sends out information about ports/routes exposed on the instance. This is used by routers for route mapping and by other IMs to set up the destination IP/port for dynamic bindings.|
|Log Delivery||The IM monitors instances logs for any new activity and delivers new logs to a number of configured log sinks.|
|Semantic Pipeline Setup||When the IM starts an instance that’s supposed to have a semantic pipeline, it suspends the instance start process until semantic pipeline is set up completely. It also takes care of setting up network rules so that job-to-service traffic flows through semantic pipeline.|
|Runtime Configuration||The IM fills out any templates defined in the instance job/package using templated values, including the process name, UUID, and network connections, including links (ip:port) and bindings(ip:port, credentials, scheme).|
|Network Connectivity||IM is responsible for setting up network configuration for container instances. This involves setting up and maintaining firewall rules (iptables), and optionally connecting to other components, for example to set up GRE tunnels for fixed IP routing or to provide persistence for the container.|
|Filesystem||IM downloads packages referenced by instance job and lays them out in the container filesystem.|
|Package cache||IM reserves approximately 1/2 of its allocated disk space for caching packages locally and storing the job logs.|
Creating starting a job instance from source using an Apcera client involves the following component interactions.
- APC sends a request to the API server, packages the source code in a tarball and uploads it to the cluster.
- API server invokes the staging coordinator job to ingest the source code along with any dependent packages through package resolution.
- Detected or specified staging pipeline is invoked and stager(s) compile or build the source code and push it to the PM which stores the assets and metadata as a “Package.”
- API sends the expected initial state of the application to the JM, including the package and package dependency information.
- JM broadcasts a NATS message to all IMs requesting bids for an IM to start one or more job instances (containers).
- Each IM responds according to its taint with an offer to start an instance of the job. Each IM's taint is determined by a set of scoring rules based on the set of nodes (containers) under its control, tags asserted on the job definition, and other criteria.
- JM receives the bids and sends a request to the “best” scoring IM to start an instance of the app.
- IM retrieves the relevant packages from its local cache, or from the PM if it is an initial deploy, and starts the appropriate number of instances. Each IM will start no more than one instance per node (container).
Job instance monitoring
- Job instance log output (stdout & stderror) is centrally available and routed through TCP log drains, including ring buffer, syslog, and NATS.
- HM monitors number of running instances of each given job based on the job definition.
- HM communicates the number of running instances to the JM.
- MM collects the resource utilization of nodes within the cluster and can be queried via the API server to display resource and performance information through APC and the web console.
- Cluster monitor collects and provides real-time instance data to the web console.
- Audit log provides full record of all cluster activities stored on dedicated DB in HA mode.
Job scaling and resiliency
- If the number of running instances of a given job reported by the HM is lower than the defined state of the job, the JM will send out a request to the IMs for bids to start additional instances, similar to the initial app create/deploy and start process.
- If the number of running instances of a given job reported by the HM is higher than the defined state of the job, the JM will send out a request to the IMs for reverse bids to shut down extraneous instances.
- The JM will accept the “best” bids and send RPC calls to the relevant IMs to shut down the relevant job instances.
For more information, refer to the Apcera Architecture Whitepaper.