Configuring Package Manager

This section describes how to configure the Package Manager component.

About the Package Manager component

The Package Manager (PM) component manages storage and retrieval of Apcera Platform packages. A job requires at least one package. Instance Managers (IM) are responsible for acquiring a job's packages when an instance of a job is scheduled on a given IM.

See Apcera Packages and PM Tagging for details on packages and how the PM interacts with Instance Managers (IMs).

Cluster Administrators must select a "storage backend" for platform packages. All PMs in the cluster must share the same storage backend configuration; this requirement is enforced via Orchestrator. Apcera provides a number of storage backend options to suit various deployment scenarios. Backend storage for PMs is controlled via cluster.conf. There are two classes of Package Storage Backends: Singleton and HA.
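The backend is selected with the package_store_type key under chef.continuum.package_manager. As a minimal sketch (the backend-specific sections below show the full settings each type requires), selecting the s3 backend looks like:

chef: {
  "continuum": {
    "package_manager": {
      "package_store_type": "s3"
    }
  }
}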

In all cases, PMs should have a dedicated storage volume configured. Singletons store packages in this volume. HA backends use this volume for their LRU cache.

Singleton Package Storage Backend (local)

local is only supported for clusters with a single PM, such as Apcera Community Edition or a Minimum Viable Deployment; packages are stored in a directory on that single PM. It is therefore not suitable for production deployments or for clusters that may start with one PM and scale up later.

There is no conversion process from local to HA (gluster or s3) package storage. Clusters intended to scale from a single PM to many PMs should use an HA backend.

HA Package Storage Backends (s3 and gluster)

HA storage backends store packages in shared storage solutions suitable for production clusters with many PMs. Each PM maintains a local LRU cache of packages.

  • s3 storage backend stores packages in an Amazon S3-compatible service. Supported S3 providers are AWS S3 and Apcera-provided Riak-CS storage. AWS S3 is recommended for clusters residing in AWS.
  • gluster storage backend stores packages in an Apcera-provided Gluster storage cluster.

LRU Cache Configuration Parameters

The Package Manager LRU acts in conjunction with the s3 and gluster storage backends. This feature prunes package files from each PM's local cache based on which packages are least recently used (LRU).

The LRU acts only on the PM's local package cache, polling at prune_interval_seconds and pruning the cache if it is larger than reserved_size_bytes. reserved_size_bytes should be set to the target size of the LRU cache. This size should not exceed 50% of the free space available for packages.

See Volume mounts for details on configuring a dedicated storage location for the LRU via package-storage.

The following LRU parameters are configured in the chef.continuum.package_manager.lru section of cluster.conf:

  • enabled – enables LRU. Default is enabled.
  • reserved_size_bytes – target size of the LRU cache. Default is 10 GB, sized for a 20 GB package-storage volume.
  • prune_interval_seconds – how often the LRU performs pruning. Default is 30 minutes.

chef: {
  "continuum": {
    "package_manager": {
      ...
      "lru": {
        "enabled": "true",
        # 50% of 40 GB volume
        "reserved_size_bytes": 21474836480
      }
      ...
    }
  }
}
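All three LRU parameters can also be set explicitly. The following sketch simply restates the documented defaults, assuming the 10 GB default cache size is expressed in binary units (10737418240 bytes, consistent with the 50%-of-40-GB figure of 21474836480 above) and converting the 30-minute default prune interval to seconds:

chef: {
  "continuum": {
    "package_manager": {
      "lru": {
        "enabled": "true",
        "reserved_size_bytes": 10737418240,
        "prune_interval_seconds": 1800
      }
    }
  }
}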

Staging Coordinator Configuration Parameters

If necessary, you can change the default RAM and disk size allocated to the Staging Coordinator, which launches apps from source code or from capsules. The defaults are 256 MB memory and 2 GB disk. If you are deploying large legacy apps, you may need to increase both of these settings.

chef: {
  "continuum": {
    "package_manager": {
      "staging_coordinator": {
        "memory": 4294967296,
        "disk": 4294967296
      }
    }
  }
}

This example sets both memory and disk to 4 GB (4294967296 bytes) to support working with large packages.

Database Connection Lifetime Configuration

The chef.continuum.package_manager.db.conn_max_life_time parameter lets you specify the maximum lifetime (in seconds) of idle database connections before they are cleaned up. A value of 0, the default, means database connections live forever. For example, the following sets the maximum lifetime for idle database connections to 300 seconds:

chef: {
  "continuum": {
    "package_manager": {
      "db": {
        "conn_max_life_time": 300
      }
    }
  }
}

Local Package Storage Backend Configuration

In local mode, the PM configuration setting cleanup_on_delete controls whether package files are removed when a package is deleted. If this setting is false, the local PM cache is not pruned when users delete package records: the JSON metadata record is removed, but the package tarball persists on the PM disk. If this setting is true, both the metadata and the tarball are removed when a user deletes a package.

chef: {
  "continuum": {
    "package_manager": {
      "package_store_type": "local",
      "local_store": {
        "cleanup_on_delete": true
      }
    }
  }
}

See Volume mounts for details on configuring a dedicated storage location for package-storage.

AWS S3 Package Storage Backend Configuration

Configure AWS S3 as the Package Storage Backend.

chef: {
  "continuum": {
    ...
    "package_manager": {
      "package_store_type": "s3",
      "s3_store": {
        "access_key": "AKIAIOSFODNN7EXAMPLE",
        "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "endpoint": "s3-us-west-2.amazonaws.com",
        "bucket": "cluster-stack-s3bucket-3f935ertytd"
      }
    },
    ...
  }
}

Terraform provisions this bucket in AWS. See Configure AWS.

Gluster Package Storage Backend Configuration (BETA)

Starting with Apcera release 2.6.0, Gluster is an alternative package storage backend. Support for Gluster in this release is BETA for technical preview. It is scheduled to supersede Riak-CS in a future release.

chef: {
  "continuum": {
    ...
    "package_manager": {
      "package_store_type": "gluster",
      ...
    },
    ...
  }
}

See Deploying Gluster for configuration steps.

Terraform provisions and configures Gluster servers. See Installing Apcera Platform Enterprise Edition.

Migrating from Riak-CS to Gluster

Existing clusters deployed with Riak-CS as the package storage backend can migrate to Gluster by completing the following procedure.

  1. Update your Terraform module to provision three Gluster "cluster storage" machines for packages.

  2. Make the following changes to cluster.conf.erb and generate cluster.conf:

     machines: {
       cluster_storage: {
         hosts: [ {{ .ClusterStorageIPs }} ],  # < three IPs here
         suitable_tags: [
           "cluster-object-storage"
         ]
       }
     }
    
     components: {
        cluster-object-storage: 3
     }
    
     chef: {
       "continuum": {
         "package_manager": {
           "package_store_type": "gluster",
           "migration_target": "gluster",
           "migration_source": "s3"
         },
       }
     }
    
  3. Perform a dry run and deploy the cluster.

    All packages in Riak will be automatically migrated to Gluster. This may take a significant amount of time, depending on how many packages you have in your cluster.

  4. After some time, restart a Package Manager.

    In the log file for the PM, check for the message MigrateAllResources complete: total: %d successful: %d failed: %s, which indicates that the package migration is complete. Note that this message appears only on restart of the PM and only after the migration is complete; if you restart the PM before the migration finishes, you won't see this message.

  5. Once the migration is complete, remove both migration_* parameters and run a deploy.

     chef: {
       "continuum": {
         "package_manager": {
           "package_store_type": "gluster"
         },
       }
     }
    

    This removes the s3 (Riak) configuration from all of the PMs. Leave "package_store_type": "gluster" as is.

  6. Remove all Riak configuration settings from cluster.conf and run a deploy.

  7. Run terraform destroy -target=riakNode1 -target=riakNode2 -target=riakNodeN and delete all resources provisioned for Riak.

    Replace "riakNodeX" with the hostname of each Riak node.

Riak-CS Package Storage Backend Configuration

For non-AWS deployments, Apcera provides Riak-CS for remote package storage. Riak-CS is an open source software implementation of the S3 protocol. Since the default package store type in cluster.conf is s3, no Package Manager-specific configuration is required. Because Riak-CS is an S3-compatible object store, creating the Riak-CS hosts automatically adds them to the Package Manager and configures s3.

However, there is an important deployment distinction when you are using Riak: you must set up DNS before you deploy the cluster. Connections to the Riak servers are made by connecting to TCP port 6104 on s3.$base_domain via the Apcera HTTP router. Thus, cluster nodes hosting Riak must be able to resolve packages.s3.$base_domain to one or more Apcera routers via DNS during cluster deployment.

Therefore, while the installation of a cluster that uses Riak is unique depending on the type of provider, the general workflow for a Riak deployment is as follows:

1) Create the infrastructure (typically using Terraform)
2) Configure DNS (using AWS Route 53, Azure DNS, custom DNS, other DNS, etc.)
3) Deploy the cluster (using Orchestrator and cluster.conf)

For Riak you will need to create two DNS records: {DOMAIN} and *.{DOMAIN}. These should be registered with DNS and point to the IP addresses of the HTTP routers, or to the load balancer that fronts the routers. Refer to the Configuring DNS documentation for details on how to create DNS records.
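As a sketch, assuming a base domain of example.com and two HTTP routers at the placeholder addresses 203.0.113.10 and 203.0.113.11 (substitute your own domain and router or load balancer IPs), the two records would look like:

example.com.      A    203.0.113.10
example.com.      A    203.0.113.11
*.example.com.    A    203.0.113.10
*.example.com.    A    203.0.113.11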


Riak will be deprecated in an upcoming release. As such, new non-AWS clusters should use gluster instead of Riak.

Legacy Provisioner examples

The following example cluster.conf snippet shows how to configure the Riak object store machine host for a specific provider such as the vSphere or OpenStack provider. (This example is for vSphere.)

machines: {
  object_storage: {
    cpu: 2
    memory: 16384
    disk: [
      { size: 100, purpose: riak }
    ]
    suitable_tags: [
      "riak-node"
    ]
  }
}

components: {
  riak-node: 5
}

Generic Provisioner

The following cluster.conf snippet shows how to configure Riak for a generic (IP address) provider.

machines: {
  object_storage: {
    hosts: [ "10.224.214.135", "10.224.214.132", "10.224.214.133", ... ]
    suitable_tags: [
      "riak-node"
    ]
  }
}

components: {
  riak-node: 5
}

Riak-CS Garbage Collection Settings

By default, when packages are removed from the Apcera Platform using Riak as the storage backend, packages are not instantly cleaned up. There are a few thresholds that need to be met before Riak's internal Garbage Collection process kicks in.

These values are configurable starting with release 508.

Default values if not explicitly configured:

chef: {
  "riak": {
    "config": {
      "bitcask": {
        "dead_bytes_merge_trigger": 67108864,
        "dead_bytes_threshold": 16777216
      }
    }
  },
  "riak_cs": {
    "config": {
      "riak_cs": {
        "leeway_seconds": 60,
        "gc_interval": 300,
        "gc_retry_interval": 4500
      }
    }
  },
  "continuum": {
    ...
  }
}

The example below shows 'aggressive' settings for reclaiming free space; these may not be appropriate in all situations, as the garbage collection processes can be very taxing on a Riak/Riak-CS cluster.

chef: {
  "riak": {
    "config": {
      "bitcask": {
        "dead_bytes_merge_trigger": 0,
        "dead_bytes_threshold": 0,
        "log_needs_merge": true
      }
    }
  },
  "riak_cs": {
    "config": {
      "riak_cs": {
        "leeway_seconds": 300,
        "gc_interval": 900,
        "gc_retry_interval": 4500
      }
    }
  },
  "continuum": {
    ...
  }
}

  • leeway_seconds: the time between when a package is deleted and when the database actually marks it as deleted. This ensures that anyone downloading that particular package still has that much time to retrieve the file.
  • gc_interval: how often (in seconds) the garbage collection process runs to clean up items that have been marked as deleted.
  • gc_retry_interval: how often (in seconds) garbage collection retries removal of any packages that failed (for whatever reason) during the initial garbage collection process.
  • dead_bytes_merge_trigger and dead_bytes_threshold: threshold triggers in the storage backend. A value of 0 means that whenever a file is marked as deleted by garbage collection, it is physically removed from the disk. When these thresholds are met, a 'merge' is run, which re-converges the data in the Riak ring and frees up space.