Configuring Splunk Search

This section describes how to integrate Splunk logging with your Apcera cluster.

Use Cases

Apcera integrates with Splunk to index and search component logs, which is useful for cluster troubleshooting and for draining job logs. This documentation describes how to configure Splunk for these purposes.

While the typical Splunk integration deploys a splunk-indexer within the cluster, Apcera also supports integrating with an existing external splunk-indexer.

See also the Job Log Drain Using Splunk tutorial.

Components

Apcera provides two configurable Splunk components. Each component is installed on its own virtual machine.

  • splunk-indexer (required): Required cluster component if you want to use Splunk with your Apcera cluster deployment. Receives Apcera component log data from all server nodes in the cluster (via Splunk Forwarder agents). Includes a syslog server that can be used as a log drain target for jobs in the cluster.
  • splunk-search (optional within the cluster, but required somewhere in your enterprise): Provides the Splunk search user interface and performs searches. If you have an existing Splunk search head installed in your enterprise, you can install only the splunk-indexer and point it there.

When you configure your cluster to use Splunk, Apcera installs the Splunk Forwarder agent on each cluster host. The agent monitors the Apcera component log files and talks to the Splunk Indexer using TCP connections over port 8089. Each time a component writes a log entry, the agent forwards the new data to the Splunk Indexer.
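The forwarder's monitoring behavior can be sketched with a minimal Splunk inputs.conf monitor stanza. The paths, index, and sourcetype below are illustrative assumptions, not the exact values Apcera installs:

```
# inputs.conf on a cluster host (illustrative values)
[monitor:///var/log/continuum/*.log]
index = apcera-component-log
sourcetype = continuum-component
```

Any change to a monitored file causes the forwarder to ship the new lines to the configured indexer.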


Requirements

To configure your cluster for Splunk indexing and search, you will need the following information to populate cluster.conf:

  • Splunk license
  • Public IP address of the Splunk license server
  • Public and private SSL keys to communicate with the Splunk license server

You will also need to decide whether to install the splunk-search component or use an existing Splunk search head.

If you are using Terraform to deploy your cluster, use the Apcera-provided splunk-indexer.tf Terraform module to deploy Splunk components. Note the following defaults used for the Splunk machine hosts:

  • splunk-indexer gets a 500GB data volume on its own VM.
  • splunk-search gets a 100GB data volume on its own VM.

Scaling

This section describes how to scale Splunk searching for your cluster.

Logging rate

Your Splunk license sets a daily limit on the volume of logs you can collect in Splunk.

Increasing the number of splunk-indexer servers improves performance during both log collection and searching by dividing the disk I/O load among the splunk-indexer servers.

Search rate

You can increase the number of splunk-search servers to increase the speed of searches when multiple users are performing complex searches.

Log retention

The total amount of Splunk data that can be indexed and stored is governed by the size of the splunk-indexer volume.

You can address this either by increasing the splunk-indexer disk1-size, or by increasing the number of splunk-indexer servers.

Apcera generally recommends increasing the number of splunk-indexer servers, since:

  • Changing the size of an existing volume deletes the logs already stored on that volume.
  • Adding splunk-indexer servers also improves performance during both log collection and searching.
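Scaling out follows the same cluster.conf pattern used for a single indexer; a minimal sketch with two indexer hosts (IP addresses are illustrative):

```
machines: {
  splunk-indexer: {
    # Two indexer hosts instead of one; addresses are illustrative
    hosts: [ '10.0.2.7', '10.0.2.8' ]
    suitable_tags: [ "splunk-indexer" ]
  }
}

components: {
  splunk-indexer: 2
}
```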

Splunk configuration examples

This section provides Splunk configuration examples.

splunk-indexer configuration

The following example deploys the splunk-indexer and points it to a "master", the public IP of an external Splunk search head. This configuration uses a pre-existing Splunk search head and license server.

machines: {
  ...
  splunk-indexer: {
    hosts: [ '10.0.2.7' ]
    suitable_tags: [ "splunk-indexer" ]
  }
}

components: {
      splunk-indexer: 1
}

chef: {
  "continuum": {
    ...
    "mounts": {
      "splunk-indexer": {
        "device": "/dev/xvdp"
      }
    },
    "splunk": {
      "users": {
        "admin": { "password": "EXAMPLE_PASSWORD" }
      }
      # Public IP of an external splunk-search head
      "master": "203.0.113.24",
      "ssl": {
        "enable": true,
        "certs":
          {
            "server_names": [ "cluster-name.splunk-indexer.domain-name.tld" ],
            "certificate_chain": (-----BEGIN CERTIFICATE-----
              XXX-CERT-XXX
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              XXX-CERT-XXX
              -----END CERTIFICATE-----
            )
            "private_key": (-----BEGIN RSA PRIVATE KEY-----
              XXX-KEY-XXX
              -----END RSA PRIVATE KEY-----
            )
        }   # splunk -> ssl -> certs
      }     # splunk -> ssl
    }       # splunk
  }         # continuum
}

splunk-indexer and splunk-search configuration

To configure both the splunk-indexer and the splunk-search components, see the following example.

provisioner {
  type: generic
}

machines: {
  ...
  splunk-indexer: {
    # splunk-indexer-address
    hosts: ['10.0.0.55']
    suitable_tags: [ "splunk-indexer" ]
  }
  splunk-search: {
    # splunk-search-address
    hosts: ['10.0.0.8']
    suitable_tags: [ "splunk-search" ]
  }
}

components: {
  # Splunk-specific Components
      splunk-indexer: 1
      splunk-search: 1
}

chef: {
  "continuum": {
    ...
    "mounts": {
      "splunk-indexer": {
        # splunk-indexer-device
        "device": "/dev/xvdp"
      }
      "splunk-search": {
        # splunk-search-device
        "device": "/dev/xvdq"
      }
    },          # mounts
    "splunk": {
      "users": {
        "admin": { "password": "PASSWORD" }
      }
      # Public IP of an external splunk-search head
      "master": "203.0.113.24",
      "ssl": {
        "enable": true,
        "certs":
          {
            "server_names": [ "cluster-name.splunk-search.domain-name.tld", "cluster-name.splunk-indexer.domain-name.tld" ],
            "certificate_chain": (-----BEGIN CERTIFICATE-----
              XXX-CERT-XXX
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              XXX-CERT-XXX
              -----END CERTIFICATE-----
            )
            "private_key": (-----BEGIN RSA PRIVATE KEY-----
            XXX-KEY-XXX
            -----END RSA PRIVATE KEY-----
            )
        }       # splunk -> ssl -> certs
      }         # splunk -> ssl
    }           # splunk
  },            # continuum
}

External splunk-indexer

While the typical Splunk deployment is to deploy an indexer with the cluster as described above, Apcera supports integrating with an existing external splunk-indexer. For example:

chef: {
  "splunk": {
    "splunk-indexers": [ "index-server-url" ]
  }
  "continuum": {
    "splunk": {
      "users": {
        "admin": { "password": "password" }
      }
    }   # splunk
  }
}

Searching Splunk

To search Splunk for cluster or job logs related to a specific job, use the job UUID.

To get the job UUID, run the following APC command:

apc job list -l
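With the UUID in hand, a plain-text Splunk search for it matches events that reference the job. The UUID below is a hypothetical example, not a real value:

```
index=* "2a9d2401-66f5-4fc9-9c41-d4ac7c52b0a4"
```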

Splunk Search Helpers

Manually looking up the job UUID every time is tedious, and translating a job UUID back to the job name is more complicated still. Instead, you can use Splunk itself to do that translation based on other data available to it. All you need are a few Splunk search helpers to make correlating the logs easier.

Starting with the 3.0 release of Apcera, Splunk servers in the cluster are now pre-populated with several configurations to make searching the Apcera log data easier. These configuration items are grouped together in a Splunk App named Apcera. This App is by default not marked as visible, and the search helper items in it are restricted to only be usable from within the App. You can enable the App or change the permissions on the helper items to make them usable from outside the App.

Once you have enabled them, the Splunk helpers allow simpler searches to find the data you want. The helpers are Splunk macros, which you activate by placing back-ticks around the macro invocation in your Splunk search. The provided helpers are:

  • apcera_job_logs(jobname): Finds the logs for a job by name (wildcard supported). This assumes that a logdrain has been added to the job, sending the logs to Splunk.
  • apcera_job_logs: (Without any arguments) Finds the logs for all jobs, with the required logdrain.
  • apcera_access_log(jobname): Finds the HTTP access logs from the NGINX router for all requests to a job by name.
  • find_job(jobname): Finds all the internal logs related to a job by name. This includes logs from all Apcera components that reference the job.
  • job2fqn: When applied at the middle or end of a Splunk search command, it attempts to translate the job UUIDs listed in the search results into Job FQNs.
    • Here is an example of using job2fqn to translate a UUID into the job name, then using the Splunk builtin command top to report on the top job names matching the initial search query:
      job=* fail* | `job2fqn` | top job_fqn

Splunk Search Helper Manual Setup

If you are using an external splunk-indexer or search head, you will need to create these configuration items there. Consult your Splunk administrator to determine the best way to configure your Splunk searches. The Apcera search helpers to create are Splunk macros and field extractions.

The helper macros to be created are listed below. Note: Several of these depend upon other macros in the list, so you must load or enable all the required macros.

Name: apcera_job_logs(1)
Arguments: fqn
Definition: index=apcera-job-log | eval job=source | rex field=job mode=sed "s/^.*\/([^\/]*)/'\1'/" | join job [`job_fqn_mapping` | search JobFQN="job::$fqn$" | fields job]

Name: apcera_job_logs
Definition: index=apcera-job-log | eval job=source | rex field=job mode=sed "s/^.*\/([^\/]*)/'\1'/" | `job2fqn`

Name: apcera_access_log(1)
Definition: index=continuum-router source="/var/log/continuum-router-access-logs/*" "job::$jobfqn$"
Arguments: jobfqn 

Name: find_job(1)
Definition: index=* [`job_fqn_mapping` | search JobFQN="job::$fqn$" | fields job]
Arguments: fqn 

Name: job2fqn
Definition: join job [ `job_fqn_mapping` | fields job,job_fqn ]

Name: job_fqn_mapping
Definition: search index=* sourcetype="continuum-cluster-monitor" stats message | rex field=_raw "setting to (?<stats_message>.*)" | spath input=stats_message output=instances Instances{} | mvexpand instances | spath input=instances | dedup JobUUID | rex field=JobFQN "job::(?<job_fqn>.*)" | eval job=split(JobUUID.",'".JobUUID."'",",") | mvexpand job

The recommended field extractions to add are:

Name: apcera-nginx-access-extractions
Apply to Sourcetype Named: continuum-router
Type: Inline
Extraction: ^[^ ]+\s+((?<apcera_txid>[^ ]+)\s+)?(?<src_ip>\d+\.\d+\.\d+\.\d+)\s+(?<http_response_code>[\d]+)\s+(?<response_time>[\d\.]+)\s+((?<request_length>[\d\.]+)\s+(?<response_length>[\d\.]+)\s+)?(?<http_request_type>[\w]+)\s+(?<http_uri>[^ ]+)\s+--\s+(?<upstream_response_code>\d+)\s+(?<upstream_server>\d+\.\d+\.\d+\.\d+:\d+)\s+(?<upstream_response_time>[\d\.]+)\s+--\s+"(?<apcera_route>[^"]+)"\s+"(?<job>[^"]+)"\s+"(none|job::(?<job_fqn>[^"]+))"\s+"(?<instance>[^"]+)"\s+(?<sticky_target>\d+)(\s+(?<connection>\d+)\s+(?<connection_requests>\d+))?

Name: continuum_source
Apply to Host Named: *
Type: Inline
Extraction: source='(?<continuum_source>[^']*)'
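The continuum_source extraction is simple enough to sanity-check outside Splunk. A minimal Python sketch follows; note that Python's re module spells named groups (?P<name>...) rather than Splunk's PCRE-style (?<name>...), and the sample log line is illustrative:

```python
import re

# Same pattern as the continuum_source field extraction, rewritten
# with Python's (?P<name>...) named-group syntax.
CONTINUUM_SOURCE = re.compile(r"source='(?P<continuum_source>[^']*)'")

def extract_source(line: str):
    """Return the continuum_source field from a log line, or None."""
    m = CONTINUUM_SOURCE.search(line)
    return m.group("continuum_source") if m else None

# Illustrative log line; real component lines will differ.
sample = "2017-03-01T12:00:00Z starting source='/var/log/continuum/api-server.log'"
print(extract_source(sample))  # → /var/log/continuum/api-server.log
```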