Monitor Armory Enterprise with Prometheus

Monitor Armory Enterprise using Prometheus and Grafana.

Overview

Armory recommends monitoring the health of Armory Enterprise in every production instance. This document describes how to set up a basic Prometheus and Grafana stack as well as enable monitoring for the Armory Enterprise services.

Additional Prometheus and Grafana configuration is necessary to make them production-grade, but that configuration is outside the scope of this document. Monitoring the Pipelines as Code service (Dinghy) and the Terraform Integration service (Terraformer) is also not covered on this page.

Important

Armory 2.20 (OSS 1.20.x) introduced changes to metric names and the Monitoring Daemon. These changes mean that monitoring solutions built for versions earlier than 2.20 are incompatible with Armory 2.20.x (OSS 1.20.x) and later. If you are running 2.19.x or earlier, see the version of this page for 2.19.x and earlier.

Before you begin

You are familiar with Prometheus and Grafana
Armory Enterprise is deployed in the spinnaker namespace
Prometheus and Grafana are deployed in the monitoring namespace

Use `kube-prometheus` to create a monitoring stack

You can skip this section if you already have a monitoring stack.

A quick and easy way to configure a cluster monitoring solution is to use kube-prometheus. This project creates a monitoring stack that includes cluster monitoring with Prometheus and dashboards with Grafana.

To create the stack, follow the kube-prometheus quick start instructions beginning with the Compatibility Matrix section.

After you complete the instructions, you have pods running in the monitoring namespace:

% kubectl get pods --namespace monitoring

NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          44s
alertmanager-main-1                   2/2     Running   0          44s
alertmanager-main-2                   2/2     Running   0          44s
grafana-77978cbbdc-x5rsq              1/1     Running   0          40s
kube-state-metrics-7f6d7b46b4-crzx2   3/3     Running   0          40s
node-exporter-nrc88                   2/2     Running   0          41s
prometheus-adapter-68698bc948-bl7p8   1/1     Running   0          40s
prometheus-k8s-0                      3/3     Running   1          39s
prometheus-k8s-1                      3/3     Running   1          39s
prometheus-operator-6685db5c6-qfpbj   1/1     Running   0          106s

Access the Prometheus web interface by using the kubectl port-forward command. If you want to expose this interface for others to use, create an ingress service. Make sure you enable security controls that follow Prometheus best practices.

% kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090 &

Navigate to http://localhost:9090/targets.
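
If you want to confirm from the command line that the port-forward works before opening a browser, the following is a minimal sketch. It assumes the port-forward from the previous step is still running and relies on the Prometheus /-/ready management endpoint. The function name prometheus_ready is hypothetical and used only for illustration.

```shell
# Hypothetical helper: succeeds when Prometheus answers its readiness probe.
# Assumes the port-forward to svc/prometheus-k8s on port 9090 is still running.
prometheus_ready() {
  base_url="${1:-http://localhost:9090}"
  # --fail makes curl exit non-zero on HTTP errors such as 503.
  curl --silent --fail --output /dev/null "${base_url}/-/ready"
}
```

For example, run `prometheus_ready && echo "Prometheus is ready"`. If the command prints nothing, check that the port-forward process is still alive.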

Grant Prometheus RBAC permissions

There are two steps to configure Prometheus to monitor Armory Enterprise:

Add permissions for Prometheus to talk to the Spinnaker namespace
Configure Prometheus to discover the Armory Enterprise endpoints

Add permissions for Prometheus by applying the following configuration to your cluster. You can learn more about this process on the Prometheus Operator homepage.

Example config:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
subjects:
  - kind: ServiceAccount
    # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
    name: prometheus-k8s
    namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus-k8s
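
After you apply the configuration, one way to check that the permissions took effect is to use kubectl auth can-i with service account impersonation. The sketch below assumes the prometheus-k8s service account in the monitoring namespace and the spinnaker namespace from the prerequisites. The function name check_prometheus_rbac is hypothetical, and your own user needs permission to impersonate service accounts.

```shell
# Hypothetical helper: asks the API server whether the Prometheus service
# account may list the resources that service discovery depends on.
# Assumes the service account is prometheus-k8s in the monitoring namespace.
check_prometheus_rbac() {
  namespace="${1:-spinnaker}"
  service_account="system:serviceaccount:monitoring:prometheus-k8s"
  for resource in services endpoints pods; do
    printf '%s: ' "$resource"
    # Prints "yes" or "no" for each resource.
    kubectl auth can-i list "$resource" \
      --namespace "$namespace" \
      --as "$service_account"
  done
}
```

Running `check_prometheus_rbac spinnaker` prints one line per resource. Any "no" answer suggests that the ClusterRole or ClusterRoleBinding names do not match the ones your Prometheus Operator version uses.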

Configure monitoring using the Observability Plugin

Caution

Before configuring monitoring, make sure you understand the security implications. If any of your services, typically Gate, are exposed to the open internet, you risk publicly exposing the information that these endpoints return. Armory recommends that you filter these paths at your edge layer and stay aware of every endpoint you expose. Spring Boot exposes the health endpoint by default, although with some restrictions on what information is included. When auth is enabled, Gate restricts access to all endpoints other than /health, which prevents access to metric data.

For more information on Spring Boot actuators, see Monitoring and Management in the Spring Boot documentation.

Armory recommends that you monitor your systems by using the Armory Observability Plugin. This is an open source solution for monitoring Armory Enterprise. The plugin supports the following:

Adding Prometheus (OpenMetrics) endpoints to Armory Enterprise pods (explained below).
Sending data to New Relic (documented on the plugin page).

The Observability Plugin removes the service name from metric names. This is incompatible with the behavior of the open source Spinnaker Monitoring Daemon, which was the default monitoring solution in versions earlier than 2.20 and is now deprecated.

Install the plugin

To install the Observability Plugin, add the plugin configuration to the profiles for your services in one of the following ways:

To enable it for all services, add the configuration to spinnaker-local.yml (Halyard installs) or the spinnaker profile section (Operator installs).
To enable it for specific services only, add the configuration to the local profile of each service that you want to monitor.

The profile should contain the following to enable Prometheus:

# These lines are spring-boot configuration to allow access to the metrics
# endpoints.  This plugin adds the "aop-prometheus" endpoint on the
# "<service>:<port>/aop-prometheus" path.

management:
  endpoints:
    web:
      # Read the security warning at the start of this section about what gets exposed!!
      exposure.include: health,info,aop-prometheus
spinnaker:
  extensibility:
    plugins:
      Armory.ObservabilityPlugin:
        enabled: true
        version: 1.1.3
        # This is the basic configuration for prometheus to be enabled
        config.metrics:
          prometheus:
            enabled: true
    repositories:
      armory-observability-plugin-releases:
        url: https://raw.githubusercontent.com/armory-plugins/armory-observability-plugin-releases/master/repositories.json

More options for the management endpoints and the plugin are described in the plugin README.
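
Once the services restart with the plugin enabled, you can spot-check that a service exposes the endpoint before you configure Prometheus to scrape it. The following sketch is not taken from the plugin documentation. It assumes you have port-forwarded a service locally, for example Clouddriver as svc/spin-clouddriver on port 7002 (typical names for a default install that may differ in yours), and it counts the sample lines in the response. The function name count_aop_samples is hypothetical.

```shell
# Hypothetical helper: counts the metric samples returned by the plugin endpoint.
# Assumes a local port-forward to the service; 7002 is the usual Clouddriver port.
count_aop_samples() {
  url="${1:-http://localhost:7002/aop-prometheus}"
  # Lines starting with '#' are HELP/TYPE comments in the Prometheus text format;
  # the remaining non-empty lines are samples.
  curl --silent --fail "$url" | grep -v '^#' | grep -c .
}
```

For example, start `kubectl --namespace spinnaker port-forward svc/spin-clouddriver 7002 &` and then run `count_aop_samples`. A count greater than zero suggests that the plugin is serving metrics on that service.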

Add the ServiceMonitor

Prometheus Operator uses a “ServiceMonitor” to add targets that get scraped for monitoring. The following example config shows how to monitor pods that are using the Observability Plugin to expose the aop-prometheus endpoint. Note that the example contains both the exclusion of certain services (such as Redis) and changes to the Gate endpoint to show you different options.

These are examples of potential configurations. Use them as a starting point. Armory recommends that you understand how they operate and find services. Adapt them to your environment.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: spin
    # This label is here to match the prometheus operator serviceMonitorSelector attribute
    # prometheus.prometheusSpec.serviceMonitorSelector. For more information, see
    # https://github.com/helm/charts/tree/master/stable/prometheus-operator
    release: prometheus-operator
  name: spinnaker-all-metrics
  namespace: spinnaker
spec:
  endpoints:
  - interval: 10s
    path: /aop-prometheus
  selector:
    matchExpressions:
    - key: cluster
      operator: NotIn
      values:
      - spin-gate
      - spin-gate-api
      - spin-gate-custom
      - spin-deck
      - spin-deck-custom
      - spin-redis
      - spin-terraformer
      - spin-dinghy
    matchLabels:
      app: spin

The example excludes Gate, the API service, because Gate restricts access to all endpoints except /health unless the request is authenticated.

The following example is a ServiceMonitor for Gate, which uses a different path and TLS. After you apply both ServiceMonitors, you can port-forward Prometheus and validate that it has discovered and scraped the expected targets.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spinnaker-internal-metrics
  namespace: spinnaker
  labels:
    app: spin
    # This label is here to match the prometheus operator serviceMonitorSelector attribute
    # prometheus.prometheusSpec.serviceMonitorSelector
    # https://github.com/helm/charts/tree/master/stable/prometheus-operator
    release: prometheus-operator
spec:
  selector:
    matchLabels:
      cluster: spin-gate
  endpoints:
  - interval: 10s
    path: "/api/v1/aop-prometheus"
    # If Prometheus returns the error "http: server gave HTTP response to HTTPS client" then
    # replace scheme with targetPort:
    # Note that "port" is string only. "targetPort" is integer or string.
    # For example, targetPort: 8084
    scheme: "https"
    tlsConfig:
      insecureSkipVerify: true

Check for Armory Enterprise targets in Prometheus

After applying these changes, you should see Armory Enterprise targets in Prometheus. It may take 3 to 5 minutes for them to appear, depending on where Prometheus is in its config polling interval.
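
As an alternative to the web interface, you can ask the Prometheus HTTP API whether the new targets are up. The sketch below queries the built-in up metric through /api/v1/query. It assumes the Prometheus port-forward on port 9090 is running and that scraped targets carry a namespace label, which is the usual result of ServiceMonitor-based discovery but may differ with custom relabeling. The function name spinnaker_targets_up is hypothetical.

```shell
# Hypothetical helper: returns the JSON result of the 'up' query for one namespace.
# A sample value of "1" means the most recent scrape of that target succeeded.
spinnaker_targets_up() {
  namespace="${1:-spinnaker}"
  base_url="${2:-http://localhost:9090}"
  # --get sends the URL-encoded query as a GET parameter.
  curl --silent --fail --get \
    --data-urlencode "query=up{namespace=\"${namespace}\"}" \
    "${base_url}/api/v1/query"
}
```

Running `spinnaker_targets_up` prints the raw JSON response. An empty result list usually means that Prometheus has not picked up the ServiceMonitor yet or that the label selectors do not match your services.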

Figure: Prometheus Targets

Access Grafana

Configure port forwarding for Grafana:

% kubectl --namespace monitoring port-forward svc/grafana 3000

Access the Grafana web interface at http://localhost:3000 and log in with the default Grafana credentials (username admin, password admin).

Add Armory dashboards to Grafana

Armory provides some sample dashboards (in JSON format) that you can import into Grafana as a starting point for metrics to graph for monitoring. Armory has additional dashboards that are available to Armory customers. You can skip this section if you are a Grafana expert.

To import the sample dashboards, perform the following steps:

Clone the repository at https://github.com/uneeq-oss/spinnaker-mixin to your local workstation
Access the Grafana web interface (as shown above)
Navigate to Dashboards then Manage
Click on the Import button
Upload one or more of the sample dashboard files from the repo you cloned

After importing the dashboards, you can explore graphs for each service by clicking on Dashboards > Manage > Spinnaker Kubernetes Details.

Figure: Grafana Dashboard

Available metrics by service

Disclaimer: the following tables may not contain every available metric for each service.

Clouddriver

Metric Name	Base Unit	Description
amazonClientProvider.rateLimitDelayMillis
authorization
aws.request.clientExecuteTime	milliseconds
aws.request.credentialsRequestTime	milliseconds
aws.request.httpClientReceiveResponseTime	milliseconds
aws.request.httpClientSendRequestTime	milliseconds
aws.request.httpRequestTime	milliseconds
aws.request.requestCount
aws.request.requestMarshallTime	milliseconds
aws.request.requestSigningTime	milliseconds
aws.request.responseProcessingTime	milliseconds
aws.request.retryPauseTime	milliseconds
aws.request.throttling
awsSdkClientSupplier.averageLoadPenalty
awsSdkClientSupplier.hitCount
awsSdkClientSupplier.loadExceptionCount
awsSdkClientSupplier.missRate
cats.sqlCache.evict.deleteOperations
cats.sqlCache.evict.itemCount
cats.sqlCache.evict.itemsDeleted
cats.sqlCache.get.itemCount
cats.sqlCache.get.relationshipsRequested
cats.sqlCache.get.requestedSize
cats.sqlCache.get.selectOperations
cats.sqlCache.merge.deleteOperations
cats.sqlCache.merge.itemCount
cats.sqlCache.merge.itemsStored
cats.sqlCache.merge.relationshipCount
cats.sqlCache.merge.relationshipsStored
cats.sqlCache.merge.selectOperations
cats.sqlCache.merge.writeOperations
cf.okhttp.requests	milliseconds	Timer of OkHttp operation
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
executionTime	milliseconds
health.kubernetes.errors
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads having BLOCKED state
kubernetes.api	milliseconds
logback.events	events	Number of debug level events that made it to the logs
onDemand_cache	milliseconds
onDemand_count
onDemand_error
onDemand_evict	milliseconds
onDemand_read	milliseconds
onDemand_store	milliseconds
onDemand_total	milliseconds
onDemand_transform	milliseconds
operations	milliseconds
orchestrations	milliseconds
process.files.max	files	The maximum file descriptor count
reservedInstances.surplusByAccountClassic
reservedInstances.surplusByAccountVpc
reservedInstances.surplusOverall
resilience4j.retry.calls		The number of failed calls after a retry attempt
sql.cacheCleanupAgent.dataTypeCleanupDuration	milliseconds
sql.cacheCleanupAgent.dataTypeRecordsDeleted
sql.healthProvider.invocations
sql.taskCleanupAgent.deleted
sql.taskCleanupAgent.timing	milliseconds
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tasks
tomcat.sessions.active.current	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Echo

Metric Name	Base Unit	Description
aws.request.httpClientGetConnectionTime	milliseconds
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
echo.events.processed
echo.triggers.sync.executionTimeMillis	milliseconds
fiat.enabled
fiat.getPermission
fiat.legacyFallback.enabled
fiat.permissionsCache.evictions
fiat.permissionsCache.evictions-weight
fiat.permissionsCache.hits
fiat.permissionsCache.loads	milliseconds
fiat.permissionsCache.loads-failure
fiat.permissionsCache.loads-success
fiat.permissionsCache.misses
front50.lastPoll
front50.requests
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.buffer.total.capacity	bytes	An estimate of the total capacity of the buffers in this pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.memory.promoted	bytes	Count of positive increases in the size of the old generation memory pool before GC to after GC
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads having NEW state
logback.events	events	Number of info level events that made it to the logs
okhttp.requests	milliseconds
orca.requests
orca.trigger.success
pipelines.triggered
process.cpu.usage		The recent cpu usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
quietPeriod.tests
resilience4j.circuitbreaker.buffered.calls		The number of buffered failed calls stored in the ring buffer
resilience4j.circuitbreaker.calls	milliseconds	Total number of calls which failed but the exception was ignored
resilience4j.circuitbreaker.failure.rate		The failure rate of the circuit breaker
resilience4j.circuitbreaker.slow.call.rate		The slow call rate of the circuit breaker
resilience4j.circuitbreaker.state		The states of the circuit breaker
system.cpu.count		The number of processors available to the Java virtual machine
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Fiat

Metric Name	Base Unit	Description
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
fiat.getUserPermission
fiat.userRoles.syncAnonymous	milliseconds
fiat.userRoles.syncCount
fiat.userRoles.syncTime	milliseconds
fiat.userRoles.syncUsers	milliseconds
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.buffer.total.capacity	bytes	An estimate of the total capacity of the buffers in this pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.memory.promoted	bytes	Count of positive increases in the size of the old generation memory pool before GC to after GC
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads having TERMINATED state
kork.lock.acquire
kork.lock.acquire.duration
kork.lock.heartbeat
kork.lock.release
logback.events	events	Number of debug level events that made it to the logs
okhttp.requests	milliseconds
permissionsRepository.get1.invocations
permissionsRepository.get1.timing
permissionsRepository.getAllById.invocations
permissionsRepository.getAllById.timing
permissionsRepository.put1.invocations
permissionsRepository.put1.timing
permissionsRepository.putAllById1.invocations
permissionsRepository.putAllById1.timing
process.cpu.usage		The recent cpu usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
redis.command.invocation.del
redis.command.invocation.eval
redis.command.invocation.get
redis.command.invocation.hgetAll
redis.command.invocation.hmset
redis.command.invocation.hscan
redis.command.invocation.pipelined
redis.command.invocation.rename
redis.command.invocation.sadd
redis.command.invocation.set
redis.command.invocation.sismember
redis.command.invocation.srem
redis.command.invocation.sscan
redis.command.invocation.time
redis.command.latency.del
redis.command.latency.eval	milliseconds
redis.command.latency.get	milliseconds
redis.command.latency.hgetAll
redis.command.latency.hmset
redis.command.latency.hscan
redis.command.latency.pipelined
redis.command.latency.rename
redis.command.latency.sadd
redis.command.latency.set
redis.command.latency.sismember
redis.command.latency.srem
redis.command.latency.sscan
redis.command.latency.time
redis.command.payloadSize.eval
redis.command.payloadSize.eval.summary
redis.command.payloadSize.sadd
redis.command.payloadSize.sadd.summary
redis.command.payloadSize.set
redis.command.payloadSize.set.summary
resilience4j.circuitbreaker.buffered.calls		The number of buffered failed calls stored in the ring buffer
resilience4j.circuitbreaker.calls	milliseconds
resilience4j.circuitbreaker.failure.rate		The failure rate of the circuit breaker
resilience4j.circuitbreaker.slow.call.rate		The slow call rate of the circuit breaker
resilience4j.circuitbreaker.state		The states of the circuit breaker
resilience4j.retry.calls		The number of failed calls after a retry attempt
system.cpu.count		The number of processors available to the Java virtual machine
system.cpu.usage		The recent cpu usage for the whole system
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Front50

Metric Name	Base Unit	Description
aws.request.clientExecuteTime	milliseconds
aws.request.credentialsRequestTime	milliseconds
aws.request.httpClientGetConnectionTime	milliseconds
aws.request.httpClientReceiveResponseTime	milliseconds
aws.request.httpClientSendRequestTime	milliseconds
aws.request.httpRequestTime	milliseconds
aws.request.requestCount
aws.request.requestSigningTime	milliseconds
aws.request.responseProcessingTime	milliseconds
aws.request.retryPauseTime	milliseconds
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
fiat.enabled
fiat.getPermission
fiat.legacyFallback.enabled
fiat.permissionsCache.evictions
fiat.permissionsCache.evictions-weight
fiat.permissionsCache.hits
fiat.permissionsCache.loads	milliseconds
fiat.permissionsCache.loads-failure
fiat.permissionsCache.loads-success
fiat.permissionsCache.misses
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.buffer.total.capacity	bytes	An estimate of the total capacity of the buffers in this pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.memory.promoted	bytes	Count of positive increases in the size of the old generation memory pool before GC to after GC
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads having WAITING state
logback.events	events	Number of error level events that made it to the logs
okhttp.requests	milliseconds
process.cpu.usage		The recent cpu usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
resilience4j.circuitbreaker.buffered.calls
resilience4j.circuitbreaker.calls	milliseconds
resilience4j.circuitbreaker.failure.rate		The failure rate of the circuit breaker
resilience4j.circuitbreaker.slow.call.rate		The slow call rate of the circuit breaker
resilience4j.circuitbreaker.slow.calls		The number of slow failed calls which were slower than a certain threshold
resilience4j.circuitbreaker.state		The states of the circuit breaker
storageServiceSupport.autoRefreshTime	milliseconds
storageServiceSupport.cacheAge
storageServiceSupport.cacheRefreshTime	milliseconds
storageServiceSupport.cacheSize
storageServiceSupport.mismatchedIds
storageServiceSupport.numAdded
storageServiceSupport.numRemoved
storageServiceSupport.numUpdated
storageServiceSupport.scheduledRefreshTime	milliseconds
system.cpu.count		The number of processors available to the Java virtual machine
system.cpu.usage		The recent cpu usage for the whole system
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Gate

Metric Name	Base Unit	Description
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
fiat.enabled
fiat.getPermission
fiat.legacyFallback.enabled
fiat.login
fiat.permissionsCache.evictions
fiat.permissionsCache.evictions-weight
fiat.permissionsCache.hits
fiat.permissionsCache.loads	milliseconds
fiat.permissionsCache.loads-failure
fiat.permissionsCache.loads-success
fiat.permissionsCache.misses
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.buffer.total.capacity	bytes	An estimate of the total capacity of the buffers in this pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.memory.promoted	bytes	Count of positive increases in the size of the old generation memory pool before GC to after GC
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads having RUNNABLE state
logback.events	events	Number of error level events that made it to the logs
okhttp.requests	milliseconds
plugins.deckAssets.hits
plugins.deckCache.downloadDuration	milliseconds
plugins.deckCache.hits
plugins.deckCache.misses
plugins.deckCache.refreshDuration	milliseconds
plugins.deckCache.versions
process.cpu.usage		The recent cpu usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
system.cpu.count		The number of processors available to the Java virtual machine
system.cpu.usage		The recent cpu usage for the whole system
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Igor

Metric Name	Base Unit	Description
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
fiat.enabled
fiat.getPermission
fiat.legacyFallback.enabled
fiat.permissionsCache.evictions
fiat.permissionsCache.evictions-weight
fiat.permissionsCache.hits
fiat.permissionsCache.loads	milliseconds
fiat.permissionsCache.loads-failure
fiat.permissionsCache.loads-success
fiat.permissionsCache.misses
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads	The current number of threads in each thread state (for example, NEW)
logback.events	events
okhttp.requests	milliseconds
pollingMonitor.docker.retrieveImagesByAccount	milliseconds
pollingMonitor.jenkins.retrieveProjects	milliseconds
pollingMonitor.pollTiming	milliseconds
process.cpu.usage		The recent CPU usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since Unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
resilience4j.circuitbreaker.buffered.calls		The number of buffered failed calls stored in the ring buffer
resilience4j.circuitbreaker.calls		Total number of not permitted calls
resilience4j.circuitbreaker.failure.rate		The failure rate of the circuit breaker
resilience4j.circuitbreaker.slow.call.rate		The slow call rate of the circuit breaker
resilience4j.circuitbreaker.state		The states of the circuit breaker
system.cpu.count		The number of processors available to the Java virtual machine
system.cpu.usage		The recent CPU usage for the whole system
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Kayenta

Metric Name	Base Unit	Description
canary.pipelines.initiated
canary.telemetry.query
controller.invocations	milliseconds
controller.invocations.contentLength
controller.invocations.contentLength.summary
executions.active
executions.completed
executions.started
http.server.requests	milliseconds
jvm.gc.allocationRate
jvm.gc.liveDataSize
jvm.gc.maxDataSize
jvm.gc.pause	milliseconds
jvm.gc.promotionRate
okhttp.requests	milliseconds
orca.task.result
queue.acknowledged.messages
queue.depth
queue.duplicate.messages
queue.last.poll.age
queue.last.retry.check.age
queue.message.lag	milliseconds
queue.orphaned.messages
queue.pushed.messages
queue.ready.depth
queue.unacked.depth
redis.command.invocation.exists
redis.command.invocation.hdel
redis.command.invocation.hget
redis.command.invocation.hgetAll
redis.command.invocation.hmset
redis.command.invocation.hset
redis.command.invocation.multi
redis.command.invocation.sadd
redis.command.invocation.srem
redis.command.invocation.zadd
redis.command.latency.exists
redis.command.latency.hdel
redis.command.latency.hget
redis.command.latency.hgetAll
redis.command.latency.hmset	milliseconds
redis.command.latency.hset
redis.command.latency.multi
redis.command.latency.sadd
redis.command.latency.srem
redis.command.latency.zadd
redis.command.payloadSize.hmset
redis.command.payloadSize.hmset.summary
redis.command.payloadSize.hset
redis.command.payloadSize.hset.summary
redis.command.payloadSize.sadd
redis.command.payloadSize.sadd.summary
redis.command.payloadSize.srem
redis.command.payloadSize.srem.summary
redis.connectionPool.maxIdle
redis.connectionPool.minIdle
redis.connectionPool.numActive
redis.connectionPool.numIdle
redis.connectionPool.numWaiters
redis.executionRepository.store1.invocations
redis.executionRepository.store1.timing	milliseconds
redis.executionRepository.storeStage1.invocations
redis.executionRepository.storeStage1.timing
redis.executionRepository.updateStatus1.invocations
redis.executionRepository.updateStatus1.timing	milliseconds
retrieveById.redis.executionRepository.invocations
retrieveById.redis.executionRepository.timing
stage.invocations
stage.invocations.duration
task.completions.duration	milliseconds
task.completions.duration.withType	milliseconds
task.invocations.duration	milliseconds
task.invocations.duration.withType	milliseconds
threadpool.activeCount
threadpool.blockingQueueSize
threadpool.corePoolSize
threadpool.maximumPoolSize
threadpool.poolSize
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Orca

Metric Name	Base Unit	Description
aws.request.httpClientGetConnectionTime	milliseconds
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
executions.active
executions.completed
executions.started
executions.totalTime	milliseconds
fiat.enabled
fiat.getPermission
fiat.legacyFallback.enabled
fiat.permissionsCache.loads	milliseconds
fiat.permissionsCache.loads-failure
http.server.requests	milliseconds
jdbc.connections.active
jdbc.connections.idle
jdbc.connections.max
jvm.gc.allocationRate
jvm.gc.pause	milliseconds
jvm.gc.promotionRate
mpt.requests
okhttp.requests	milliseconds
orca.task.result
queue.acknowledged.messages
queue.depth
queue.duplicate.messages
queue.last.poll.age
queue.message.notfound
queue.orphaned.messages
queue.pushed.messages
queue.retried.messages
queue.unacked.depth
redis.connectionPool.maxIdle
redis.connectionPool.numActive
redis.connectionPool.numIdle
resilience4j.retry.calls		The number of successful calls after a retry attempt
retrieveById.sql.executions.invocations
retrieveById.sql.executions.timing
sql.executions.addStage1.timing
sql.executions.cancel4.invocations
sql.executions.cancel4.timing
sql.executions.countActiveExecutions.invocations
sql.executions.countActiveExecutions.timing
sql.executions.handlesPartition1.invocations
sql.executions.handlesPartition1.timing	milliseconds
sql.executions.retrieveByCorrelationId2.timing
sql.executions.retrieveOrchestrationsForApplication3.timing
sql.executions.store1.timing
sql.executions.storeStage1.invocations
sql.executions.storeStage1.timing
sql.executions.updateStatus1.invocations
sql.executions.updateStatus1.timing
sql.healthProvider.invocations
sql.pool.default.connectionAcquiredTiming	milliseconds
sql.queueActivator.invocations
stage.invocations
stage.invocations.duration
task.completions.duration	milliseconds
task.completions.duration.withType	milliseconds
task.invocations.duration	milliseconds
task.invocations.duration.withType	milliseconds
tasks.serverGroupCacheForceRefresh
threadpool.activeCount
threadpool.blockingQueueSize
threadpool.corePoolSize
threadpool.maximumPoolSize
threadpool.poolSize
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.rejected	sessions

Rosco

Metric Name	Base Unit	Description
bakesActive
bakesCompleted	milliseconds
controller.invocations
controller.invocations.contentLength
controller.invocations.contentLength.summary
http.server.requests	milliseconds
jvm.buffer.count	buffers	An estimate of the number of buffers in the pool
jvm.buffer.memory.used	bytes	An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm.buffer.total.capacity	bytes	An estimate of the total capacity of the buffers in this pool
jvm.classes.loaded	classes	The number of classes that are currently loaded in the Java virtual machine
jvm.classes.unloaded	classes	The total number of classes unloaded since the Java virtual machine has started execution
jvm.gc.allocationRate
jvm.gc.live.data.size	bytes	Size of old generation memory pool after a full GC
jvm.gc.liveDataSize
jvm.gc.max.data.size	bytes	Max size of old generation memory pool
jvm.gc.maxDataSize
jvm.gc.memory.allocated	bytes	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
jvm.gc.memory.promoted	bytes	Count of positive increases in the size of the old generation memory pool before GC to after GC
jvm.gc.pause	milliseconds	Time spent in GC pause
jvm.gc.promotionRate
jvm.memory.committed	bytes	The amount of memory in bytes that is committed for the Java virtual machine to use
jvm.memory.max	bytes	The maximum amount of memory in bytes that can be used for memory management
jvm.memory.used	bytes	The amount of used memory
jvm.threads.daemon	threads	The current number of live daemon threads
jvm.threads.live	threads	The current number of live threads including both daemon and non-daemon threads
jvm.threads.peak	threads	The peak live thread count since the Java virtual machine started or peak was reset
jvm.threads.states	threads
logback.events	events
okhttp.requests	milliseconds
process.cpu.usage		The recent CPU usage for the Java Virtual Machine process
process.files.max	files	The maximum file descriptor count
process.files.open	files	The open file descriptor count
process.start.time	milliseconds	Start time of the process since Unix epoch.
process.uptime	milliseconds	The uptime of the Java virtual machine
system.cpu.count		The number of processors available to the Java virtual machine
system.cpu.usage		The recent CPU usage for the whole system
system.load.average.1m		The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tomcat.sessions.active.current	sessions
tomcat.sessions.active.max	sessions
tomcat.sessions.alive.max	milliseconds
tomcat.sessions.created	sessions
tomcat.sessions.expired	sessions
tomcat.sessions.rejected	sessions

Last modified July 30, 2021: (75b5b8f)