Monitor Armory Enterprise with Prometheus
Overview
Armory recommends monitoring the health of Armory Enterprise in every production instance. This document describes how to set up a basic Prometheus and Grafana stack as well as enable monitoring for the Armory Enterprise services.
Additional Prometheus and Grafana configuration is necessary to make them production-grade; that configuration is outside the scope of this document. This page also does not cover monitoring the Pipelines as Code service (Dinghy) or the Terraform Integration service (Terraformer).
Important
Armory 2.20 (OSS 1.20.x) introduced changes to metric names and the Monitoring Daemon. Because of these changes, monitoring solutions built for versions earlier than 2.20 are incompatible with Armory 2.20.x (OSS 1.20.x) and later. If you are using 2.19.x or earlier, see the monitoring documentation for 2.19.x and earlier.
Before you begin
- You are familiar with Prometheus and Grafana.
- Armory Enterprise is deployed in the spinnaker namespace.
- Prometheus and Grafana are deployed in the monitoring namespace.
Use kube-prometheus to create a monitoring stack
You can skip this section if you already have a monitoring stack.
A quick and easy way to configure a cluster monitoring solution is to use kube-prometheus. This project creates a monitoring stack that includes cluster monitoring with Prometheus and dashboards with Grafana.
To create the stack, follow the kube-prometheus quick start instructions beginning with the Compatibility Matrix section.
After you complete the instructions, you have pods running in the monitoring namespace:
% kubectl get pods --namespace monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          44s
alertmanager-main-1                   2/2     Running   0          44s
alertmanager-main-2                   2/2     Running   0          44s
grafana-77978cbbdc-x5rsq              1/1     Running   0          40s
kube-state-metrics-7f6d7b46b4-crzx2   3/3     Running   0          40s
node-exporter-nrc88                   2/2     Running   0          41s
prometheus-adapter-68698bc948-bl7p8   1/1     Running   0          40s
prometheus-k8s-0                      3/3     Running   1          39s
prometheus-k8s-1                      3/3     Running   1          39s
prometheus-operator-6685db5c6-qfpbj   1/1     Running   0          106s
Access the Prometheus web interface by using the kubectl port-forward command. If you want to expose this interface for others to use, create an Ingress, and make sure you enable security controls that follow Prometheus best practices.
% kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090 &
Navigate to http://localhost:9090/targets.
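If you prefer to check targets from a script rather than the web interface, the following minimal Python sketch calls the Prometheus HTTP API endpoint /api/v1/targets through the port-forward above and counts active targets per job by health. It assumes Prometheus is reachable at http://localhost:9090; the function names are illustrative and are not part of any Armory tooling.

```python
import json
import urllib.request
from collections import Counter

# Assumption: the kubectl port-forward from the previous step is running.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Return the decoded JSON body of GET <base_url>/api/v1/targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> dict:
    """Map each job label to a Counter of target health values ("up", "down", "unknown")."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "<no job>")
        summary.setdefault(job, Counter())[target["health"]] += 1
    return summary


def print_target_summary(base_url: str = PROMETHEUS_URL) -> None:
    """Fetch the targets and print one line per job with its health counts."""
    for job, counts in sorted(summarize_targets(fetch_targets(base_url)).items()):
        detail = ", ".join(f"{health}={n}" for health, n in sorted(counts.items()))
        print(f"{job}: {detail}")
```

Call print_target_summary() while the port-forward is running. Keeping the HTTP call separate from summarize_targets makes the counting logic easy to reuse on a saved API response.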
Grant Prometheus RBAC permissions
There are two steps to configure Prometheus to monitor Armory Enterprise:
- Add permissions for Prometheus to talk to the Spinnaker namespace
- Configure Prometheus to discover the Armory Enterprise endpoints
Add permissions for Prometheus by applying the following configuration to your cluster. You can learn more about this process on the Prometheus Operator homepage.
Example config:
# rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; v1 is available since Kubernetes 1.8.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
subjects:
- kind: ServiceAccount
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  # name can be either prometheus or prometheus-k8s depending on the version of the prometheus-operator
  name: prometheus-k8s
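To confirm that the binding took effect, you can impersonate the Prometheus service account with kubectl auth can-i. The Python sketch below builds and runs those checks against the spinnaker namespace. It assumes kubectl is on your PATH, that your own credentials are allowed to impersonate service accounts (cluster administrators typically are), and that the service account is prometheus-k8s; the CHECKS list is a hypothetical subset of the permissions granted by the ClusterRole above.

```python
import subprocess
from typing import List, Tuple

# Assumption: the service account name from the manifest above.
PROMETHEUS_SA = "system:serviceaccount:monitoring:prometheus-k8s"
SPINNAKER_NS = "spinnaker"

# Hypothetical subset of the (verb, resource) pairs granted by the ClusterRole.
CHECKS: List[Tuple[str, str]] = [
    ("list", "pods"),
    ("watch", "pods"),
    ("list", "services"),
    ("list", "endpoints"),
]


def can_i_argv(verb: str, resource: str, namespace: str = SPINNAKER_NS,
               user: str = PROMETHEUS_SA) -> List[str]:
    """Build a kubectl auth can-i command that impersonates the Prometheus service account."""
    return ["kubectl", "auth", "can-i", verb, resource,
            "--namespace", namespace, "--as", user]


def verify_rbac() -> bool:
    """Run every check; kubectl prints "yes" or "no" on stdout for each one."""
    all_allowed = True
    for verb, resource in CHECKS:
        result = subprocess.run(can_i_argv(verb, resource),
                                capture_output=True, text=True)
        allowed = result.stdout.strip() == "yes"
        print(f"{verb} {resource}: {'allowed' if allowed else 'DENIED'}")
        all_allowed = all_allowed and allowed
    return all_allowed
```

Run verify_rbac() after applying the manifest. The check reads kubectl's "yes"/"no" output rather than relying on the exit code alone.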
Configure monitoring using the Observability Plugin
Caution
Before configuring monitoring, read and understand the following information about the security implications.
If any of your services, typically Gate, are exposed to the open internet, you risk exposing information publicly. Armory recommends that you filter these paths at your edge layer. Be aware of any endpoints you expose. Spring Boot exposes the health endpoint by default, though with some restrictions on what information it returns. When authentication is enabled, Gate restricts access to all endpoints other than /health, which prevents unauthenticated access to metric data.
For more information on Spring Boot Actuator endpoints, see Monitoring and Management in the Spring Boot documentation.
Armory recommends that you monitor your systems by using the Armory Observability Plugin. This is an open source solution for monitoring Armory Enterprise. The plugin supports the following:
- Adding Prometheus (OpenMetrics) endpoints to Armory Enterprise pods (explained below).
- Sending data to New Relic (documented on the plugin page).
The Observability Plugin removes the service name from metric names. This is incompatible with the behavior of the open source Spinnaker Monitoring Daemon, which was the default monitoring solution in versions earlier than 2.20 and is now deprecated.
Install the plugin
To install the Observability Plugin, add a plugin configuration to the profiles for your services in one of the following ways:
- Add it for all services in spinnaker-local.yml (Halyard installs) or in the spinnaker profile section (Operator installs).
- Add it only to the profiles of the services you want to monitor.
In either case, the profile should contain the following to enable Prometheus:
# These lines are spring-boot configuration to allow access to the metrics
# endpoints. This plugin adds the "aop-prometheus" endpoint on the
# "<service>:<port>/aop-prometheus" path.
management:
  endpoints:
    web:
      # Read the security warning at the start of this section about what gets exposed!!
      exposure.include: health,info,aop-prometheus
spinnaker:
  extensibility:
    plugins:
      Armory.ObservabilityPlugin:
        enabled: true
        version: 1.1.3
        # This is the basic configuration for prometheus to be enabled
        config.metrics:
          prometheus:
            enabled: true
    repositories:
      armory-observability-plugin-releases:
        url: https://raw.githubusercontent.com/armory-plugins/armory-observability-plugin-releases/master/repositories.json
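After the services restart with the plugin enabled, you can spot-check one of them directly. The sketch below assumes you have run kubectl --namespace spinnaker port-forward svc/spin-clouddriver 7002 (the spin-clouddriver service name and port 7002 are assumptions based on a default installation) and lists the distinct metric names in the Prometheus text exposition format returned by /aop-prometheus.

```python
import urllib.request
from typing import Set

# Assumption: a port-forward to Clouddriver on localhost:7002 is running.
METRICS_URL = "http://localhost:7002/aop-prometheus"


def fetch_metrics_text(url: str = METRICS_URL) -> str:
    """Download the Prometheus text exposition from the plugin endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def metric_names(text: str) -> Set[str]:
    """Return the distinct metric names in a Prometheus text-format payload.

    Blank lines and comment lines (# HELP, # TYPE) are skipped. For a sample
    line, the name is everything before the first '{' or space.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names
```

For example, print(sorted(metric_names(fetch_metrics_text()))) prints every metric name the service exposes. An HTTP error here usually means the plugin is not loaded or aop-prometheus is missing from management.endpoints.web.exposure.include.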
More options for management endpoints and the plugin are available on the Plugin readme.
Add the ServiceMonitor
Prometheus Operator uses a “ServiceMonitor” to add targets that get scraped for monitoring. The following example config shows how to monitor pods that use the Observability Plugin to expose the aop-prometheus endpoint. The examples in this section show two options: excluding certain services (such as Redis) from scraping, and scraping Gate on a different endpoint.
These are examples of potential configurations. Use them as a starting point. Armory recommends that you understand how they operate and how they select services, then adapt them to your environment.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: spin
    # This label is here to match the prometheus operator serviceMonitorSelector attribute
    # prometheus.prometheusSpec.serviceMonitorSelector. For more information, see
    # https://github.com/helm/charts/tree/master/stable/prometheus-operator
    release: prometheus-operator
  name: spinnaker-all-metrics
  namespace: spinnaker
spec:
  endpoints:
  - interval: 10s
    path: /aop-prometheus
  selector:
    matchExpressions:
    - key: cluster
      operator: NotIn
      values:
      - spin-gate
      - spin-gate-api
      - spin-gate-custom
      - spin-deck
      - spin-deck-custom
      - spin-redis
      - spin-terraformer
      - spin-dinghy
    matchLabels:
      app: spin
The example excludes Gate, the API service, because Gate restricts access to all endpoints except health unless the request is authenticated.
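A ServiceMonitor selects Services using standard Kubernetes label selector semantics: every matchLabels entry and every matchExpressions requirement must hold, and a NotIn requirement also matches a Service that does not carry the key at all. The hypothetical helper below evaluates the selector from the spinnaker-all-metrics example against a set of labels, so you can reason about which Services it picks up. It assumes your Spinnaker Services carry app: spin and cluster: spin-&lt;service&gt; labels, as the example selector implies.

```python
from typing import Dict, List

# Selector copied from the spinnaker-all-metrics example above.
MATCH_LABELS = {"app": "spin"}
MATCH_EXPRESSIONS = [{
    "key": "cluster",
    "operator": "NotIn",
    "values": ["spin-gate", "spin-gate-api", "spin-gate-custom", "spin-deck",
               "spin-deck-custom", "spin-redis", "spin-terraformer", "spin-dinghy"],
}]


def selector_matches(labels: Dict[str, str],
                     match_labels: Dict[str, str] = MATCH_LABELS,
                     match_expressions: List[dict] = MATCH_EXPRESSIONS) -> bool:
    """Evaluate a Kubernetes label selector; all requirements are ANDed."""
    for key, value in match_labels.items():
        if labels.get(key) != value:
            return False
    for expr in match_expressions:
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        present = key in labels
        if op == "In":
            ok = present and labels[key] in values
        elif op == "NotIn":
            ok = not present or labels[key] not in values  # an absent key satisfies NotIn
        elif op == "Exists":
            ok = present
        elif op == "DoesNotExist":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True
```

For example, selector_matches({"app": "spin", "cluster": "spin-clouddriver"}) returns True, while selector_matches({"app": "spin", "cluster": "spin-gate"}) returns False because spin-gate is in the NotIn list.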
The following example is a ServiceMonitor for Gate that scrapes a different path over TLS.
After you apply both ServiceMonitors, you can port-forward Prometheus and verify that it has discovered and scraped the expected targets.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spinnaker-internal-metrics
  namespace: spinnaker
  labels:
    app: spin
    # This label is here to match the prometheus operator serviceMonitorSelector attribute
    # prometheus.prometheusSpec.serviceMonitorSelector
    # https://github.com/helm/charts/tree/master/stable/prometheus-operator
    release: prometheus-operator
spec:
  selector:
    matchLabels:
      cluster: spin-gate
  endpoints:
  - interval: 10s
    path: "/api/v1/aop-prometheus"
    # If Prometheus returns the error "http: server gave HTTP response to HTTPS client" then
    # replace scheme with targetPort:
    # Note that "port" is string only. "targetPort" is integer or string.
    # For example, targetPort: 8084
    scheme: "https"
    tlsConfig:
      insecureSkipVerify: true
Check for Armory Enterprise targets in Prometheus
After applying these changes, you should be able to see Armory Enterprise targets in Prometheus. It may take 3 to 5 minutes for this to show up depending on where Prometheus is in its config polling interval.
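To check from a script, the following sketch runs the PromQL instant query up{namespace="spinnaker"} through the Prometheus HTTP API (/api/v1/query) and reports whether each scraped instance is up. It assumes the Prometheus port-forward from earlier is still running and that Prometheus Operator attaches a namespace label to targets discovered through ServiceMonitors, which its default relabeling typically does.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Assumption: the Prometheus port-forward (localhost:9090) is still running.
PROMETHEUS_URL = "http://localhost:9090"


def query_up(namespace: str = "spinnaker", base_url: str = PROMETHEUS_URL) -> dict:
    """Run the instant query up{namespace="<namespace>"} and return the decoded JSON."""
    promql = f'up{{namespace="{namespace}"}}'
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def up_by_target(payload: dict) -> Dict[str, bool]:
    """Map "job/instance" to True when the sample value is "1" (the target is up)."""
    status = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        key = f'{labels.get("job", "?")}/{labels.get("instance", "?")}'
        # Instant-query values are [timestamp, "value-as-string"] pairs.
        status[key] = series["value"][1] == "1"
    return status
```

For example, loop over sorted(up_by_target(query_up()).items()) and print each target. An empty result usually means the ServiceMonitors have not matched any Services yet.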
Access Grafana
Configure port forwarding for Grafana:
$ kubectl --namespace monitoring port-forward svc/grafana 3000
Access the Grafana web interface at http://localhost:3000 and log in with the default Grafana credentials, admin:admin (username admin, password admin).
Add Armory dashboards to Grafana
Armory provides some sample dashboards (in JSON format) that you can import into Grafana as a starting point for metrics to graph for monitoring. Armory has additional dashboards that are available to Armory customers. You can skip this section if you are a Grafana expert.
To import the sample dashboards, perform the following steps:
- Clone the spinnaker-mixin repository (https://github.com/uneeq-oss/spinnaker-mixin) to your local workstation.
- Access the Grafana web interface (as shown above).
- Navigate to Dashboards > Manage.
- Click the Import button.
- Upload one or more of the sample dashboard files from the repository you cloned.
After importing the dashboards, you can explore graphs for each service by clicking on Dashboards > Manage > Spinnaker Kubernetes Details.
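If you would rather script the import than use the web interface, Grafana's HTTP API accepts dashboards at POST /api/dashboards/db. The sketch below assumes the Grafana port-forward above, the default admin:admin credentials, and that each file you pass it is plain dashboard JSON. It sets the dashboard id to null so that Grafana creates the dashboard rather than updating one with the same numeric ID.

```python
import base64
import json
import pathlib
import urllib.request

# Assumption: the Grafana port-forward (localhost:3000) is running.
GRAFANA_URL = "http://localhost:3000"


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a dashboard model in the body expected by POST /api/dashboards/db."""
    dashboard = dict(dashboard)  # shallow copy so the caller's dict is unchanged
    dashboard["id"] = None       # null id tells Grafana to create a new dashboard
    return {"dashboard": dashboard, "overwrite": overwrite}


def import_dashboard(path: str, user: str = "admin", password: str = "admin",
                     base_url: str = GRAFANA_URL) -> dict:
    """Upload one dashboard JSON file using basic auth and return Grafana's response."""
    dashboard = json.loads(pathlib.Path(path).read_text(encoding="utf-8"))
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Call import_dashboard("path/to/dashboard.json") for each dashboard file (the path is a placeholder). If a file declares datasource inputs such as ${DS_PROMETHEUS} in an __inputs section, this endpoint does not substitute them; use the Import button in the web interface for those files.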
Available metrics by service
Disclaimer: the following tables may not contain every available metric for each service.
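The tables list metric names in dot notation. Prometheus metric names may contain only letters, digits, underscores, and colons, so exporters rewrite them, typically replacing dots and other invalid characters with underscores and sometimes appending suffixes such as _total, _seconds, or _bytes depending on the meter type and unit. The exact names the Observability Plugin emits are not specified on this page, so the sketch below only normalizes the base name and then does a prefix match against metric names you have scraped from an /aop-prometheus endpoint. The helper names are hypothetical.

```python
import re
from typing import Iterable, List

# Characters that are not allowed in a Prometheus metric name.
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def to_prometheus_base(name: str) -> str:
    """Replace characters that are invalid in Prometheus metric names with '_'."""
    return _INVALID.sub("_", name)


def find_scraped(table_name: str, scraped: Iterable[str]) -> List[str]:
    """Return scraped names equal to, or extending, the normalized table name."""
    base = to_prometheus_base(table_name)
    return sorted(n for n in scraped if n == base or n.startswith(base + "_"))
```

For example, find_scraped("jvm.gc.pause", names) keeps names such as jvm_gc_pause_seconds_count but not jvm_gc_pauses_total, because a match must continue with an underscore after the base name.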
Clouddriver
Metric Name | Base Unit | Description |
---|---|---|
amazonClientProvider.rateLimitDelayMillis | ||
authorization | ||
aws.request.clientExecuteTime | milliseconds | |
aws.request.credentialsRequestTime | milliseconds | |
aws.request.httpClientReceiveResponseTime | milliseconds | |
aws.request.httpClientSendRequestTime | milliseconds | |
aws.request.httpRequestTime | milliseconds | |
aws.request.requestCount | ||
aws.request.requestMarshallTime | milliseconds | |
aws.request.requestSigningTime | milliseconds | |
aws.request.responseProcessingTime | milliseconds | |
aws.request.retryPauseTime | milliseconds | |
aws.request.throttling | ||
awsSdkClientSupplier.averageLoadPenalty | ||
awsSdkClientSupplier.hitCount | ||
awsSdkClientSupplier.loadExceptionCount | ||
awsSdkClientSupplier.missRate | ||
cats.sqlCache.evict.deleteOperations | ||
cats.sqlCache.evict.itemCount | ||
cats.sqlCache.evict.itemsDeleted | ||
cats.sqlCache.get.itemCount | ||
cats.sqlCache.get.relationshipsRequested | ||
cats.sqlCache.get.requestedSize | ||
cats.sqlCache.get.selectOperations | ||
cats.sqlCache.merge.deleteOperations | ||
cats.sqlCache.merge.itemCount | ||
cats.sqlCache.merge.itemsStored | ||
cats.sqlCache.merge.relationshipCount | ||
cats.sqlCache.merge.relationshipsStored | ||
cats.sqlCache.merge.selectOperations | ||
cats.sqlCache.merge.writeOperations | ||
cf.okhttp.requests | milliseconds | Timer of OkHttp operation |
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
executionTime | milliseconds | |
health.kubernetes.errors | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state (one series per state tag) |
kubernetes.api | milliseconds | |
logback.events | events | Number of events at each log level that made it to the logs (one series per level tag) |
onDemand_cache | milliseconds | |
onDemand_count | ||
onDemand_error | ||
onDemand_evict | milliseconds | |
onDemand_read | milliseconds | |
onDemand_store | milliseconds | |
onDemand_total | milliseconds | |
onDemand_transform | milliseconds | |
operations | milliseconds | |
orchestrations | milliseconds | |
process.files.max | files | The maximum file descriptor count |
reservedInstances.surplusByAccountClassic | ||
reservedInstances.surplusByAccountVpc | ||
reservedInstances.surplusOverall | ||
resilience4j.retry.calls | | The number of failed calls after a retry attempt |
sql.cacheCleanupAgent.dataTypeCleanupDuration | milliseconds | |
sql.cacheCleanupAgent.dataTypeRecordsDeleted | ||
sql.healthProvider.invocations | ||
sql.taskCleanupAgent.deleted | ||
sql.taskCleanupAgent.timing | milliseconds | |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time |
tasks | ||
tomcat.sessions.active.current | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Echo
Metric Name | Base Unit | Description |
---|---|---|
aws.request.httpClientGetConnectionTime | milliseconds | |
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
echo.events.processed | ||
echo.triggers.sync.executionTimeMillis | milliseconds | |
fiat.enabled | ||
fiat.getPermission | ||
fiat.legacyFallback.enabled | ||
fiat.permissionsCache.evictions | ||
fiat.permissionsCache.evictions-weight | ||
fiat.permissionsCache.hits | ||
fiat.permissionsCache.loads | milliseconds | |
fiat.permissionsCache.loads-failure | ||
fiat.permissionsCache.loads-success | ||
fiat.permissionsCache.misses | ||
front50.lastPoll | ||
front50.requests | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.buffer.total.capacity | bytes | An estimate of the total capacity of the buffers in this pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.memory.promoted | bytes | Count of positive increases in the size of the old generation memory pool before GC to after GC |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state (one series per state tag) |
logback.events | events | Number of events at each log level that made it to the logs (one series per level tag) |
okhttp.requests | milliseconds | |
orca.requests | ||
orca.trigger.success | ||
pipelines.triggered | ||
process.cpu.usage | | The recent cpu usage for the Java Virtual Machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since unix epoch. |
process.uptime | milliseconds | The uptime of the Java virtual machine |
quietPeriod.tests | ||
resilience4j.circuitbreaker.buffered.calls | | The number of buffered failed calls stored in the ring buffer |
resilience4j.circuitbreaker.calls | milliseconds | Total number of calls, one series per call kind (for example successful, failed, or ignored) |
resilience4j.circuitbreaker.failure.rate | | The failure rate of the circuit breaker |
resilience4j.circuitbreaker.slow.call.rate | | The slow call rate of the circuit breaker |
resilience4j.circuitbreaker.state | | The states of the circuit breaker |
system.cpu.count | | The number of processors available to the Java virtual machine |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Fiat
Metric Name | Base Unit | Description |
---|---|---|
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
fiat.getUserPermission | ||
fiat.userRoles.syncAnonymous | milliseconds | |
fiat.userRoles.syncCount | ||
fiat.userRoles.syncTime | milliseconds | |
fiat.userRoles.syncUsers | milliseconds | |
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.buffer.total.capacity | bytes | An estimate of the total capacity of the buffers in this pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.memory.promoted | bytes | Count of positive increases in the size of the old generation memory pool before GC to after GC |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state (one series per state tag) |
kork.lock.acquire | ||
kork.lock.acquire.duration | ||
kork.lock.heartbeat | ||
kork.lock.release | ||
logback.events | events | Number of events at each log level that made it to the logs (one series per level tag) |
okhttp.requests | milliseconds | |
permissionsRepository.get1.invocations | ||
permissionsRepository.get1.timing | ||
permissionsRepository.getAllById.invocations | ||
permissionsRepository.getAllById.timing | ||
permissionsRepository.put1.invocations | ||
permissionsRepository.put1.timing | ||
permissionsRepository.putAllById1.invocations | ||
permissionsRepository.putAllById1.timing | ||
process.cpu.usage | | The recent cpu usage for the Java Virtual Machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since unix epoch. |
process.uptime | milliseconds | The uptime of the Java virtual machine |
redis.command.invocation.del | ||
redis.command.invocation.eval | ||
redis.command.invocation.get | ||
redis.command.invocation.hgetAll | ||
redis.command.invocation.hmset | ||
redis.command.invocation.hscan | ||
redis.command.invocation.pipelined | ||
redis.command.invocation.rename | ||
redis.command.invocation.sadd | ||
redis.command.invocation.set | ||
redis.command.invocation.sismember | ||
redis.command.invocation.srem | ||
redis.command.invocation.sscan | ||
redis.command.invocation.time | ||
redis.command.latency.del | ||
redis.command.latency.eval | milliseconds | |
redis.command.latency.get | milliseconds | |
redis.command.latency.hgetAll | ||
redis.command.latency.hmset | ||
redis.command.latency.hscan | ||
redis.command.latency.pipelined | ||
redis.command.latency.rename | ||
redis.command.latency.sadd | ||
redis.command.latency.set | ||
redis.command.latency.sismember | ||
redis.command.latency.srem | ||
redis.command.latency.sscan | ||
redis.command.latency.time | ||
redis.command.payloadSize.eval | ||
redis.command.payloadSize.eval.summary | ||
redis.command.payloadSize.sadd | ||
redis.command.payloadSize.sadd.summary | ||
redis.command.payloadSize.set | ||
redis.command.payloadSize.set.summary | ||
resilience4j.circuitbreaker.buffered.calls | | The number of buffered failed calls stored in the ring buffer |
resilience4j.circuitbreaker.calls | milliseconds | |
resilience4j.circuitbreaker.failure.rate | | The failure rate of the circuit breaker |
resilience4j.circuitbreaker.slow.call.rate | | The slow call rate of the circuit breaker |
resilience4j.circuitbreaker.state | | The states of the circuit breaker |
resilience4j.retry.calls | | The number of failed calls after a retry attempt |
system.cpu.count | | The number of processors available to the Java virtual machine |
system.cpu.usage | | The recent cpu usage for the whole system |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Front50
Metric Name | Base Unit | Description |
---|---|---|
aws.request.clientExecuteTime | milliseconds | |
aws.request.credentialsRequestTime | milliseconds | |
aws.request.httpClientGetConnectionTime | milliseconds | |
aws.request.httpClientReceiveResponseTime | milliseconds | |
aws.request.httpClientSendRequestTime | milliseconds | |
aws.request.httpRequestTime | milliseconds | |
aws.request.requestCount | ||
aws.request.requestSigningTime | milliseconds | |
aws.request.responseProcessingTime | milliseconds | |
aws.request.retryPauseTime | milliseconds | |
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
fiat.enabled | ||
fiat.getPermission | ||
fiat.legacyFallback.enabled | ||
fiat.permissionsCache.evictions | ||
fiat.permissionsCache.evictions-weight | ||
fiat.permissionsCache.hits | ||
fiat.permissionsCache.loads | milliseconds | |
fiat.permissionsCache.loads-failure | ||
fiat.permissionsCache.loads-success | ||
fiat.permissionsCache.misses | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.buffer.total.capacity | bytes | An estimate of the total capacity of the buffers in this pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.memory.promoted | bytes | Count of positive increases in the size of the old generation memory pool before GC to after GC |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state (one series per state tag) |
logback.events | events | Number of events at each log level that made it to the logs (one series per level tag) |
okhttp.requests | milliseconds | |
process.cpu.usage | | The recent cpu usage for the Java Virtual Machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since unix epoch. |
process.uptime | milliseconds | The uptime of the Java virtual machine |
resilience4j.circuitbreaker.buffered.calls | ||
resilience4j.circuitbreaker.calls | milliseconds | |
resilience4j.circuitbreaker.failure.rate | | The failure rate of the circuit breaker |
resilience4j.circuitbreaker.slow.call.rate | | The slow call rate of the circuit breaker |
resilience4j.circuitbreaker.slow.calls | | The number of slow failed calls which were slower than a certain threshold |
resilience4j.circuitbreaker.state | | The states of the circuit breaker |
storageServiceSupport.autoRefreshTime | milliseconds | |
storageServiceSupport.cacheAge | ||
storageServiceSupport.cacheRefreshTime | milliseconds | |
storageServiceSupport.cacheSize | ||
storageServiceSupport.mismatchedIds | ||
storageServiceSupport.numAdded | ||
storageServiceSupport.numRemoved | ||
storageServiceSupport.numUpdated | ||
storageServiceSupport.scheduledRefreshTime | milliseconds | |
system.cpu.count | | The number of processors available to the Java virtual machine |
system.cpu.usage | | The recent cpu usage for the whole system |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Gate
Metric Name | Base Unit | Description |
---|---|---|
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
fiat.enabled | ||
fiat.getPermission | ||
fiat.legacyFallback.enabled | ||
fiat.login | ||
fiat.permissionsCache.evictions | ||
fiat.permissionsCache.evictions-weight | ||
fiat.permissionsCache.hits | ||
fiat.permissionsCache.loads | milliseconds | |
fiat.permissionsCache.loads-failure | ||
fiat.permissionsCache.loads-success | ||
fiat.permissionsCache.misses | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.buffer.total.capacity | bytes | An estimate of the total capacity of the buffers in this pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.memory.promoted | bytes | Count of positive increases in the size of the old generation memory pool before GC to after GC |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state (one series per state tag) |
logback.events | events | Number of events at each log level that made it to the logs (one series per level tag) |
okhttp.requests | milliseconds | |
plugins.deckAssets.hits | ||
plugins.deckCache.downloadDuration | milliseconds | |
plugins.deckCache.hits | ||
plugins.deckCache.misses | ||
plugins.deckCache.refreshDuration | milliseconds | |
plugins.deckCache.versions | ||
process.cpu.usage | | The recent cpu usage for the Java Virtual Machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since unix epoch. |
process.uptime | milliseconds | The uptime of the Java virtual machine |
system.cpu.count | The number of processors available to the Java virtual machine | |
system.cpu.usage | The recent cpu usage for the whole system | |
system.load.average.1m | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time | |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Igor
Metric Name | Base Unit | Description |
---|---|---|
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
fiat.enabled | ||
fiat.getPermission | ||
fiat.legacyFallback.enabled | ||
fiat.permissionsCache.evictions | ||
fiat.permissionsCache.evictions-weight | ||
fiat.permissionsCache.hits | ||
fiat.permissionsCache.loads | milliseconds | |
fiat.permissionsCache.loads-failure | ||
fiat.permissionsCache.loads-success | ||
fiat.permissionsCache.misses | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | The current number of threads in each state, broken down by the state tag (for example, new) |
logback.events | events | |
okhttp.requests | milliseconds | |
pollingMonitor.docker.retrieveImagesByAccount | milliseconds | |
pollingMonitor.jenkins.retrieveProjects | milliseconds | |
pollingMonitor.pollTiming | milliseconds | |
process.cpu.usage | | The recent CPU usage for the Java virtual machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since the Unix epoch |
process.uptime | milliseconds | The uptime of the Java virtual machine |
resilience4j.circuitbreaker.buffered.calls | | The number of buffered failed calls stored in the ring buffer |
resilience4j.circuitbreaker.calls | | Total number of not permitted calls |
resilience4j.circuitbreaker.failure.rate | | The failure rate of the circuit breaker |
resilience4j.circuitbreaker.slow.call.rate | | The slow call rate of the circuit breaker |
resilience4j.circuitbreaker.state | | The states of the circuit breaker |
system.cpu.count | | The number of processors available to the Java virtual machine |
system.cpu.usage | | The recent CPU usage for the whole system |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors, averaged over a period of time |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Kayenta
Metric Name | Base Unit | Description |
---|---|---|
canary.pipelines.initiated | ||
canary.telemetry.query | ||
controller.invocations | milliseconds | |
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
executions.active | ||
executions.completed | ||
executions.started | ||
http.server.requests | milliseconds | |
jvm.gc.allocationRate | ||
jvm.gc.liveDataSize | ||
jvm.gc.maxDataSize | ||
jvm.gc.pause | milliseconds | |
jvm.gc.promotionRate | ||
okhttp.requests | milliseconds | |
orca.task.result | ||
queue.acknowledged.messages | ||
queue.depth | ||
queue.duplicate.messages | ||
queue.last.poll.age | ||
queue.last.retry.check.age | ||
queue.message.lag | milliseconds | |
queue.orphaned.messages | ||
queue.pushed.messages | ||
queue.ready.depth | ||
queue.unacked.depth | ||
redis.command.invocation.exists | ||
redis.command.invocation.hdel | ||
redis.command.invocation.hget | ||
redis.command.invocation.hgetAll | ||
redis.command.invocation.hmset | ||
redis.command.invocation.hset | ||
redis.command.invocation.multi | ||
redis.command.invocation.sadd | ||
redis.command.invocation.srem | ||
redis.command.invocation.zadd | ||
redis.command.latency.exists | ||
redis.command.latency.hdel | ||
redis.command.latency.hget | ||
redis.command.latency.hgetAll | ||
redis.command.latency.hmset | milliseconds | |
redis.command.latency.hset | ||
redis.command.latency.multi | ||
redis.command.latency.sadd | ||
redis.command.latency.srem | ||
redis.command.latency.zadd | ||
redis.command.payloadSize.hmset | ||
redis.command.payloadSize.hmset.summary | ||
redis.command.payloadSize.hset | ||
redis.command.payloadSize.hset.summary | ||
redis.command.payloadSize.sadd | ||
redis.command.payloadSize.sadd.summary | ||
redis.command.payloadSize.srem | ||
redis.command.payloadSize.srem.summary | ||
redis.connectionPool.maxIdle | ||
redis.connectionPool.minIdle | ||
redis.connectionPool.numActive | ||
redis.connectionPool.numIdle | ||
redis.connectionPool.numWaiters | ||
redis.executionRepository.store1.invocations | ||
redis.executionRepository.store1.timing | milliseconds | |
redis.executionRepository.storeStage1.invocations | ||
redis.executionRepository.storeStage1.timing | ||
redis.executionRepository.updateStatus1.invocations | ||
redis.executionRepository.updateStatus1.timing | milliseconds | |
retrieveById.redis.executionRepository.invocations | ||
retrieveById.redis.executionRepository.timing | ||
stage.invocations | ||
stage.invocations.duration | ||
task.completions.duration | milliseconds | |
task.completions.duration.withType | milliseconds | |
task.invocations.duration | milliseconds | |
task.invocations.duration.withType | milliseconds | |
threadpool.activeCount | ||
threadpool.blockingQueueSize | ||
threadpool.corePoolSize | ||
threadpool.maximumPoolSize | ||
threadpool.poolSize | ||
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
Orca
Metric Name | Base Unit | Description |
---|---|---|
aws.request.httpClientGetConnectionTime | milliseconds | |
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
executions.active | ||
executions.completed | ||
executions.started | ||
executions.totalTime | milliseconds | |
fiat.enabled | ||
fiat.getPermission | ||
fiat.legacyFallback.enabled | ||
fiat.permissionsCache.loads | milliseconds | |
fiat.permissionsCache.loads-failure | ||
http.server.requests | milliseconds | |
jdbc.connections.active | ||
jdbc.connections.idle | ||
jdbc.connections.max | ||
jvm.gc.allocationRate | ||
jvm.gc.pause | milliseconds | |
jvm.gc.promotionRate | ||
mpt.requests | ||
okhttp.requests | milliseconds | |
orca.task.result | ||
queue.acknowledged.messages | ||
queue.depth | ||
queue.duplicate.messages | ||
queue.last.poll.age | ||
queue.message.notfound | ||
queue.orphaned.messages | ||
queue.pushed.messages | ||
queue.retried.messages | ||
queue.unacked.depth | ||
redis.connectionPool.maxIdle | ||
redis.connectionPool.numActive | ||
redis.connectionPool.numIdle | ||
resilience4j.retry.calls | | The number of successful calls after a retry attempt |
retrieveById.sql.executions.invocations | ||
retrieveById.sql.executions.timing | ||
sql.executions.addStage1.timing | ||
sql.executions.cancel4.invocations | ||
sql.executions.cancel4.timing | ||
sql.executions.countActiveExecutions.invocations | ||
sql.executions.countActiveExecutions.timing | ||
sql.executions.handlesPartition1.invocations | ||
sql.executions.handlesPartition1.timing | milliseconds | |
sql.executions.retrieveByCorrelationId2.timing | ||
sql.executions.retrieveOrchestrationsForApplication3.timing | ||
sql.executions.store1.timing | ||
sql.executions.storeStage1.invocations | ||
sql.executions.storeStage1.timing | ||
sql.executions.updateStatus1.invocations | ||
sql.executions.updateStatus1.timing | ||
sql.healthProvider.invocations | ||
sql.pool.default.connectionAcquiredTiming | milliseconds | |
sql.queueActivator.invocations | ||
stage.invocations | ||
stage.invocations.duration | ||
task.completions.duration | milliseconds | |
task.completions.duration.withType | milliseconds | |
task.invocations.duration | milliseconds | |
task.invocations.duration.withType | milliseconds | |
tasks.serverGroupCacheForceRefresh | ||
threadpool.activeCount | ||
threadpool.blockingQueueSize | ||
threadpool.corePoolSize | ||
threadpool.maximumPoolSize | ||
threadpool.poolSize | ||
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.rejected | sessions |
Rosco
Metric Name | Base Unit | Description |
---|---|---|
bakesActive | ||
bakesCompleted | milliseconds | |
controller.invocations | ||
controller.invocations.contentLength | ||
controller.invocations.contentLength.summary | ||
http.server.requests | milliseconds | |
jvm.buffer.count | buffers | An estimate of the number of buffers in the pool |
jvm.buffer.memory.used | bytes | An estimate of the memory that the Java virtual machine is using for this buffer pool |
jvm.buffer.total.capacity | bytes | An estimate of the total capacity of the buffers in this pool |
jvm.classes.loaded | classes | The number of classes that are currently loaded in the Java virtual machine |
jvm.classes.unloaded | classes | The total number of classes unloaded since the Java virtual machine has started execution |
jvm.gc.allocationRate | ||
jvm.gc.live.data.size | bytes | Size of old generation memory pool after a full GC |
jvm.gc.liveDataSize | ||
jvm.gc.max.data.size | bytes | Max size of old generation memory pool |
jvm.gc.maxDataSize | ||
jvm.gc.memory.allocated | bytes | Incremented for an increase in the size of the young generation memory pool after one GC to before the next |
jvm.gc.memory.promoted | bytes | Count of positive increases in the size of the old generation memory pool before GC to after GC |
jvm.gc.pause | milliseconds | Time spent in GC pause |
jvm.gc.promotionRate | ||
jvm.memory.committed | bytes | The amount of memory in bytes that is committed for the Java virtual machine to use |
jvm.memory.max | bytes | The maximum amount of memory in bytes that can be used for memory management |
jvm.memory.used | bytes | The amount of used memory |
jvm.threads.daemon | threads | The current number of live daemon threads |
jvm.threads.live | threads | The current number of live threads including both daemon and non-daemon threads |
jvm.threads.peak | threads | The peak live thread count since the Java virtual machine started or peak was reset |
jvm.threads.states | threads | |
logback.events | events | |
okhttp.requests | milliseconds | |
process.cpu.usage | | The recent CPU usage for the Java virtual machine process |
process.files.max | files | The maximum file descriptor count |
process.files.open | files | The open file descriptor count |
process.start.time | milliseconds | Start time of the process since the Unix epoch |
process.uptime | milliseconds | The uptime of the Java virtual machine |
system.cpu.count | | The number of processors available to the Java virtual machine |
system.cpu.usage | | The recent CPU usage for the whole system |
system.load.average.1m | | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors, averaged over a period of time |
tomcat.sessions.active.current | sessions | |
tomcat.sessions.active.max | sessions | |
tomcat.sessions.alive.max | milliseconds | |
tomcat.sessions.created | sessions | |
tomcat.sessions.expired | sessions | |
tomcat.sessions.rejected | sessions |
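The tables list metric names in dotted form. Prometheus metric names may contain only letters, digits, underscores, and colons, so the names you query in Prometheus or Grafana differ from the dotted names above. The following Python sketch converts a dotted name by replacing disallowed characters with underscores, and it runs an instant query through the Prometheus HTTP API endpoint /api/v1/query. It assumes that the exporter follows this replacement rule. Exporters can also append unit or type suffixes (for example, _bytes or _total), so check the service's metrics endpoint for the exact exported name. The helper names and the in-cluster Prometheus address are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request


def to_prometheus_name(metric: str) -> str:
    """Replace characters that Prometheus does not allow in metric names with underscores."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", metric)
    # A Prometheus metric name must not start with a digit.
    return "_" + name if name[:1].isdigit() else name


def query_url(prometheus: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return prometheus.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(prometheus: str, promql: str) -> list:
    """Run an instant query and return the result vector (requires network access)."""
    with urllib.request.urlopen(query_url(prometheus, promql), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


if __name__ == "__main__":
    print(to_prometheus_name("fiat.permissionsCache.loads-failure"))  # fiat_permissionsCache_loads_failure
    # Hypothetical address of the Prometheus service created by kube-prometheus.
    print(query_url("http://prometheus-k8s.monitoring.svc:9090", "up"))
```

Running instant_query against the URL from inside the cluster, or through kubectl port-forward, returns the same result vector that Grafana panels use.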
Last modified July 30, 2021: (75b5b8f)