Datadog monitoring and alerting for k8s
-
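kubernetes_state.deployment.replicas_available should stay close to kubernetes_state.deployment.replicas_desired. A persistent gap means some of the deployment's pods are not becoming available, so alert when the gap stays large for a sustained period rather than on a brief dip.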
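A minimal sketch of that check in plain Python, assuming you already have both metrics as equal-length series of samples for one deployment. The function name `replicas_lagging` and the 20% `tolerance` are illustrative assumptions, not Datadog settings. Requiring the condition at every sample approximates a monitor evaluated over a time window.

```python
from typing import Sequence


def replicas_lagging(
    available: Sequence[int],
    desired: Sequence[int],
    tolerance: float = 0.2,  # hypothetical: allowed shortfall as a fraction of desired
) -> bool:
    """Return True if available replicas are more than `tolerance` below desired
    at every sample in the window (a brief dip does not trigger)."""
    if len(available) != len(desired) or not available:
        raise ValueError("need two equal-length, non-empty series")
    return all(
        d > 0 and a < d * (1 - tolerance)  # d == 0 means scaled to zero: never lagging
        for a, d in zip(available, desired)
    )


# Sustained shortfall fires; a single dip during a rollout does not.
print(replicas_lagging([1, 2, 1], [4, 4, 4]))  # True
print(replicas_lagging([1, 4, 1], [4, 4, 4]))  # False
```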
-
Keep a timeseries of the number of running pods, broken down by node or ReplicaSet, and correlate it with resource metrics.
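A rough sketch of both steps in plain Python: counting running pods per node (or ReplicaSet) from a snapshot, then correlating a pod-count series with a resource series. The pod record shape (`node`, `phase` keys) and the sample numbers are hypothetical; Pearson correlation is just one simple way to "correlate" the two series.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, Mapping, Sequence


def running_pods_by(pods: Iterable[Mapping[str, str]], key: str) -> Counter:
    """Count pods in phase 'Running', grouped by `key` (e.g. 'node' or 'replicaset')."""
    return Counter(p[key] for p in pods if p.get("phase") == "Running")


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


pods = [
    {"name": "a", "node": "n1", "phase": "Running"},
    {"name": "b", "node": "n1", "phase": "Pending"},
    {"name": "c", "node": "n2", "phase": "Running"},
]
print(running_pods_by(pods, "node"))  # Counter({'n1': 1, 'n2': 1})

# Hypothetical per-minute samples for one node: running pods vs CPU cores used.
pod_counts = [3, 4, 6, 8]
cpu_used = [0.9, 1.2, 1.8, 2.5]
print(round(pearson(pod_counts, cpu_used), 3))
```

A correlation close to 1 suggests resource usage on that node tracks the pod count; a weak correlation points at something other than pod churn.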
-
For CPU and memory usage, prefer the standard Docker metrics over the k8s ones. The same principle applies to timeseries data, and you can still break the data down by pod and filter by k8s labels, because k8s labels are already applied as tags to Docker metrics.
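To illustrate what "break down by pod, filter by k8s labels" means on container-level data, here is a small sketch. It assumes each point is a Docker container metric already tagged with its pod name and k8s labels; the record shape and the function name `cpu_by_pod` are hypothetical, not Datadog's data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cpu_by_pod(points: Iterable[Mapping], label: str, value: str) -> Dict[str, float]:
    """Sum container CPU per pod, keeping only containers whose k8s label
    `label` equals `value`.

    Each point is assumed to look like
    {"container": str, "pod": str, "labels": {k8s labels}, "cpu": float}.
    """
    totals: Dict[str, float] = defaultdict(float)
    for p in points:
        if p["labels"].get(label) == value:  # filter by k8s label
            totals[p["pod"]] += p["cpu"]     # break down by pod
    return dict(totals)


points = [
    {"container": "nginx", "pod": "web-1", "labels": {"app": "web"}, "cpu": 0.5},
    {"container": "sidecar", "pod": "web-1", "labels": {"app": "web"}, "cpu": 0.25},
    {"container": "nginx", "pod": "web-2", "labels": {"app": "web"}, "cpu": 0.75},
    {"container": "postgres", "pod": "db-1", "labels": {"app": "db"}, "cpu": 2.0},
]
print(cpu_by_pod(points, "app", "web"))  # {'web-1': 0.75, 'web-2': 0.75}
```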
-
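On hosts, focus on monitoring the sum of resource requests rather than just raw CPU and memory usage. The scheduler places pods based on their requests, so a node whose requests already add up to its allocatable capacity cannot accept new pods even if its actual usage is low.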
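A minimal sketch of the per-host calculation for CPU, assuming you have each pod's node and its containers' CPU requests as k8s quantity strings, plus each node's allocatable CPU in millicores. The record shape and function names are hypothetical; the quantity parser only handles the plain and `m` (millicore) CPU forms. Memory would work the same way with a parser for suffixes such as `Mi` and `Gi`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cpu_millicores(q: str) -> int:
    """Convert a k8s CPU quantity such as '250m', '0.5' or '2' to millicores."""
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def request_ratio_by_node(
    pods: Iterable[Mapping], allocatable_millicores: Mapping[str, int]
) -> Dict[str, float]:
    """Fraction of each node's allocatable CPU already claimed by pod requests."""
    requested: Dict[str, int] = defaultdict(int)
    for pod in pods:
        requested[pod["node"]] += sum(cpu_millicores(r) for r in pod["cpu_requests"])
    return {node: requested[node] / alloc for node, alloc in allocatable_millicores.items()}


pods = [
    {"node": "n1", "cpu_requests": ["500m", "250m"]},  # two containers
    {"node": "n1", "cpu_requests": ["1"]},
    {"node": "n2", "cpu_requests": ["0.5"]},
]
print(request_ratio_by_node(pods, {"n1": 2000, "n2": 2000}))  # {'n1': 0.875, 'n2': 0.25}
```

Alerting on this ratio catches nodes that are "full" from the scheduler's point of view even while their CPU graphs look quiet.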
-
Group and filter your Docker metrics by k8s labels rather than by host.
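Pods of one app are usually spread across several hosts, so a per-host view mixes unrelated workloads. The sketch below groups the same hypothetical tagged points both ways to show the difference; the point shape and `group_by_tag` are assumptions for illustration, not a Datadog API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def group_by_tag(points: Iterable[Mapping], tag: str) -> Dict[str, float]:
    """Sum metric values grouped by one tag (a k8s label such as 'app', or 'host')."""
    totals: Dict[str, float] = defaultdict(float)
    for p in points:
        totals[p["tags"].get(tag, "untagged")] += p["value"]
    return dict(totals)


points = [
    {"tags": {"host": "h1", "app": "web"}, "value": 1.0},
    {"tags": {"host": "h2", "app": "web"}, "value": 2.0},
    {"tags": {"host": "h1", "app": "db"}, "value": 4.0},
]
print(group_by_tag(points, "host"))  # {'h1': 5.0, 'h2': 2.0}  (web and db mixed on h1)
print(group_by_tag(points, "app"))   # {'web': 3.0, 'db': 4.0} (one series per workload)
```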