# Cluster Monitoring Operator

The Cluster Monitoring Operator manages and updates the Prometheus-based monitoring stack deployed on top of OpenShift.

It contains the following components:

* [Prometheus Operator](https://github.com/coreos/prometheus-operator)
* [Prometheus](https://github.com/prometheus/prometheus)
* [Alertmanager](https://github.com/prometheus/alertmanager) cluster for cluster and application level alerting
* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
* [node_exporter](https://github.com/prometheus/node_exporter)

Users can leverage the deployed Prometheus Operator to easily deploy new Prometheus setups for monitoring their applications.
The Prometheus instance (`prometheus-k8s`) is responsible for monitoring and alerting on cluster and OpenShift components. It should not be extended to monitor user applications.
Alertmanager is a cluster-global component for handling alerts generated by all Prometheus instances deployed in that cluster.

Metrics are collected from the following components:

* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
* [node_exporter](https://github.com/prometheus/node_exporter)
* Kubelets
* API server
* Prometheus (just `prometheus-k8s` for now)
* Alertmanager

**Important:** The Prometheus Operator managed by the Cluster Monitoring Operator will by default only look for `ServiceMonitor` resources in namespaces containing an `openshift.io/cluster-monitoring` label (with any value).
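
The selection rule above depends only on the presence of the label key, not on its value. The following is a minimal Go sketch of that check, not the operator's actual implementation; the function name `isMonitored` and the sample label maps are hypothetical and used for illustration only.

```go
package main

import "fmt"

// clusterMonitoringLabel is the label key named in this README.
const clusterMonitoringLabel = "openshift.io/cluster-monitoring"

// isMonitored is a hypothetical helper: it reports whether a namespace's
// labels contain the cluster monitoring key, regardless of the value.
func isMonitored(namespaceLabels map[string]string) bool {
	_, ok := namespaceLabels[clusterMonitoringLabel]
	return ok
}

func main() {
	// Label present with a value: selected.
	fmt.Println(isMonitored(map[string]string{clusterMonitoringLabel: "true"})) // true
	// Label present with an empty value: still selected.
	fmt.Println(isMonitored(map[string]string{clusterMonitoringLabel: ""})) // true
	// Label absent: ServiceMonitors in this namespace are ignored.
	fmt.Println(isMonitored(map[string]string{"team": "payments"})) // false
}
```

In practice this means a `ServiceMonitor` in a namespace without the label is silently skipped, so a missing label is the first thing to check when targets do not show up.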

## Contributing new component integrations

The Cluster Monitoring Operator has many builtin `ServiceMonitor` resources which enable discovering the metrics endpoints of a variety of well-known components. Only components that must be created before the cluster monitoring stack belong in this repository; this breaks the cyclic dependencies that would otherwise arise during bootstrapping.

To register a new builtin component, make the following changes:

* Add a new `ServiceMonitor` definition to [jsonnet/prometheus.jsonnet](jsonnet/prometheus.jsonnet). For an example, see the [definition for the OpenShift component "kube-controllers"](https://github.com/openshift/cluster-monitoring-operator/blob/01bfe3789117e7074e893251f2f6d31c816db8fb/jsonnet/prometheus.jsonnet#L113-L145).
* Re-generate the go-bindata code, using the `pkg/manifests/bindata.go` make target. This will also create a new file in `assets/prometheus-k8s/` according to the name given in the jsonnet code.
* Add a constant in [pkg/manifests/manifests.go](pkg/manifests/manifests.go) which points to the new manifest file, from `assets/`.
* Add a new `Factory` method in [pkg/manifests/manifests.go](pkg/manifests/manifests.go) which loads the manifest using the new constant.
* Add a step to `PrometheusTask` in [pkg/tasks/prometheus.go](pkg/tasks/prometheus.go) which creates the `ServiceMonitor` using the new `Factory` method.

To add new builtin recording or alerting rules:

* Add a new [Prometheus rules file](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) to [jsonnet/rules.jsonnet](jsonnet/rules.jsonnet).

Run `make pkg/manifests/bindata.go` after you modify the files and make sure to add the modified files to the commit. All rules are automatically created, so no additional code changes are necessary.

## Roadmap

* Monitor etcd
* Adapt Tectonic inherited alerts with OpenShift operational knowledge

## Testing

### End-to-end tests

Run the end-to-end tests with `make e2e-test`.
Clean up afterwards with `make e2e-clean`.
