Reference

This section of the documentation describes facts about our servers and deployments.

Monitoring

UptimeRobot

Website downtime is monitored by UptimeRobot, which notifies sysadmin email addresses at OCP and ODS. Keyword monitors are used where possible.
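A keyword monitor catches failures that a plain status check misses, such as a site that returns HTTP 200 but serves an error page. The sketch below only illustrates the idea. It is not UptimeRobot's implementation, and the helper names `keyword_present` and `check_site` are hypothetical.

```python
import urllib.request


def keyword_present(status: int, body: str, keyword: str) -> bool:
    """Treat the site as up only if it returns a 2xx status AND the keyword is in the body."""
    return 200 <= status < 300 and keyword in body


def check_site(url: str, keyword: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and apply the keyword check; any network or HTTP error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return keyword_present(response.status, body, keyword)
    except OSError:
        # urllib.error.URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `check_site("https://example.org/", "Example Domain")` reports the site as down if the page loads but the expected text is missing.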

Sentry

Application errors are reported to Sentry, which notifies individual email addresses. Not all services report errors to Sentry.

Prometheus

Servers are monitored by Prometheus. Salt is used to configure Prometheus monitoring on each server.

We use the following exporters:

  • Node Exporter is installed on each server to export hardware and OS metrics, such as disk space and memory usage.
  • Blackbox Exporter is installed on the Prometheus server to check that services are up. (Keyword monitors are more complicated to configure here than in UptimeRobot, so they are not used.)
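In a typical setup, Prometheus asks the Blackbox Exporter to probe a target by requesting its `/probe` endpoint with `target` and `module` query parameters, and the response reports `probe_success 1` or `probe_success 0`. The sketch below does the same by hand. It assumes the exporter's default port 9115 and the `http_2xx` module from the exporter's example configuration; the module names in our Salt configuration are not stated on this page, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the Blackbox Exporter /probe URL for one target and module."""
    query = urllib.parse.urlencode({"target": target, "module": module})
    return f"{exporter.rstrip('/')}/probe?{query}"


def parse_probe_success(metrics_text: str) -> bool:
    """Read the probe_success gauge from Prometheus text exposition format."""
    for line in metrics_text.splitlines():
        # "# HELP" / "# TYPE" comment lines and other metrics do not match this prefix.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    raise ValueError("probe_success metric not found")


def probe(exporter: str, target: str, module: str = "http_2xx", timeout: float = 15.0) -> bool:
    """Run one probe through the exporter and report whether it succeeded."""
    with urllib.request.urlopen(probe_url(exporter, target, module), timeout=timeout) as response:
        return parse_probe_success(response.read().decode("utf-8"))
```

Running `probe("http://localhost:9115", "https://example.org/")` on the Prometheus server would show the same result that Prometheus records for that target.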

Salt sets up a Prometheus server to collect metrics from these servers.

For access details, check the configuration file pillar/private/prometheus_pillar.sls. You can find OCP’s version of this file here.

To access the monitoring service, go to the domain name in the server_fqdn variable. The username is prom and the password is in the server_password variable.

To access the alerting service, go to the domain name in the alertmanager_fqdn variable. The username is prom and the password is in the alertmanager_password variable.
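Both services can also be queried from a script using the same credentials. The sketch below builds a Basic-auth request for the Prometheus HTTP API (`/api/v1/query`) and lists instances whose `up` metric is 0. It assumes that server_fqdn and alertmanager_fqdn hold bare hostnames served over HTTPS and that you copy the passwords from the pillar file by hand; `authed_request`, `down_instances` and `fetch_json` are hypothetical helpers.

```python
from __future__ import annotations

import base64
import json
import urllib.parse
import urllib.request


def authed_request(base_url: str, path: str, password: str,
                   params: dict | None = None, username: str = "prom") -> urllib.request.Request:
    """Build a GET request with HTTP Basic auth (username "prom", as documented above)."""
    url = f"{base_url.rstrip('/')}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def down_instances(query_response: dict) -> list[str]:
    """From a /api/v1/query response for `up`, return the instances whose value is "0"."""
    return [
        sample["metric"].get("instance", "?")
        for sample in query_response["data"]["result"]
        if sample["value"][1] == "0"
    ]


def fetch_json(request: urllib.request.Request, timeout: float = 15.0):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `down_instances(fetch_json(authed_request("https://" + server_fqdn, "/api/v1/query", server_password, {"query": "up"})))`. The same `authed_request` helper could target alertmanager_fqdn with the path `/api/v2/alerts` to list current alerts as JSON.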

Open Data Services currently runs a Prometheus server that processes client data; it raises alerts to ODS staff only (#31).

Hosting

OCP uses:

Communicating during downtime

For services managed by Open Data Services, please see the protocol for planned and unplanned downtime.