Monitoring API Connect

 

Monitoring Metrics

We’ve standardised on Grafana for viewing metric data on dashboards and for sending alerts to PagerDuty and Slack. Here’s a rough list of the different types of data we’re monitoring. Most of it is collected by collectd and stored in Graphite, although some dashboards read data directly from our Elasticsearch logging infrastructure.

System Metrics

Standard system metrics, collected by collectd on the individual VMs and pushed to Graphite.
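For readers unfamiliar with how a metric reaches Graphite, here is a minimal Python sketch of Graphite's plaintext protocol, in which each data point is one line of the form "path value timestamp". It is an illustration only: in our setup collectd does the pushing. The host name and metric path below are hypothetical, and port 2003 is assumed because it is carbon's default plaintext port.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    # One data point in Graphite's plaintext protocol:
    # "<metric.path> <value> <unix_timestamp>" terminated by a newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.internal", port=2003):
    # Hypothetical host name; 2003 is assumed as the plaintext listener port.
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path, printed rather than sent so the sketch
    # runs without a Graphite server.
    print(format_graphite_line("collectd.vm01.load.shortterm", 0.42), end="")
```

Calling send_to_graphite with a list of such lines would deliver them over a single TCP connection.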

Product Metrics

API Invocation

Analytics

Ingestion of Analytics Events
Health of Analytics Cluster

API Manager

Informix

Overall Product usage

Internal REST endpoints used to regularly generate reports for data such as

Externals

Key to data sources:

Log Analysis

We use ELK as our centralised logging infrastructure with all of our systems offloading logs via syslog.

The logs are parsed and indexed by Logstash on the way into the cluster. Everything gets indexed, and certain patterns are flagged to raise PagerDuty alerts.
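The idea of flagging log lines by pattern can be sketched in a few lines of Python. This is not our Logstash configuration, and the pattern names and regular expressions below are hypothetical placeholders rather than the patterns we actually alert on. The sketch only shows the general shape: each incoming line is checked against a set of named patterns, and any match would be handed to an alerting step.

```python
import re

# Hypothetical alert patterns, for illustration only.
ALERT_PATTERNS = {
    "out_of_memory": re.compile(r"OutOfMemoryError|Out of memory", re.IGNORECASE),
    "disk_full": re.compile(r"No space left on device"),
}


def match_alerts(line):
    # Return the names of every pattern found in the log line,
    # in the order the patterns are defined above.
    return [name for name, pattern in ALERT_PATTERNS.items() if pattern.search(line)]


if __name__ == "__main__":
    sample_lines = [
        "java.lang.OutOfMemoryError: Java heap space",
        "GET /health 200",
    ]
    for line in sample_lines:
        for alert in match_alerts(line):
            # A real pipeline would raise a PagerDuty alert here.
            print(f"ALERT {alert}: {line}")
```

Keeping the patterns in a named table keeps alert rules separate from the matching logic, so adding a rule is a one-line change.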

Some examples of patterns we’re using to alert on include: