Monitoring transition

Manual → Automated → Distributed

Manual monitoring

  • A demo setup overstayed its welcome
  • Problems

Automated monitoring

  • Never send a human to do a machine's job
  • Problems

Distributed monitoring

  • Multiple checkers, distributed checks
  • Horizontal scalability


  • Manual - Icinga 1.x (Nagios-compatible)
  • Automated and distributed - Icinga 2


Audit manual setup

Build a parallel automated setup

Progressive cutover

Distributed setup

Monitor the monitor

Resources

Consolidate checks

  • Package scattered checks and their configs
  • Demo

Monitoring APIs

  • Include a /healthstatus target
  • Uniform URIs are a small ask
  • HTTP status codes are a feature
  • Use service discovery
  • Don't write boilerplate for JSON - Demo
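
A minimal sketch of the first three points, not the setup from the talk: a service exposes /healthstatus, reports its state through the HTTP status code, and a check maps that code to the standard Nagios/Icinga plugin exit codes. The names make_handler, check_health and is_healthy, the JSON body shape, and the 4xx-is-WARNING / 5xx-is-CRITICAL mapping are assumptions made for illustration.

```python
import http.client
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Exit codes from the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def make_handler(is_healthy):
    """Build a handler class; is_healthy is a hypothetical callable returning bool."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthstatus":
                self.send_error(404)
                return
            healthy = is_healthy()
            # json.dumps builds the body; no hand-written JSON strings.
            body = json.dumps({"status": "ok" if healthy else "failing"}).encode()
            # The status code carries the verdict: 200 when healthy, 503 otherwise.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # silence per-request logging in this sketch

    return HealthHandler


def check_health(host, port, timeout=5.0):
    """Request /healthstatus and return (exit_code, message) for a check plugin."""
    target = f"http://{host}:{port}/healthstatus"
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/healthstatus")
        resp = conn.getresponse()
        resp.read()  # drain the body; only the status code is used
        status = resp.status
    except (OSError, http.client.HTTPException) as exc:
        return CRITICAL, f"CRITICAL - {target} unreachable: {exc}"
    finally:
        conn.close()
    if status < 400:
        return OK, f"OK - {target} returned {status}"
    if status < 500:
        return WARNING, f"WARNING - {target} returned {status}"
    return CRITICAL, f"CRITICAL - {target} returned {status}"
```

To serve the endpoint, something like HTTPServer(("127.0.0.1", 8080), make_handler(lambda: True)).serve_forever() would do; a plugin wrapper would print the message from check_health and exit with the returned code. Because every service answers on the same path with the same status-code semantics, one check definition can cover all of them.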