Monitoring transition

Manual - Automated - Distributed

saurabh.hirani@gmail.com / @sphirani

Manual monitoring

  • A demo setup overstayed its welcome
  • Problems




Automated monitoring

  • Never send a human to do a machine's job
  • Problems




Distributed monitoring

  • Multiple checkers, distributed checks
  • Horizontal scalability

What?

  • Manual - Icinga 1.x (Nagios-compatible)
  • Automated and distributed - Icinga 2

How?

Audit manual setup

Build a parallel automated setup

Progressive cutover

Distributed setup

Monitor the monitor

Thank you

Resources


Consolidate checks

  • Package scattered checks and their configs
  • Demo

Monitoring APIs

  • Include a /healthstatus target
  • Uniform URIs are a small ask
  • HTTP status codes are a feature
  • Use service discovery
  • Don't write boilerplate for JSON - Demo
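
The points above can be sketched as one small, generic check. This is an illustration only, not the code from the demo: the function names, the 4xx-means-WARNING policy, and the dotted-key lookup are assumptions. The exit codes 0 to 3 follow the usual Nagios/Icinga plugin convention, and /healthstatus is the path suggested on this slide.

```python
import urllib.error
import urllib.request

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def exit_code_for(http_status):
    """Map an HTTP status code to a plugin exit code (assumed policy)."""
    if 200 <= http_status < 300:
        return OK
    if 400 <= http_status < 500:
        return WARNING  # assumption: client-side errors only warrant a warning
    if 500 <= http_status < 600:
        return CRITICAL
    return UNKNOWN


def lookup(document, dotted_key):
    """Fetch a nested value from parsed JSON, e.g. lookup(doc, "db.status")."""
    value = document
    for part in dotted_key.split("."):
        value = value[part]
    return value


def check_health(base_url, path="/healthstatus", timeout=5):
    """Return (exit_code, message) for one service's health endpoint."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises on 4xx/5xx; the code is still useful
    except (urllib.error.URLError, OSError) as err:
        return CRITICAL, f"CRITICAL - {url} unreachable: {err}"
    code = exit_code_for(status)
    label = ("OK", "WARNING", "CRITICAL", "UNKNOWN")[code]
    return code, f"{label} - {url} returned HTTP {status}"
```

Because every service exposes the same path and relies on status codes, the same function can be pointed at any host returned by service discovery, with no per-service parsing code.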