Login

News

The Future Of AI-First IT Operations And How To Design Your Monitoring Strategy

Parenting toddlers is not for the faint of heart. If we intervened every time…

Helen Gu

  • 8 Sep 2020
  • 1 min read

Parenting toddlers is not for the faint of heart. If we intervened every time danger seemed imminent, we’d first insert them in Lysol-coated, hermetic plastic bubbles. We’d react the same way to dirt-eating and kidnapping. Imagine the chaos that would cause at play dates.

Thankfully, we cultivate instincts by observing the world and understanding what pattern of events indicates real danger. For example, stranger + park + unattended toddler = problem. However, relatives + home + unattended toddler = (relative) safety. Hundreds of times a day, we do mental math as parents to determine a toddler danger (we’ll call it TD) score. When it exceeds a certain threshold, we act.

Web-scale systems need to be monitored like toddlers. One sampled metric indicating high CPU usage doesn’t necessarily indicate danger. However, a thousand anomalous metrics and log lines from hosts associated with the same service over a one-minute period probably indicates a need to act.

Keep reading on Forbes →

Contents

Explore InsightFinder AI

Take InsightFinder AI for a no-obligation test drive. We’ll provide you with a detailed report on your outages to uncover what could have been prevented.