Root Cause Analysis

Root Cause Analysis (RCA) shows users the root causes of anomalies. All anomalies displayed in RCA are fetched from the global view data, and the root causes of each anomaly are fetched from its best-matching causal graph and correlation graph. Users can configure actions on anomalies so that these patterns can self-heal. (Going away)

RCA works at the system level: starting from a detected incident, it traces back to the anomaly events that are its root cause. When a user selects an anomaly pattern, the backend receives the related pattern information and filter conditions from the UI. The backend then uses the pattern's project information and the requested time range to find the best-matching causal graph. Once it has found the causal graph, it locates the pattern in that graph and traverses the graph from that pattern to find root causes. An RCA trace is acyclic and contains at most three nodes. Finally, the root cause results are rendered to the user according to the filter conditions. Currently, RCA applies only to detected incidents; RCA for logs and metrics is disabled.
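
The traversal step described above can be illustrated with a minimal sketch. This is not the product's implementation. It assumes the causal graph is available as a mapping from each pattern to the patterns that directly cause it, and that the three-node limit counts the selected pattern itself. The names `find_root_cause_traces`, `causes`, and `MAX_TRACE_NODES` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: backward traversal of a causal graph with a bounded,
# acyclic trace. The graph representation and all names are assumptions.

MAX_TRACE_NODES = 3  # assumed to include the selected pattern itself


def find_root_cause_traces(causes, start, max_nodes=MAX_TRACE_NODES):
    """Return every trace from `start` back toward a root cause.

    `causes` maps a pattern to the list of patterns that directly cause it.
    Each trace is ordered from the selected pattern to its root cause.
    """
    traces = []

    def walk(path):
        node = path[-1]
        # Ignore causes already on the path so the trace stays acyclic.
        parents = [p for p in causes.get(node, []) if p not in path]
        # Stop at a node with no further causes, or at the length limit;
        # in the second case the last node is reported as the root cause.
        if not parents or len(path) == max_nodes:
            if len(path) > 1:
                traces.append(list(path))
            return
        for parent in parents:
            walk(path + [parent])

    walk([start])
    return traces


if __name__ == "__main__":
    # Illustrative data only: "A" points back at "incident", forming a cycle.
    example = {
        "incident": ["B", "C"],
        "B": ["A"],
        "A": ["Z", "incident"],
        "C": [],
    }
    print(find_root_cause_traces(example, "incident"))
    # [['incident', 'B', 'A'], ['incident', 'C']]
```

In this sketch the cycle back to `incident` is skipped, and the trace through `B` stops at `A` because the three-node limit is reached before `Z`. Filtering the resulting traces by the UI's filter conditions would be a separate step applied to the returned list.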