Purpose
This example shows how to monitor application availability: it onboards application logs, checks application status, and detects anomalies that indicate application issues.
Data Source
Any logs that include the API return status, such as HAProxy or Nginx logs, can be used. They can come from any integration or agent, such as AWS CloudWatch or Elastic. The log format can be plain text or JSON.
Plain text example:
[24/May/2024:12:02:19 +0000] 24.199.98.33 52.91.69.202 "/autodiscove/" 404 0.000 -
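To make the fields in the plain text line concrete, here is a minimal Python sketch that splits it with a regular expression. The field names (`addr1`, `addr2`, `elapsed`, `extra`) are assumptions made for illustration; the source does not say what each column means, and this is not how InsightFinder itself parses the line.

```python
import re
from datetime import datetime

# Assumed layout: [timestamp] address address "path" status number trailing-field.
# The group names are illustrative guesses, not documented field names.
LINE_RE = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\]\s+'
    r'(?P<addr1>\S+)\s+(?P<addr2>\S+)\s+'
    r'"(?P<path>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+'
    r'(?P<elapsed>\S+)\s+'
    r'(?P<extra>\S+)$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields

sample = '[24/May/2024:12:02:19 +0000] 24.199.98.33 52.91.69.202 "/autodiscove/" 404 0.000 -'
record = parse_line(sample)
print(record["status"], record["path"])
```

The status code is the only field the rest of this example depends on, so a looser pattern that extracts just the three-digit status would work equally well.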
JSON example:
{
  "status": 200,
  "path": "/agent-upload-instancemetadata",
  "request_query": "",
  "upstream_status": 200,
  "upstream_name": "insightfinder-insightfinder-dataserver-443",
  "remote_addr": "",
  "remote_user": "",
  "bytes_sent": 2256,
  "request_time": 0.372,
  "vhost": "app.insightfinder.com",
  "request_proto": "HTTP/2.0",
  "request_length": 488,
  "duration": 0.372,
  "method": "POST",
  "http_referrer": "",
  "http_user_agent": "ReactorNetty/1.1.16"
}
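For JSON logs, the status can be read directly from the parsed object. The sketch below assumes one JSON object per log line and a numeric `status` field, as in the example above; the helper name `status_class` is hypothetical.

```python
import json

def status_class(raw_line):
    """Return the status class ('2xx', '4xx', ...) of a JSON log line, or None if unusable."""
    try:
        entry = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    status = entry.get("status")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(status, bool) or not isinstance(status, int):
        return None
    if not 100 <= status <= 599:
        return None
    return f"{status // 100}xx"

# Shortened version of the JSON example, written as a single line.
sample = '{"status": 200, "path": "/agent-upload-instancemetadata", "method": "POST"}'
print(status_class(sample))
```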
Project Setup
Log Ingestion
Check out our integration guide and set up any log project.
View Log Data
The data can be viewed on the InsightFinder Log/Trace analysis page. The data may take a few minutes to begin streaming and showing up in the UI.
Configure Alerting
- Alerting can be configured based on keywords that appear in the logs, such as the HTTP statuses “500” or “400”. Custom messages such as “application is not responding” can also be detected.
- Regular expressions are supported for defining keywords in logs. Matches are reported as keyword alerts.
- To configure, go to System setting → project setting → Advanced setting → Labels → Detection Keywords.
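Keyword patterns like those described above can be tried out locally before they are entered in the UI. The sketch below uses Python's `re` module. The rule names, the patterns, and the two sample lines (variations on the plain text example) are hypothetical, and the regex dialect InsightFinder accepts may differ from Python's.

```python
import re

# Hypothetical keyword rules. The status pattern requires whitespace on both
# sides so that a value such as "0.500" is not mistaken for status 500.
KEYWORD_PATTERNS = {
    "http_error_status": re.compile(r"\s(?:400|500)\s"),
    "app_not_responding": re.compile(r"application is not responding", re.IGNORECASE),
}

def matched_keywords(line):
    """Return the names of all keyword rules that match the log line."""
    return [name for name, pattern in KEYWORD_PATTERNS.items() if pattern.search(line)]

error_line = '[24/May/2024:12:02:19 +0000] 24.199.98.33 52.91.69.202 "/api/" 500 0.120 -'
ok_line = '[24/May/2024:12:02:20 +0000] 24.199.98.33 52.91.69.202 "/api/" 200 0.500 -'

print(matched_keywords(error_line))
print(matched_keywords(ok_line))
```

A bare pattern such as `500` would also match timings or byte counts that happen to contain those digits, which is why the sketch anchors the status on surrounding whitespace.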
View Alerts
When alerts are generated, they can be viewed on the Incident Investigation page. Choose the Log Anomalies tab.
Configure SLI (Service Level Indicator)
- To configure, go to System settings → project setting → Advanced setting → Log To Metric setting.
- Count all status messages: take the count of the different status logs and transform it into a metric. This metric can also be used to detect increases or decreases in status log counts.
- Count all successful HTTP status messages, using a regex to keep only the successful HTTP statuses. The difference between this count and the total is the number of unsuccessful requests, which indicates a drop in service level.
- The operation performed is division: the actual value (all successful status messages) is divided by the base value (all status messages).
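The division described above can be illustrated with a short Python sketch. Which statuses count as successful is an assumption here (2xx only, via `SUCCESS_RE`); adjust the regex to whatever your service treats as success. The function names are hypothetical, and the code mirrors the arithmetic only, not InsightFinder's implementation.

```python
import re
from collections import Counter

# Assumption: only 2xx responses count as successful.
SUCCESS_RE = re.compile(r"^2\d\d$")

def status_counts(statuses):
    """Count log entries per status code (the 'count all status messages' metric)."""
    return Counter(statuses)

def availability(statuses, success_re=SUCCESS_RE):
    """Return successful / total as a fraction, or None when there are no entries."""
    total = len(statuses)
    if total == 0:
        return None  # no traffic in this window, so the ratio is undefined
    successful = sum(1 for status in statuses if success_re.match(str(status)))
    return successful / total

window = [200, 200, 200, 404]
print(status_counts(window))
print(availability(window))  # 3 successful out of 4 gives 0.75
```

Returning `None` for an empty window keeps "no traffic" distinct from "0% availability", which would otherwise look like an outage.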
View Availability Chart
- The availability (SLI) now appears in the top-left chart. The other charts show more detail on the status codes.
Configure Incident Detection
- Enable zero filling on the error code metrics so that anomalies are detected automatically when error code counts increase.
- Set the metric as a KPI so that an anomaly is escalated to an incident when the KPI duration is exceeded.
- Enable “Near constant detection” to catch minor deviations from the baseline.
- The system will now detect anomalies automatically.
- Note: A specific SLA value can also be set if desired.
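The sketch below shows, in simplified form, why zero filling and a duration setting matter. It is a sketch under stated assumptions, not InsightFinder's detection logic: it uses a fixed threshold where the product learns a baseline, and the names `zero_fill`, `should_escalate`, `threshold`, and `min_consecutive` are hypothetical.

```python
def zero_fill(counts, start, end, step):
    """Return (bucket, count) for every bucket in [start, end), using 0 where no errors were logged."""
    return [(t, counts.get(t, 0)) for t in range(start, end, step)]

def should_escalate(series, threshold, min_consecutive):
    """True if the value stays above threshold for at least min_consecutive buckets in a row."""
    run = 0
    for _, value in series:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Error counts per 60-second bucket; the bucket starting at 60 had no errors,
# so it is missing from the raw data.
raw_counts = {0: 1, 120: 7, 180: 9, 240: 8}
series = zero_fill(raw_counts, start=0, end=300, step=60)

print(series)
print(should_escalate(series, threshold=5, min_consecutive=3))
```

Without zero filling, quiet periods leave gaps rather than zeros, so there is no low baseline for an increase in error codes to stand out against.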
View Incident
The generated incident and its associated anomaly can be viewed and analyzed on the Incident Investigation page.