Reduce production incidents by over 50% and monitoring network cost by 99%

Erin McMahon

30 Nov 2023
3 min read

Reduce production incidents by more than 50%

IT organizations are challenged to keep applications and infrastructure running with zero downtime. However, the ever-growing volume of data they collect each day and the expanding complexity of these environments make that difficult. How do you maximize the value of all those logs and metrics to ensure 99.999% uptime? Collecting all of this data, sending it to a central location, and analyzing it there is both time-consuming and expensive.

The office of the CTO at Dell service sees InsightFinder as an integral part of its overall strategy. InsightFinder offers a powerful AI engine that combines a unique incident prediction capability with online anomaly detection and root cause analysis, based on real-time, continuous ingestion of log and metric streams. Dell teamed up with InsightFinder to tackle this problem with an industry-first distributed unsupervised machine learning approach. Instead of aggregating massive volumes of monitoring data, especially tens of terabytes of log data from different data centers and clusters, into one data repository, the InsightFinder AI platform supports edge learning and anomaly detection using a lightweight edge-brain that needs very limited resources. Only anomalous data is extracted and forwarded to the main-brain for full-stack causal analysis. The main-brain aggregates predictive patterns extracted from all data centers into a powerful predictive AI knowledge base, which can perform incident prediction for the customer's global system. The distributed AI framework was proposed by the research group led by InsightFinder founder Dr. Helen Gu in this paper in 2020.
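To make the edge-filtering idea concrete, here is a minimal Python sketch. It is not InsightFinder's algorithm: it swaps in a simple rolling z-score check as a stand-in for the edge-brain, and the names `EdgeDetector` and `filter_stream`, the window size, the threshold, and the synthetic data are all assumptions made for illustration. It only shows the general pattern of analyzing data locally and forwarding the anomalous samples alone.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeDetector:
    """Hypothetical edge-side filter (illustrative only, not a vendor API).

    Keeps a rolling window of recent values and flags any value that lies
    more than `threshold` standard deviations from the window mean.
    """

    def __init__(self, window=50, threshold=3.0, warmup=10):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.warmup = warmup

    def is_anomalous(self, value):
        anomalous = False
        if len(self.values) >= self.warmup:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            anomalous = sigma > 0 and abs(value - mu) > self.threshold * sigma
        if not anomalous:
            # Only normal samples update the baseline.
            self.values.append(value)
        return anomalous


def filter_stream(samples, detector):
    """Return only the samples an edge node would forward centrally."""
    return [s for s in samples if detector.is_anomalous(s)]


# Synthetic metric stream (assumed data): a small repeating pattern
# with two injected outliers.
samples = [100.0 + (i % 5) for i in range(1000)]
samples[400] = 500.0
samples[800] = -200.0

forwarded = filter_stream(samples, EdgeDetector())
reduction = 1 - len(forwarded) / len(samples)
print(f"forwarded {len(forwarded)} of {len(samples)} samples "
      f"({reduction:.1%} less data sent)")
```

On this synthetic stream only the two injected outliers are forwarded. The figure depends entirely on the made-up data and is not a reproduction of the results reported in this post.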

Dell has now drastically reduced the amount of monitoring data sent across its network while reaping the benefits of incident prediction. As a large organization that hosts applications from around the world, Dell runs multiple data center environments for many projects. To extract insights within each environment, InsightFinder deployed edge AI engines in various clusters; these send the extracted insights (e.g., anomalous log entries) back to the core engine for complete analysis. By leveraging distributed edge learning, InsightFinder was able to reduce the amount of data sent across networks by over 99%, resulting in a much more efficient operation while still protecting critical information.

With InsightFinder, companies can unlock the power of their data and get more out of it than ever before. The proven distributed learning technology ensures businesses benefit in three important ways:

Cost efficiency – by forwarding only anomalous events for analysis;
Scalability – through distributed networks that amplify processing capacity to deliver deeper insights;
Global intelligence – by swiftly identifying patterns and causality in large datasets, with accuracy not achieved through conventional AI approaches.

Together, these benefits make proactive incident prevention an easy reality, revolutionizing how companies can scale. To try InsightFinder for free, sign up here.
