Reduce production incidents by more than 50%

IT organizations are expected to keep applications and infrastructure environments running with zero downtime. However, the ever-growing amount of data they collect daily and the expanding complexity of these environments make that difficult. How do you maximize the value of all those logs and metrics to ensure 99.999% uptime? Collecting all of this data, sending it to a central location, and analyzing it there is both time-consuming and expensive.

The office of the CTO at Dell service sees InsightFinder as an integral part of its overall strategy. InsightFinder offers a powerful AI engine with a unique incident prediction capability, along with online anomaly detection and root cause analysis, driven by real-time, continuous ingestion of log and metric streams. Dell teamed up with InsightFinder to tackle this problem with an industry-first distributed unsupervised machine learning approach.

Instead of aggregating the massive volume of monitoring data, especially tens of terabytes of log data from different data centers and clusters, into one data repository, the InsightFinder AI platform supports learning and anomaly detection at the edge using a lightweight edge-brain that needs very limited resources. Only anomalous data is extracted and forwarded to the main-brain for full-stack causal analysis. The main-brain aggregates predictive patterns extracted from all data centers into a powerful predictive AI knowledge base, which can perform incident prediction for the global customer system. The distributed AI framework was proposed by the research group led by InsightFinder founder Dr. Helen Gu in this paper in 2020.
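To make the edge-brain and main-brain split more concrete, here is a minimal Python sketch of the general idea: each site filters its own stream locally and forwards only the anomalous samples to a central collector. This is an illustration under stated assumptions, not InsightFinder's implementation. The class and function names (EdgeDetector, MainBrain, run_edge) are hypothetical, and a simple rolling z-score stands in for the unsupervised model, whose details are not described here.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeDetector:
    """Hypothetical lightweight edge filter: a rolling z-score over recent samples.

    A stand-in for the edge-brain; the real model is not specified in the text.
    """

    def __init__(self, window=50, threshold=3.0, warmup=10):
        self.history = deque(maxlen=window)  # bounded memory at the edge
        self.threshold = threshold
        self.warmup = warmup

    def is_anomalous(self, value):
        anomalous = False
        # Only score once enough normal samples have been seen.
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        # Keep anomalies out of the baseline so they do not skew later scores.
        if not anomalous:
            self.history.append(value)
        return anomalous


class MainBrain:
    """Hypothetical central collector that receives only anomalous samples."""

    def __init__(self):
        self.received = []

    def ingest(self, site, index, value):
        self.received.append((site, index, value))


def run_edge(site, samples, detector, main_brain):
    """Filter one site's stream locally; return how many samples were forwarded."""
    forwarded = 0
    for index, value in enumerate(samples):
        if detector.is_anomalous(value):
            main_brain.ingest(site, index, value)
            forwarded += 1
    return forwarded


if __name__ == "__main__":
    # Made-up metric stream: a steady pattern with one injected spike.
    samples = [10.0, 10.5, 9.5, 10.2, 9.8] * 20
    samples[60] = 100.0
    brain = MainBrain()
    sent = run_edge("dc-east", samples, EdgeDetector(), brain)
    print(f"forwarded {sent} of {len(samples)} samples: {brain.received}")
```

In this made-up stream, only the injected spike leaves the site; the remaining samples stay local. The central collector would then correlate anomalies across sites, which the sketch leaves out.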

Now Dell has drastically reduced the amount of monitoring data sent across its network while reaping the benefits of incident prediction. As a large organization that hosts applications from around the world, Dell runs multiple data center environments for many projects. To extract insights within each environment, InsightFinder deployed edge AI engines in various clusters; these send the extracted insights (e.g., anomalous log entries) back to the core engine for complete analysis. By leveraging distributed edge learning, InsightFinder reduced the amount of data sent across networks by over 99%, resulting in a much more efficient operation while still protecting critical information.

With InsightFinder, companies can unlock the power of their data and get more out of it than ever before. The proven distributed learning technology ensures businesses benefit in three important ways:

  • Cost efficiency – only anomalous events are forwarded for analysis;
  • Scalability – distributed networks amplify processing capability to reach greater levels of insight;
  • Global intelligence – patterns and causality are identified swiftly across large datasets, with accuracy not achieved through conventional AI approaches.

This combination makes proactive incident prevention an easy reality, revolutionizing how companies can scale. To start a free trial of InsightFinder, sign up here.


