An unsupervised pattern extraction system and method for extracting patterns of interest to users from various kinds of data, such as system-level metric values, system call traces, and semi-structured or free-form text log data, and for performing holistic root cause analysis for distributed systems. The distributed system includes a plurality of computer machines or smart devices. The system includes both real-time data collection and analytics functions. The analytics functions automatically extract event patterns and recognize recurrent events in real time by analyzing data streams collected from different sources. A root cause analysis component analyzes the extracted events and identifies both correlation and causality relationships among different components to pinpoint the root cause of a system anomaly. Furthermore, an anomaly impact prediction component estimates the impact scope of the detected anomaly and, based on the identified correlation and causality relationships, raises early alarms about impending service outages or application performance degradations.
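
For illustration only, the following Python sketch shows one simplified way the three stages named above (pattern extraction, root cause ranking, and impact scope estimation) could fit together. It is not the disclosed method. The function names (`to_template`, `recurrent_patterns`, `rank_root_causes`, `impact_scope`), the masking rules, the earliest-onset heuristic, and the dependency map are all hypothetical and chosen only to keep the example small.

```python
import re
from collections import Counter, deque


def to_template(line: str) -> str:
    """Mask variable fields so that similar log lines share one template.

    Assumption: hex identifiers and decimal numbers are the only variable parts.
    """
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()


def recurrent_patterns(lines, min_count=2):
    """Return templates seen at least min_count times, with their counts."""
    counts = Counter(to_template(line) for line in lines)
    return {tpl: n for tpl, n in counts.items() if n >= min_count}


def rank_root_causes(onsets):
    """Order components by first anomaly time, earliest first.

    Assumption: the component whose anomaly starts first is the most
    likely root cause. Ties are broken by component name.
    """
    return sorted(onsets, key=lambda comp: (onsets[comp], comp))


def impact_scope(root, dependents):
    """Return every component reachable from root in the dependents map.

    dependents maps a component to the components that rely on it
    (a hypothetical stand-in for learned causality relationships).
    """
    seen, queue = set(), deque([root])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen and nxt != root:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    logs = [
        "conn 12 timeout after 300 ms",
        "conn 7 timeout after 250 ms",
        "disk 0x1f full",
    ]
    print(recurrent_patterns(logs))
    ranked = rank_root_causes({"db": 10.0, "web": 12.5, "cache": 11.0})
    print(ranked)
    print(impact_scope(ranked[0], {"db": ["cache", "web"], "cache": ["web"]}))
```

A practical system would replace the earliest-onset heuristic with statistical correlation and causality tests over the collected streams, and would learn the dependency map from data rather than take it as input.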