InsightFinder – Challenges and opportunities for the future: AIOps Executive Roundtable

Erin McMahon

15 Aug 2021
36 min read

Execs from Fortune 1000 companies gathered to discuss the challenges and opportunities associated with AI in IT. Read as experts with backgrounds at firms like ServiceNow, Okta, McDonald’s, and J.B. Hunt reflect on the current state of technology and the most important priorities to think about when building for the future. For a free outage analysis from InsightFinder to see how we can help your business, sign up here.

A video recording of this conversation is available here.

Dan Turchin:

This is Dan Turchin. At the end of October, I sat down with a group of executives to discuss the future of AI for IT operations. We discussed all of the things you’d expect, geeky stuff, anomaly detection and automated root cause analysis, but it was also a bit of a therapy session. We discussed what makes the CIO’s job so hard and why automation’s the best way to satisfy the insatiable thirst for tech services to run the business, and the unrealistic expectations users have for uptime and performance. At InsightFinder, we’re committed to helping enterprise CIOs thrive, where most fail.

We believe complicated new service architectures and machine data got us into this mess and the best AI tuned to deliver a better end-user experience will get us out of it. Without further ado, listen and learn from some of the best in the business.

Mark Settle:

Mark Settle, born in upstate New York in Rome, New York. I left Okta about a year or so ago. I’ve been advising startup companies and frankly having probably too much fun over the last couple of months. I’ve had an opportunity to work with several VCs as well, and so that’s my role. I’m playing an advisory role at the moment, getting to do a fair amount of talking and writing about topics of interest to me as well. Biggest ops challenge. Over the course of, I’d say, the last six months, I’ve taken a deep dive into privacy data management and automation tools. I’m trying to put together an automation council next year to look at this whole phenomenon of automation disillusionment.

I think people spend a lot of money chasing a lot of things that just don’t meet the heightened expectations that people have for that technology, but for next year, there’s two topics I’m going to focus on, and we can follow up maybe one-on-one with me and anybody else on the call. I’m really interested in the next generation of DLP technologies and that whole SASE thing that’s going on, but I think there needs to be more fundamental rethinking of how we think about data loss prevention.

These low code development platforms, I find really intriguing because I think the pendulum has swung so far to SaaS that maybe it’s time for it to swing back, and you’ll see application teams turning out bespoke applications to perform a specific process, or to support the activities of a specific kind of work group. Two research projects that I’m teeing up for next year around DLP and no code stuff, low code stuff. There you go.

Dan Turchin:

Wow, that’s admirable. All right, let’s go around the horn. Sean, you want to go next?

Sean Barker:

Sure. I’m Sean Barker. I was born in Southern California and lived most of my life there. I currently am the CEO of cloudEQ, which is a consulting and professional services organization that does all cloud-based infrastructure, ops, application development, management, and managed services. We’ve got onshore and offshore facilities in order to keep the cost down, and we’re working with a number of Fortune 500, Fortune 100 companies. It keeps us fairly busy. The biggest ops challenge that we face is wading through the sheer number of tools and services and SaaS products and things that are available out there to… We’ve locked in on several that we like, but tomorrow there’s going to be five more that come out, that also make breakfast and coffee and cost less, and going down that path.

It’s just getting the right timing for the right service, the right tools, the right processes to enable the customers; that’s probably one of our biggest challenges.

Dan Turchin:

Yeah. Good, good to have you Sean. Ray, you want to go next?

Ray Lippig:

Sure. My name is Ray Lippig. I was born in a suburb of Chicago, so a midwestern fellow and raised there till I was 18, then went to the West Coast for 20 years and now back to the Arkansas area for, I guess it’s about 22 years now. I work at J.B. Hunt, that small trucking company in Northwest Arkansas, and have the privilege of being the program manager for what we call digital workplace and IT operations. We work hard at end user sorts of stuff. Things like ITSM tools and AIOps tools are important to us as the company’s been growing substantially over the last several years and looks to grow a whole lot more in the years to come. We are trying to position ourselves by getting some of those cutting edge tools, and that’s how we actually found Dan and InsightFinder, and so we’re looking at some of his tools as well.

Dan Turchin:

Joel, you want to go next?

Joel Eagle:

That might not be a good thing, but anyway. Yeah, name is Joel Eagle. Birthplace is Fort Sill, Oklahoma. I lived there for 60 days and then when the military gave my mom the okay, we joined my dad in Stuttgart, Germany; we met him over there. He was deployed there. My dad’s military. That was my birthplace, never been back, never seen Fort Sill, Oklahoma. It’s right outside of Lawton, Oklahoma as I understand it. The company I work for is McDonald’s Corporation. Career-wise, I came to McDonald’s from a company on the West Coast, Southern California, worked there for a number of years. From there, I came from a company in the southeast, a large healthcare company in the US, as I spent a lot of time doing that, but I ended up at McDonald’s through a number of different connections.

Now I’ve been here and living in the Chicago area, or actually now living in Chicago for gosh, I think I made that… oh, I made that move in 2012. Sorry, my brain’s got to get caught up. Largely, the role I play today, I was going to say something witty, I had something about sleeping in a Holiday Inn Express last night, but I didn’t know somebody in politics on the call, but anyway, the role I play today I think is chief cat herder, but I’m responsible for the cloud platforms, so Azure, AWS. We have a little bit of GCP, not much, but a little bit, as well as our ITSM platform. We call it service café. Internally, it’s ServiceNow built up with a couple of the capabilities that we have. We bolt those together, and we refer to that as service café.

I also own and am responsible for our data movement, so you think of EAI and a number of other type platforms that we use to move data all around the world, so think about transaction level data that moves to and from the restaurant, sales data, POS updates, patching, all that stuff that happens around the restaurants. Let me see, what else do I do? Database support, all the database team people that they support there. Trying not to leave anybody out, I feel like I’m doing an Oscar thing. In that space, that’s largely my remit there, so we do that. I report to a global CTO by the way. Biggest ops challenge. If I think about it, the way McDonald’s is set up, we’re certainly a global brand, but we don’t operate completely like a global company.

We operate a bit like a multinational, and we have a corporate structure where there’s the organization. We talk about having suppliers which are separate companies obviously that supply stuff to us, but very integral to what we can deliver, everything from paper to food, et cetera. Then there’s the franchisees, right? It’s the corporation, the franchisees, and the suppliers. We refer to that group as the system, and it’s a bit of a challenge to get the system to benefit in most things that we do.

You take a different approach to thinking about suppliers to you, or who you serve, who your customers are, et cetera, but if you just think about the challenges from an IT perspective of supporting that environment and supporting whether it’s data from that or whether you’re talking about level one help desk, calls, or anything that you have going on to support this, all while expected to reduce costs, improve efficiency, move at speed, and do something meaningful, and help the company improve our competitive advantage, automation comes into play, right? Everybody keeps asking me what are you doing about AI/ML.

Sometimes I want to stop and ask them, “Do you actually know what that means, right? Can you tell me what AI/ML really means, the acronym, or did you just learn the acronym and move on?” I struggle with the expectations. I think I’m hearing a little bit of that on the call earlier here around what exactly can this do for us, where can we apply it, where do we make investments, how do you manage expectations? There’s no end to the number of things that we can think of, or challenges that we need to overcome. I think for me, the biggest challenge is when I think about doing things at scale, complexity seems to be the enemy of scalability, right? The more I’m required to scale something, the more I’m forced to reduce the complexity.

Reducing the complexity means I’m at this gap where I don’t understand the automation and AI technology enough to pull the complexity out. I’m still reliant on a number of what I’ll call legacy type operational models, whether it’s level one, two, three, four support, or typical ways that we think about treating data, stewardship, and things like that. I know that’s a big bunch of things, but for me, it would be how do I achieve scalable results and manage complexity, right? Then all the rest of the stuff that goes with that, security, et cetera. Quarantine hobby, letting my beard get shabby. I was able to get out to the racetrack a little bit and drive some. I like doing that.

I often tell people when I’m in a car going really fast on a racetrack, I don’t think about budgets or performance reviews, or any other stuff that we have to deal with in the regular part of our job. Every now and then, I think about some automation stuff. I think about how the machine around me isn’t working, but for the most part, I just think about surviving and living through the corner, and coming out the other end and doing it well, so that’s my hobby.

Dan Turchin:

The metaphor for life Joel.

Joel Eagle:

Yeah.

Dan Turchin:

I thought I’d share my thoughts on where AI technology is, what at least I see ahead in, call it, the next 12 to 15 months, and then some perspectives about challenges that we’re all having, just about how to be successful, not with the buzzwords, but operationally, how to implement, measure, budget for, that sort of thing when it comes to AI-first technologies. A little history tour. Like Mark, I like to open the time capsule sometimes. I’m not doing British prime ministers, but I like to be a student of technology. I’m particularly fascinated by the Luddite movement in the 19th century. That’s my connection to Britain, but when it comes to AI, so roll back the clock to the ’50s, there was a computing society conference in Hanover, New Hampshire at Dartmouth College.

This young handsome professor named Marvin Minsky, who you see there in the top right, proposed this wild idea that there would be a time when tasks that we think can only be done by humans could actually be done by machines. It was pretty radical and fast forward to the ’70s, it became commercialized. That’s the early days of what we talked about as the first AI summer, massive infrastructure and academic investment in what was then called artificial intelligence. The term was coined in 1956, but expert systems were basically deterministic workflows. That’s what we’d call them now, and they would solve problems like filling out forms or stuff that’s very rudimentary today, but the idea was that with some conditional logic, basically if-then statements, you could build interfaces.

Before, there was certainly no desktop computing, but with mainframes and punch cards and that sort of thing, you could build some conditional logic and make machines appear to be thinking. Fast forward to the ’80s: the technology had become so overhyped, a common theme, that spending dried up. The idea that these systems were going to become sentient anytime soon was so far-fetched that everyone backed away from commercializing a lot of these technologies. In the ’90s, the renewed enthusiasm for AI I would argue was punctuated by the IBM supercomputer beating Kasparov, actually in only one of three games.

To his credit, Kasparov did win two out of three, but nonetheless, that sent a message. Chess was always thought to be strategy based, something that a machine could never be “smart enough” or understand enough patterns to play, so that was an indication that maybe this AI thing was going to work out after all. In the last decade or so, a gentleman named Geoffrey Hinton, doing work I think at the University of Toronto, essentially commercialized neural networks. Now we talk about deep learning, but less about the technology and more about some of the amazing things that neural nets or deep learning can do.

We don’t bat an eyelash anymore when we get accurate recommendations from Netflix, or Amazon recommends what we need, or the Gmail type-ahead feature recommends what you’re probably going to want to type. These are all things that are aided by neural nets that really were the output from research that was done not that long ago. I cite Geoff Hinton, but there’s certainly a small but meaningful cadre of professors that have been part of that. Now, if you look at what’s up ahead, maybe not the next decade, but call it the next 30 plus years, maybe 40, maybe 50, depends on who you talk to, we’re going to move from the point where we’re now at, which is what we call narrow artificial intelligence.

In a very specific domain, AI can perform well, for example, routing trouble tickets, or for example determining the root cause of an outage, very narrow. Whether it’s supervised using labeled data or unsupervised inferring patterns, that’s the state of technology, but in terms of evolutionarily speaking, the “smartest AI” is probably about the intelligence of a 9-month-old. You’re not going to have your 9-month-old swinging a bat against Randy Johnson, and you’re not going to have your 9-month-old doing tests in college. They’re nine months old, there’s certain things they do well.

In the next 30 to 50 years, you’ll see what we call AGI, so artificial general intelligence, and that’s where just across a broad set of things that you’d associate with human intuition, judgment, we’ll actually be able to develop systems that can do a lot of those. There are a lot of interesting early experiments being done with robotics, trained on neural nets and things like that, but they’re still very simple today, versus what you’re going to see. I’m a firm believer that at no time in the near future, probably not the next century, will we encounter a bot apocalypse, because I firmly believe that humans that develop these algorithms ultimately have the right intentions when it comes to developing these technologies.

I think there are going to be more and more checks and balances up ahead, and I think it’ll be a good thing, whether it’s for healthcare or education or defense or agriculture, eliminating famine. I think these are the problems that are actually going to be solved sooner than the bots are going to become our overlords. A little bit of soapboxing, but that’s where we’ve been, and I think where we’re headed. We talked about the current state. Now applying AI to where we’re at today in ITOps, it’s a complex relationship that we as practitioners have with technology. At one point, maybe a decade, 15 years ago, the technologies were a lot simpler and we finally started to gain control over how to monitor them.

Even though the monitoring systems were very simple, the applications were very simple too. Simple heartbeat monitors, things like that were sufficient. Well, now what happened is the pace of technology complexity is so far outpacing our ability to monitor and manage these systems that all of a sudden, ITOps went from looking like heroes and being able to keep up with all the digital exhaust from these systems to being just woefully inadequate. Everyone’s scrambling to figure out, in the context of microservices and CI/CD architectures and dozens of API calls to external systems, how do we make sense of all of the new failure points that have been introduced into these systems, and so that’s where you start to get to this need for a more intelligent approach to managing systems at scale.

I’m convinced we’re getting there. I’m maybe a little biased, maybe a little pollyanna-ish, but I think if we look at each element of the service health lifecycle, there are really interesting new technologies, AI-first technologies, that are being introduced to improve the ability of mere mortals to manage these complex systems. If you look at monitoring, increasingly smart anomaly detection is being used to suppress noise, so you have a manageable number of actionable alerts. Event management, so being able to actually do anomaly detection across different types of data.

We could never do that before, but now we can look across traces and events and logs and metrics, et cetera, and be able to actually make some sense using typically unsupervised machine learning, because there’s too much data to label it, but using unsupervised machine learning across all these types of data sources to make some sense out of what’s going on, which makes incident management a lot more practical. We can actually use AI to predict when the next incident is going to happen, and what’s the most likely cause of it.

When we look at remediation, we may not be at the point where you want the machine or the AI to auto remediate because there’s a lot of risk and there are a lot of different patterns that can occur, but we’re getting closer to the point where the input from the AI, the machine fused with the input from the human, in an ops context, you’re going to be able to see pretty accurate root cause analysis just based on a little bit of judgment and rational thinking from humans fused with the ability of AI to be able to distill the universe of possible root causes down to maybe two or three. Then lastly, continuous learning.

That’s really the strong suit of AI: being able to detect when a pattern associated with historical outages is about to recur, learning from the past experience, who was involved, what action did they take, how long did the outage last. These three or four variables, build those into this learning model and all of a sudden, before monitoring detects an issue, this continuous feedback and continuous learning loop kicks in and says, “Hey, here’s something you could do proactively to prevent an outage that’s otherwise going to happen.” I told you I’d do a little bit of prognosticating before we jump into the round table.

I see three key patterns and talk a little bit about the evolution of AI, but three things that I see happening even in the next 12 months, and feel free to call me out a year from now if I get the timing wrong, but three things that I feel like are imminent. Not on the order of a decade, but closer to on the order of a year to a couple years. I think in the workplace, what we now think of as intelligence is going to become ambient. You walk into a conference room. Eventually, we will be back in conference rooms, and it detects who you are, so what temperature you like the room to be at.

If you book a conference room and there are more people invited than can be accommodated by the room, then you’ll get an intelligent notification saying you need a bigger room and here’s one that’s available. Now we talk about things like contact tracing. There’s going to be a lot of AI technology applied to making workplaces seem smart, because they’re going to know where it’s okay to gather and where it’s not okay to gather. One big problem that everyone is going to face as they return to work is if a building’s only allowed to be at a certain capacity, how do you stage who can be in the building when, and literally, how do you schedule the elevators so that we’re not waiting an hour because everyone’s trying to go up in a crowded elevator at the same time?

These are problems that are going to get solved by lots of data and interesting algorithms. It’s going to seem like just a smart workplace, but it’s all supported by data and AI. A big issue that everyone in the AI first industry is facing is how do you train AI algorithms without training them to be biased, and this is everyone’s problem. It’s probably bigger than we realize. I can give you an example.

In the world that I’ve been in, ITOps and IT service management, if a model is getting trained on, let’s say, which vendor is the most effective at fixing a problem, and in the training data most of the data is represented by, call it, large global brand name vendors, just for simple equipment maintenance, well, what you’re going to do is tell the model inadvertently that LargeCo is more effective at fixing these problems than maybe the small local minority owned business. No one intended for that algorithm to have a bias against minority owned businesses, but if you train the model on what’s actually out there, the large companies are more represented in the data.

We’re going to start to see universal regulation frameworks for how you judge the quality of AI with respect to simple examples like that, and that applies to AI making decisions about who goes behind bars, how much risk is this sentence for this convicted criminal, things like that. If we don’t create these regulatory frameworks, there is the potential for dangerous decisions to get made, and for those decisions to reinforce subsequent AI-first interactions, but as a society, I think we’re going to get really good at judging AI. Then lastly, I think it was Mark who alluded to low code environments.

In my humble estimation, I see in the next 12 to 18 months low code environments winning. We talked about the cliché of the citizen developer, but I think what we have traditionally called coding is going to become a commodity. Because the low code systems are getting so efficient and they’re so adept at making everyone essentially a programmer, I think that the future of what we’ve called programming is really the quality of your thinking with regard to machine learning and data science. I think the ML developer is the new programmer, and I see that happening rapidly. Again, I’m biased. I spent four years at ServiceNow.

It’s a very popular low code platform, but seeing the power of what you can do today with no C, no Python, typically not even any JavaScript, and playing that out 12 to 18 months, it’s pretty clear that at no time in the near term future will the skills required to develop and train machine learning algorithms be outsourced to machines to write, or train the algorithms on their own. I think that’s going to become the new point of demarcation when it comes to evaluating technical skills, and also what the market’s willing to pay for. These skills are in high demand, short supply. Okay, so I told you a little bit of me soapboxing.

I do want to use the balance of the time which is too little to have a discussion, and I thought I’d start with your perspectives on you talked a little bit about what the technology can do, but you know what, your employees don’t care about what the technology can do. They just care about does it or does it not solve a problem they have. We’d love to hear your perspectives on if you’re using the technology today how, and how do you think about what are the right business problems to solve with AI.

Joel Eagle:

Dan, this is Joel. I’ll go. We started down the path with you a while back doing the auto routing of trouble tickets based on level one help desk, people interacting with the help desk. We started to improve our ability to understand what we’re doing there, auto route tickets. The beginning of that was all about, “Hey, we can reduce costs, improve efficiency,” but ultimately, it’s about improving that customer experience, right, around what we’re doing. Today, I have this bent that I’m on. I call it no ops, right?

I’m trying to get to the point where even though I’m responsible for what would be a lot of infrastructure area with cloud, the interaction to it, particularly from our legacy environments, and I’ll call it back office and legacy that we put into largely the Azure environment, it’s the people who operate those and the contracts they have for suppliers that still reflect the world of the data center, right? As you know, at McDonald’s, in fact, by the end of this year all of our compute will be cloud-based, with the exception of an AS/400 that we have to worry about and a mainframe, but everything else is in the cloud, right? I’m going to shoot the mainframe if it’s the last thing I do.

It’ll be dead by the end of next year, but anyway the point is, for me, it comes back to I need to improve this self-service capability to the point where either I’m using low code type applications where I can build self-service, right? I can say I need to cobble together from technology a capability, and that capability then needs to be consumable. In order to do that, I’ve got to have some amount of, whether you want to call it artificial or autonomous, some way of handling variability in the way somebody wants to engage with that, right?

The example I like to use is when I talk to people, particularly internal to the organization, the minute they want me to do something or change something or put some technology in, or build some portal or build capability for them, it’s about organizational change management, it’s about training. It’s about all these things that you have to go off and do and explain the complexity of all of it. I’m saying, “How do I get to the point where I’m just an app on your phone, and you like the app because it does something for you. You download it and you use it, and you don’t actually call up American or United or Delta to figure out how to use the app.

You don’t call your bank to figure how to use the app. You just download it and you use it because you want to.” This is to me right now, if I could get to that, that’s my holy grail of the whatever, the terminator, whatever, the bots, right? It’s like we could just put capability out there on a platform that has the ability to then start learning and understanding how to overcome the exceptions, to make the exceptions a way I could automate the handling of exceptions. This accelerates then the ability to provide that level of I’ll call it ease, or customer experience. That’s where I’m trying to get to, and I have no clue how to do it at the moment.

Dan Turchin:

Good example.

Sean Barker:

I agree with Joel on the no ops portion. I think we’re looking at less ops, right? We’re [crosstalk 00:35:40].

Joel Eagle:

Sounds like an ops guy talking.

Sean Barker:

Yeah, I mean most of the engagements that we’re doing today are really focused on how do we automate everything and make life work through a CI/CD pipeline and do DevOps, and really enhance the ability to go fast and remediate issues as they come about within application development, or what have you. I think as we move towards low code apps, we’ll get better.

Dan, I think we’re a little further out mostly because of the transition time that it’s going to take to do the change management, to do the adoption, to do the shift from what customers and companies have today over to a low code option, but many of the companies or engagements we have are not only doing automation, but they’re looking a bit more forward towards AI and ML, and how do we take all this automation that we’ve done, and now auto remediate problems within the ops environment. What are the right algorithms, and it’s going to be in the same way we’ve always done things to say, “What’s the low hanging fruit, and how do we eliminate those things first?”

I do think we will struggle with bias, because we’ll structure things off… everything starts out structured off what you know and then biases develop, but I think over the long run, I think we can get there. To Joel’s point, if it long term becomes an app that you just know how to use and do it because you want to, that makes it that much better for the customer or the client or the constituents within that client. Makes sense?

Dan Turchin:

Yeah. Ray, I’d love to hear your perspective. We’ve known each other a long time, but I know your team was very intentional about picking a business problem. How was that discussion internally?

Ray Lippig:

The company’s been growing over the last several years, and we have older technology in place that needs to be upgraded, so that as we continue to grow, the technologies that we have in place can do some of the things that you were actually talking about earlier, and that is be able to identify a problem when you have multiple tools in place that are coming up, monitoring things, but coming up with alerts and well, which one is the biggest issue right now and how do we get to root cause fast. We have some of these systems in place, Dynatrace, [inaudible 00:38:32] and SolarWinds. They have lots of alerts at times.

What do you do quickly to address it, so that it’s the maximum efficiency and effectiveness for the company, while the company’s growing both in capacity, capability, but also people, which means more alerts and more things going on. Then we obviously even have to think through how do you measure that as well. We’ve been looking at metrics and revamping and rethinking what are the best metrics that we need to look at. That has been morphing and changing, we hope improving, but it’s all connected. It’s a bit of a challenge. How to be more efficient and effective at what we’re doing is the bottom line in the operations world. I oversee seven different towers, all end user related.

We care a whole lot about how the operations work, but need to be more efficient at it. What tools ITSM-wise, what AIOps tools, which ones work best together, is a platform better than point tools? Those are questions we have been looking at very carefully over the last couple of years, but very intentionally over the last three or four months, and trying to come to some conclusions even at that.

Dan Turchin:

Mark…

Ray Lippig:

I hope that answers your question Dan because…

Dan Turchin:

No, no, good perspective. That’s why I wanted you to chime in on that. Mark, you interact probably with more CIOs than anyone. These kinds of challenges that you guys have been discussing, are those the ones that you typically hear around making the business more successful, or what do you think is keeping CIOs up at night these days?

Mark Settle:

I mean on the outside, most people are migrating to the cloud. Many are using dual provider solutions. They have maybe one set of apps up on Azure and another set on AWS. Flexera published an interesting report. There’s very little cross use of management tools between different clouds. When people are supporting hybrid cloud environments, your team has got to develop some sophistication around multiple tools.

Maybe some of the technologies we’re talking about here could smooth over some of those differences, or provide some kind of a more common front end in terms of triaging, et cetera, but the ease of spinning up infrastructure on demand is so great these days, like you alluded to before with the complexity, that everybody talks about how they’re spending more money than they expected in the cloud, and it’s because they really don’t know how to, like, turn stuff off and use it effectively. There’s that issue, and then there’s the unbridled data replication that occurs, with all of the security and privacy problems that that can lead to as well, and enforcing regulatory controls.

Automated procedures, one of the great things about them is that they provide the kind of evidence that auditors are looking for in many cases that controls have been upheld as well. If you turn a whole bunch of software guys loose and say, “Here’s your AWS sandbox, get your toys out and go have fun,” things can get out of control pretty quickly. I think there’s a lot of concern in that area, but just as a bottom line, I mean any transactional process, whether it’s security alerts or server alerts or storage volumes or network traffic or whatever, I think lends itself to the use of AI tools to provide more quality assurance.

Dan Turchin:

Joel, you brought up at the beginning there’s not really a shared definition of what AI and ML are. Be curious to get everyone’s perspective on what the business is expecting? What does AI mean? What does the business think IT and ITOps is doing with AI, and then the reverse of that question, do you think those expectations are reasonable?

Ray Lippig:

I can certainly speak from our side. When it comes to the business, AIOps means like this magic poof dust that you are able to sprinkle onto operations and all of a sudden, everything is fixed fast. That would be a perspective from some. Those that look at it a little bit closer realize that is not reality. The other side of it is some think that, like you said, this is going to be a long time coming. Why even spend too much time, effort, energy, and money investing in it, until it’s better down the road? We have two ends of the spectrum that we deal with.

Sean Barker:

Yeah.

Joel Eagle:

Yeah or… Oh, go ahead Sean. No, go ahead.

Sean Barker:

I’m with you, definitely. The poof dust that makes everything better I think is the expectation. I think there’s still a great deal of effort that needs to take place and investment that needs to take place in making AI really efficient for companies. I mean you can go collect all the telemetry data, and you have the what, right, and then you need to work through the so what through all that data and slim it down to what can you do with it, which becomes the now what, right? As you look to gather the data and get it to get smarter about what’s going on and how to find these things, and which things to suppress and which not to suppress, there’s time, effort, energy, and money to invest there.

It’s not going to be magic poof dust for quite some time, but I think it’s going to continue to get better. As we go forward, that level of investment and understanding and maturity in the overall AI and ML world will really help the organizations long term, but I don’t think it’s going to solve the problem immediately today in an instant, which is I think the expectation of those who are writing checks, right?

Ray Lippig:

It is, Sean. In fact, number five on the roundtable discussion list, managing expectations, is one of the things I find myself doing fairly often with our execs, because there is a desire for us to obviously be effective and efficient, and not spend too much money, but do it fast. There’s all these things going on that you’re trying to say, “Well, we want to do all of that, but we have to be wise about how we do it, and how we think about it.”

Joel Eagle:

Yeah. For us, I think the expectation is that, because of the industry hype, because of the number of people out there that are pushing everything from log aggregation to machine learning and what it can do, the outcome is this, right? They think we are applying AI or ML type technologies to reduce the MTTR, or the length of an outage, overall the impact of an outage. We should be using it to do that, learning each time. We should be able to use AI and ML to predict when an outage is going to happen, so prevention.

Outage prediction and prevention is somewhere where there’s an expect… I mean I hear this all the time in terms of the expectations, “What are you doing around this?” because every single time somebody like, I don’t know, McKinsey, Deloitte, Accenture, anybody else that gets my CIO’s attention for five seconds tells them all the stuff you could be doing, and millions can be saved. I mean the management leadership here, they’re not idiots, right? They’re not stupid, but they just flip the question over to you and ask you, “Hey, I heard about this, what are we doing here? I heard about this, what are we doing here?”

It could be because the CEO’s asking them that as well, but there is the expectation that it’s out there in the marketplace. Everybody’s familiar with the hype cycle, but there’s enough, I’ll call it, momentum out there, with enough companies making money doing this, that they believe it’s not just snake oil, that we should be able to do something demonstrable with it. We should have I like the one and two, or at least with two with the pick and the right success metric. I think the expectation now is that I should be able to have some metric that I can point to, whether it’s MTTR or P1 or whatever, and show where I’m applying some type of automation to affect that metric. That is harder to do.

It’s proving to be harder to do than most people expect, or most people understand, and that expectation is hard to manage.

Mark Settle:

Well, you know what, I hate to be too contrary, but in some ways, I think a lot of times people outside IT, I mean they really don’t care what the technology is. I was on a discussion earlier this week about executive perceptions about DevOps tools, DevOps techniques and are we sophisticated enough. If you’re building something, frankly everybody else outside of the engineering function, they don’t really give a flip. They just want to know that we’re getting features out faster than the competitors, or we’re able to blunt some kind of a new startup by mimicking their capabilities in the first three months they were competing with us. Those kinds of business questions are the way they look at things.

The MTTR, well, if there aren’t any significant anomalies that reach their level of attention, why would they care how IT is taking care of all that stuff in the background? Most of the time, they’re not technology advocates. Now, the CIO’s going to maybe ask some questions, but I don’t know if the head of sales or the head of HR or the head of marketing is going to be wondering what we’re doing over here.

Dan Turchin:

Yeah, yeah, I agree. Well, again I’m sensitive to time. I know it feels like we’re just getting started and we could definitely go hours, but in the interest of having some next steps, I told you that hopefully one of the biggest values you’ll get out of this session is meeting each other. I’ll circulate your contact information and I’m happy to share these slides if you’re interested. Then, yeah, I just want to make sure we’ve at least seeded some conversations that you can have offline, so you can be good resources for each other. One of my favorite quotes, which I think is more true now than ever: “The pace of innovation has never been as fast as it was yesterday, or as slow as it will be tomorrow.”

To me, that in one pithy statement summarizes where we’re at. I’m quite bullish on the future and with that said, we’re at the top of the hour. Really just appreciate you making time for us and hopefully, you get value out of this, not just from this short hour, but from conversations that it turns into.

