How To: A Deeper Dive Into Service Level Management: Monitoring Service Health
Paramount to delivering great customer service is monitoring the health of IT by service. The Provance ITSM Service Health dashboard not only lets you monitor incidents, requests, changes, events and problems by service, but also lets you compare your service performance against pre-set baselines, ensuring you are always on top of what’s happening in your environment, or in your customers’ environment if you’re an MSP. Provance ITSM provides the essential options for setting various baselines, so you can easily benchmark your service performance. And when you realize you’ve got a problem, Provance ITSM makes it easy to act on what you’re seeing, such as drilling down to only the specific incidents within a service, or cherry-picking the relevant incidents and creating a problem from them.
Let’s take a closer look at the Provance ITSM Service Health dashboard. Before I explain what the data tells me (in the screenshot below), let’s take a tour of all the charts that come out of the box. (Of course, because Provance ITSM is built on the Dynamics 365 platform, it is configurable, and you can always add your own charts to this dashboard.)
First of all, as you can see, all incidents are shown by service. Incidents are represented in blue, while baselines are represented in either orange or red (depending on the chart). The baselines are, of course, whatever you want them to be, and they may depend on your SLAs. However, you do want your baselines to act as a warning system: at a quick glance, if there are fewer service disruptions than the baseline, you can assume the service is running fine, but if your blue bars are longer than, or break through, whatever baseline you have set, you know immediately that you might have a problem and had better go and investigate.
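The baseline check described above boils down to a simple per-service comparison. The Python sketch below is purely illustrative: `ServiceHealth` and `services_to_investigate` are hypothetical names, not part of the Provance ITSM or Dynamics 365 APIs, and it assumes a bar only counts as a breach when it is strictly longer than its baseline.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    """Hypothetical per-service snapshot, like one bar on a Service Health chart."""
    service: str
    incident_count: int  # length of the blue bar
    baseline: int        # the orange or red baseline set for this service

    def breaches_baseline(self) -> bool:
        # Assumption: a breach means the count is strictly above the baseline.
        return self.incident_count > self.baseline


def services_to_investigate(rows: list[ServiceHealth]) -> list[str]:
    """Return the names of services whose incident count exceeds their baseline."""
    return [row.service for row in rows if row.breaches_baseline()]


if __name__ == "__main__":
    # Illustrative numbers only; the baselines are made up for this example.
    snapshot = [
        ServiceHealth("Network", incident_count=81, baseline=50),
        ServiceHealth("Email", incident_count=3, baseline=10),
    ]
    print(services_to_investigate(snapshot))  # ['Network']
```

Treating the baseline as a strict threshold keeps the rule easy to read at a glance, which is the point of a warning system; an organization could just as reasonably flag a service when it merely reaches its baseline.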
In our out-of-the-box Service dashboard, the three charts on the right, with the orange bars, are all time-based. The top middle chart represents the past hour, the top right chart represents the past day, and the bottom chart represents the last 30 days. The chart on the left shows all your open, active incidents regardless of when they were created. Ideally, the charts in the middle or on the right will often be blank, meaning there were no incidents in the past hour or day. When that happens, the chart looks like this:
Now, let’s take a closer look at what my data is telling me in this specific scenario. If I look at the chart on the left (Open Incidents by Service Baseline),
I can see that we’re having a hard time closing incidents, particularly on our Network service. So I decide to take a closer look and click the grid icon (in the top right corner). Provance ITSM brings up a pop-up that lists all the incidents, including the 81 on our Network service. Because I want to look at the Network specifically, I click the Network Services link within one of the Network-related incidents, which brings up the Analytics version of the Network Service dashboard (see screenshot below):
Here you can quickly see, in the chart on the left, the incidents created over time, and on the right, a chart containing all the different ticket types. If you scroll down (see screenshot below), you’ll find the time-related charts. The top left chart shows open incidents, the top right chart shows raised incidents, the bottom left chart shows incidents raised today, and the bottom right chart shows the last 30 days.
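The time-based charts essentially count incidents whose creation time falls inside a window. As a rough sketch of that idea in Python, assuming trailing windows ending at the current time (the real charts may instead use calendar boundaries such as "today"), the hypothetical `windowed_counts` function below tallies incidents for the last hour, last day, and last 30 days. It is not how Provance ITSM computes its charts internally.

```python
from datetime import datetime, timedelta

# Trailing windows mirroring the time-based charts. Assumption: "last day"
# means the trailing 24 hours rather than the current calendar day.
WINDOWS = {
    "last hour": timedelta(hours=1),
    "last day": timedelta(days=1),
    "last 30 days": timedelta(days=30),
}


def windowed_counts(created_times: list[datetime], now: datetime) -> dict[str, int]:
    """Count incidents created within each trailing window ending at `now`."""
    return {
        label: sum(1 for t in created_times if now - window <= t <= now)
        for label, window in WINDOWS.items()
    }


if __name__ == "__main__":
    # Invented creation times, relative to a fixed "now" for reproducibility.
    now = datetime(2024, 1, 31, 12, 0)
    created = [
        now - timedelta(minutes=10),
        now - timedelta(hours=5),
        now - timedelta(days=10),
        now - timedelta(days=40),
    ]
    print(windowed_counts(created, now))
    # {'last hour': 1, 'last day': 2, 'last 30 days': 3}
```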
So, in this case, it actually does look like we have a problem. The problem isn’t the number of service disruptions coming in: as you can see, the incidents (in blue) are fewer than we expect based on the baselines we set. That all looks fine. But if you look at the top left chart, the blue bar is longer than the baseline, so we have a big issue getting those incidents closed; they’re still open. In other words, we’re receiving fewer incidents than we would expect from a poorly functioning service, but we’re just not getting them closed. That is the issue in this case.
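The reasoning above, where incoming incidents are under their baseline but open incidents are over theirs, can be expressed as a small decision rule. This is a hypothetical Python sketch, not Provance ITSM functionality: `diagnose` and the `Diagnosis` labels are illustrative names, and the numbers in the usage example are invented apart from the 81 open Network incidents mentioned earlier.

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "raised and open incidents are within baselines"
    INTAKE = "more incidents are being raised than expected"
    CLOSURE = "incidents are not being closed fast enough"
    BOTH = "raised and open incidents both exceed their baselines"


def diagnose(raised: int, raised_baseline: int,
             open_count: int, open_baseline: int) -> Diagnosis:
    """Separate an intake problem (too many new incidents) from a closure problem."""
    too_many_raised = raised > raised_baseline
    too_many_open = open_count > open_baseline
    if too_many_raised and too_many_open:
        return Diagnosis.BOTH
    if too_many_raised:
        return Diagnosis.INTAKE
    if too_many_open:
        return Diagnosis.CLOSURE
    return Diagnosis.HEALTHY


if __name__ == "__main__":
    # Few incidents raised, but a large open backlog: the scenario described above.
    result = diagnose(raised=12, raised_baseline=20, open_count=81, open_baseline=40)
    print(result.name)  # CLOSURE
```

Checking the raised and open counts separately is what distinguishes "too many disruptions" from "disruptions that aren't being resolved", which call for different follow-up actions.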
As I mentioned at the start of this blog post, Provance ITSM makes it easy not only to drill down but also to create a problem from incidents. Now that we’ve established that we have a problem in this scenario, I’ll show you how to create one from here. By clicking the chevron next to Network in the top-level navigation, I choose Incidents from the drop-down menu. This pulls up a dashboard of all the associated Network incidents. I then highlight all the relevant incidents and click Edit, and Provance ITSM brings up a pop-up where I can either associate the incidents with an existing problem or create a new problem.
One of the greatest benefits of Provance ITSM and the Dynamics 365 platform is that you can set up these same analytics for an account, contact, service, or SLA, since they’re all associated directly with an incident. So, if you’re an MSP, you can follow this same process to set up baselines and metrics on a per-customer basis rather than a per-service basis. Or, if you’re a large organization with a multitude of services, you could set this up on a per-business-unit basis.