|

Welcome to the Event Management learning. Click each sequential tab below to find key learnings and information.
{magictabs}
Introduction ::
What is an Event?
An event is a change of state which has significance for the management of a Configuration Item or IT Service. It is anything that might happen in the environment that we need to know about and probably act upon to ensure the continuity of an agreed level of service.
Depending on the nature of the event, the required action may be repairing a failed component or simply capturing its information in a log for later trend analysis.
In addition to managing events that can affect the environment, Event Management is also an area where automation opportunities may be discovered. These opportunities should be identified and prioritized since many may represent important sources of efficiency and simplification in the normal operations. This can lead to faster and more accurate support results.
Goals
Event Management is intended to
- Detect, analyze, and determine the appropriate process for dealing with event occurrences.
- Automatically detect conditions in the operating environment that might cause service delivery to fall outside of agreed quality levels, and to identify opportunities to automatically correct those conditions.
Benefits
Event Management serves as an initial link into other service operation, transition, and design processes, and it can initiate the integrated management of IT services. Effective event management provides:
Early detection of incidents
- Ensure prompt action is taken to keep service quality at its agreed level.
Optimization of systems monitoring investments
- Apply a rational, planned approach to what needs to be monitored, based on expected return on investment. This ensures all critical and relevant events are identified and managed while unnecessary monitoring and resource waste are avoided.
Effective use of resources
- Plan and implement the best course of action to manage each event.
Support automation
- Event Management creates the opportunity to apply self-healing scripts that automate event handling.
Improved service delivery & customer satisfaction
- With timely awareness when a service level agreement has been breached, corrective action can be quickly taken.
Terms & Definitions {showhide title="See More ..." changetitle="Hide ..."}
Throughout this learning, the following key terms, descriptions, and Event Management categories are used. Review the following for reference and applied use.
Terms
| Term |
Description |
| Alert |
- This term is often used instead of Event.
- Example: A warning that a threshold has been reached or a failure has occurred.
|
| Monitor |
- A device or mechanism (like a software application) that constantly observes or tracks the environment.
- Detects when specific conditions are met that correspond to the events that need to be captured. When a condition is detected by the monitor, an event is produced and sent to other components to process as needed.
- Example: A monitor may constantly read the temperature in a computer room and report an event when (and only when) the temperature has surpassed a pre-defined threshold.
|
| Configuration Item (CI) |
- Any component that needs to be managed to deliver an IT Service.
- Components that work together to provide a service delivery.
- Specific conditions are captured as events and these conditions might refer to:
- Capacity: A disk is full
- Performance: A web application that is slow to respond
- Specific malfunctions: A communications line is down
- Examples: disk drivers | servers (virtual/physical) | data connection line | database | personal computer | mobile device.
|
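The temperature monitor example in the Terms table can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any tool named in this learning: the class name `ThresholdMonitor`, the CI name `computer-room-1`, and the 27.0 degree threshold are all assumptions. The sketch also assumes the monitor reports once each time the reading rises past the threshold, rather than on every reading that stays above it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    ci: str        # Configuration Item the event refers to
    message: str


class ThresholdMonitor:
    """Hypothetical monitor: emits an event only when a reading rises past the threshold."""

    def __init__(self, ci: str, threshold: float):
        self.ci = ci
        self.threshold = threshold
        self._above = False  # whether the previous reading was over the threshold

    def check(self, reading: float) -> Optional[Event]:
        was_above = self._above
        self._above = reading > self.threshold
        if self._above and not was_above:
            return Event(self.ci, f"reading {reading} exceeded threshold {self.threshold}")
        return None  # condition not met (or already reported): no event is produced


# Assumed sample readings from a computer room temperature sensor.
monitor = ThresholdMonitor(ci="computer-room-1", threshold=27.0)
readings = [22.5, 26.9, 27.4, 28.1, 25.0, 27.2]
events = [e for e in (monitor.check(r) for r in readings) if e is not None]
```

With these readings the monitor produces two events, one for each time the temperature crosses above the threshold. Reporting only on the crossing is a design choice that keeps event volume down before any filtering takes place.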
Event Categories
| Standard Event Category |
Description |
Example |
| Information |
Information Events simply provide information about conditions that are part of the normal operation.
They are captured to track operational statistics or to understand the context of the operations when other, more impactful events happen. |
- A job completed successfully, someone logged into a system.
- There is some congestion but still normal/acceptable.
These are typically logged and later used for statistical analysis to fine-tune the system behavior.
|
| Warning |
Warnings are used to prevent abnormal conditions from becoming exceptions. |
- A communication line or disk space utilization above a defined threshold.
- More than 3 unsuccessful login attempts within 60 seconds.
- Batch job taking longer than X minutes to complete.
|
| Exception |
Exceptions point to an operational condition outside predefined parameters where a serious failure or impact on performance has occurred. This means the service is impacted and not delivering up to its agreed service levels. An intervention is necessary to restore the service to its normal operating conditions. |
- Disk failure causing lack of availability to an application.
- Lack of availability of the application itself (different events, one causing the other).
|
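As an illustration of how the three categories might be assigned, the sketch below classifies two of the example conditions from the table: disk state and repeated failed logins. The function names, the 85% disk threshold, and the choice to treat attempts exactly 60 seconds apart as outside the window are assumptions made for the example only.

```python
from collections import deque
from enum import Enum


class Category(Enum):
    INFORMATION = "Information"
    WARNING = "Warning"
    EXCEPTION = "Exception"


def categorize_disk(used_pct: float, failed: bool, warn_at: float = 85.0) -> Category:
    """Hypothetical rule: failure is an Exception, high utilization a Warning."""
    if failed:
        return Category.EXCEPTION
    if used_pct >= warn_at:
        return Category.WARNING
    return Category.INFORMATION


def failed_login_warning(timestamps, max_attempts: int = 3, window_s: float = 60.0) -> bool:
    """True if more than max_attempts failures fall inside any window_s-second window."""
    recent = deque()
    for t in sorted(timestamps):
        recent.append(t)
        # Discard attempts that are window_s seconds or more older than the current one.
        while t - recent[0] >= window_s:
            recent.popleft()
        if len(recent) > max_attempts:
            return True
    return False


disk_ok = categorize_disk(used_pct=40.0, failed=False)
disk_high = categorize_disk(used_pct=92.0, failed=False)
disk_down = categorize_disk(used_pct=92.0, failed=True)
burst = failed_login_warning([0, 10, 20, 30])      # four failures in 30 seconds
spread = failed_login_warning([0, 30, 61, 90])     # never more than two per window
```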
{/showhide}
Additional Resources
The following resources provide further learning opportunities and additional information:
||||
Roles & Responsibilities ::
Event Manager Role
Each Service Management process has one or more associated roles. A role is a category assigned to a user, or group of users, that defines access privileges to process functionality. The Event Management process defines only one role: the Event Manager.
| Click to view a demonstration of how to View Roles Using ISMP. |
 |
Event Manager Responsibilities
The Event Manager responsibilities are grouped into 3 main categories: Build, Confirm, and Improve. In each main category group are additional scope items to help ensure event management success.
1. Build: The Build group recognizes that the Event Manager needs to:
- Understand what Events are needed for the Event Manager's service. {showhide title="Show Description" changetitle="Hide Description"}
It is important to understand what Events are needed for your service.
For new services or components, understand what:
- Configuration Items are contained in the service
- Event information is needed
- Failure conditions can occur
- Detection or prevention mechanisms are available to minimize failed conditions
From the supplier-recommended list of detectable Events, consider:
- Which Events we need to capture and act on
- What the right thresholds are for our implementation of the service
{/showhide}
- Develop and maintain service impact models. {showhide title="Show Description" changetitle="Hide Description"}
It is important to develop and maintain an Event Management service impact model. When developing a service impact model, do the following:
- Identify how each service CI depends on other CIs for its normal operation.
- Capture any key dependencies that we want to follow when one of the CIs fails or degrades.
- Work with the Configuration Manager for your service, to ensure all these key dependencies are recorded in the CMDB.
{/showhide}
- Document event management requirements and make a case for their importance. {showhide title="Show Description" changetitle="Hide Description"}
It is important to document event management requirements.
When preparing and documenting an event summary, consider:
- What Condition or Event needs to be detected
- What type of Event it is
- What threshold should be used
- How the Event should be handled
- What actions should be taken
- What support group should take the actions
- Whether the response to this Event can be automated
|
When describing why requirements have business importance, consider:
- Why it is important to handle this Event
- What service or indicator this will improve
- What size of improvement is expected on that indicator
- Show an actual figure, with supporting documentation
- Why it is important to the Intel Business
Be sure to submit the requirements to the EM PIT for approval and deployment submission.
|
{/showhide}
2. Confirm: The Confirm group recognizes that the Event Manager needs to:
- Perform audits to ensure the required Events are processed as needed. {showhide title="Show Description" changetitle="Hide Description"}
Making rules and defined processes more specific to Event Management helps create a more refined, automated process.
Plan for and conduct regular audits. Audits should test the processing of the Events including the following:
- Event Detection
- Event Disposition
- Event Notifications
- How support groups handle the Event
It is important to understand the requirements that should be met:
- Understand which tools are used to capture and process your service's Events, and how they work together.
- Have access to the tools or reports needed to complete the required tests and assessments.
{/showhide}
- Ensure Events are processed with good performance measured by Key Performance Indicators (KPIs). {showhide title="Show Description" changetitle="Hide Description"}
The following are critical for good performance:
- Keep track of KPI trends for your service
- Identify and implement required improvement action plans.
{/showhide}
3. Improve: The Improve group recognizes that the Event Manager needs to:
- Identify opportunities to automate the handling of Events. {showhide title="Show Description" changetitle="Hide Description"}
When looking for automation opportunities, consider:
- Ideas that have a good benefit for the cost.
- Looking into your process stats and trends (LSS is most useful here) to identify areas of automation that were not evident at first sight.
Examples of automation of Events include self-healing scripts and service recovery options.
{/showhide}
- Participate in Event Management Process Improvement meetings {showhide title="Show Description" changetitle="Hide Description"}
It is important to participate in Event Management Process Improvement Team meetings to help identify areas of improvement such as how to:
- Identify common process improvements that can or should be made.
- Enable service operations to go faster, do better, and be more efficient.
{/showhide}
Responsibilities Summary {showhide title="See More ..." changetitle="Hide ..."}
The image below summarizes the 3 categories of responsibilities for the Event Manager role.

{/showhide}
||||
Process Flow ::
Service Configuration Structure
Each Service is delivered through a collection of Configuration Items (CIs). Each event needs to refer to some CI to be meaningful. In addition, the Configuration Item is a key element to integrate an Event with other processes. All CIs that an Event needs to refer to also need to be represented in the Configuration Management Database (CMDB).
Representation in the CMDB is needed for Events to be meaningful, and to note how a malfunction of any concrete CI may affect other concrete CIs. This connects and integrates Event Management with other Service Management processes like:
- Incident
- Problem
- Knowledge
- Capacity
- Change
- Release Management
Configuration Management Database (CMDB) Diagram
A malfunction of any concrete CI may affect other concrete CIs. This is represented as relationships among the CIs which are part of the CMDB. These relationships are used to understand how an impact on one CI can propagate to other CIs, and how the service or related services may be affected.
The CMDB enables you to
- use Service impact models to understand how impact propagates through the service.
- recognize that, when relationships are fed into a correlation engine, calculating the propagated impact can be automated, with different actions selected based on that impact.
Click on the image below for a larger display.
{modal href="images/ETD_Image_Folder/ITSM_Event_Management/CI-buildout-6.png"} {/modal}
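The way impact propagates through CI relationships can be sketched as a graph traversal. The example below is a simplified illustration, not the CMDB's actual data model or any correlation engine's API: the CI names and the `DEPENDENTS` mapping are hypothetical, and each relationship is assumed to mean "a failure here affects the listed CIs".

```python
from collections import deque

# Hypothetical CMDB relationships: each CI maps to the CIs that depend on it.
DEPENDENTS = {
    "disk-01": ["db-server"],
    "db-server": ["order-database"],
    "order-database": ["order-web-app"],
    "wan-link": ["order-web-app"],
    "order-web-app": [],
}


def impacted_cis(failed_ci, dependents):
    """Breadth-first walk over the relationships, returning every CI reachable from failed_ci."""
    seen = {failed_ci}
    queue = deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for dep in dependents.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen - {failed_ci}  # the failed CI itself is not listed as "impacted"


disk_impact = impacted_cis("disk-01", DEPENDENTS)
wan_impact = impacted_cis("wan-link", DEPENDENTS)
```

In this sketch a disk failure reaches the database server, the database, and the web application, while a WAN link failure reaches only the web application. A service impact model would typically also weigh each relationship, which is left out here.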
Event Management Process Flow (Simplified Version) {showhide title="See More ..." changetitle="Hide ..."}
| Item |
Function |
| Sensors |
Take measurements from the operational environment. |
| Monitors |
Regularly test a measurement to detect when it has surpassed a threshold; when this happens, an event is produced. |
| Filters |
Reduce the number of events, especially when monitors are not easily configurable. |
| Categorization |
First determination of the event category type: Information, Warning, or Exception. |
| Correlation |
The significance of the Event is most accurately determined considering:
- Several events which happen within a close time proximity,
- The importance of the Configuration Items that the events refer to,
- How the event impact is propagated onto other CIs or services through existing relationships,
- The identification of the root cause for a number of events.
|
|
Correlation is the most important step in the process flow. It determines:
- How severe the event (incident) impact is
- What the root cause of a series of events is
- How the event should be handled
Correlation needs to consider:
- Which CIs are being impacted
- How impact is propagated to other CIs and services
Actions can be taken to:
- Notify interested parties
- Attempt an automatic fix, or seek a manual fix solution
For some Events, such as Information events, simply logging the Event for later use is all that is needed.
|
|
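The filter and correlation steps described above can be sketched as follows. This is an illustrative simplification under stated assumptions: the `Event` fields, the 30 and 60 second windows, the CI names, and the `DEPENDS_ON` mapping are hypothetical, and root cause is approximated using direct dependencies only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float      # seconds since an arbitrary reference point
    ci: str          # Configuration Item the event refers to
    category: str    # "Information", "Warning" or "Exception"


def filter_duplicates(events, window_s=30.0):
    """Filter step: drop an event if the same CI and category was kept within window_s seconds."""
    last_kept = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.ci, ev.category)
        if key in last_kept and ev.time - last_kept[key] < window_s:
            continue
        last_kept[key] = ev.time
        kept.append(ev)
    return kept


def correlate(events, depends_on, window_s=60.0):
    """Correlation step: group events close in time, then flag as root-cause
    candidates the CIs that do not directly depend on another CI in the group."""
    groups = []
    for ev in sorted(events, key=lambda e: e.time):
        if groups and ev.time - groups[-1][-1].time <= window_s:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    results = []
    for group in groups:
        cis = {ev.ci for ev in group}
        roots = sorted(ci for ci in cis if not (set(depends_on.get(ci, [])) & cis))
        results.append((roots, group))
    return results


# Hypothetical relationship: the web application depends on the disk.
DEPENDS_ON = {"order-web-app": ["disk-01"]}
raw = [
    Event(0.0, "disk-01", "Exception"),
    Event(5.0, "disk-01", "Exception"),          # repeat of the same condition
    Event(12.0, "order-web-app", "Exception"),   # caused by the disk failure
    Event(500.0, "batch-job-7", "Information"),
]
filtered = filter_duplicates(raw)
correlated = correlate(filtered, DEPENDS_ON)
```

Here the repeated disk event is filtered out, the disk and application exceptions are grouped together with `disk-01` as the root-cause candidate, and the later Information event forms its own group.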
{/showhide}
Process Intersects {showhide title="See More ..." changetitle="Hide ..."}
The Configuration Item, CI relationships, SLA, and OLA information is fed into Event Management to calculate the extent to which events impact the services. Event Management produces outcomes for use by several other service management processes, as shown in the chart below.

{/showhide}
||||
Tools & Indicators ::
Monitoring Tools
For each IT area, there is a particular set of tools to detect and manage Events. As an Event Manager, you are required to understand the overall functionality of each of these tools.
Plan of Record Tools
The table below shows Event Management process areas (Infrastructure area) and which tools are used to perform the basic steps in the Event Management process.

Acronyms
| BEM | BMC Software's Event Manager | LAN | Local Area Network |
| CHRM | Client Health Remote Management | MC_Notify | Home-grown notification utility used with BEM |
| DCH | Data Center Hosting | MP | Management Packs |
| EC | Engineering Computing | RUM | Real User Monitoring |
| EHE | Event Handling and Exclusion Tool | SCOM | Microsoft's System Center Operations Manager |
| iDSM | Intel's Unix Distributed Systems Management tool | SFF | Small Form Factor |
| WAN | Wide Area Network |
Key Performance Indicators (KPIs) {showhide title="See More ..." changetitle="Hide ..."}
| KPI | Description |
| Event Volume | Total number of Events generated by the monitored configuration items. It helps understand how much effort (or automation cost) is dedicated to the management of the IT services through events. |
| Non-Alerted Incidents | Number of Incident records that were created from a source other than an Event, like from an end-user calling TAC. |
| Event TRS | Average time that it takes to resolve incidents reported from Event Management. Measures how efficient IT is in solving issues that were foreseen to happen and detected as events. |
| Ticket/Event Ratio | Relation of Incident tickets created per number of reported Events. It measures the efficiency of the automated management of events. |
| Event TAI | Average duration between the detection of an Event and the creation of a corresponding Incident ticket (when one was created). This measures how fast the event management process is in reporting incidents that need to be resolved. |
Further details and KPI descriptions are available at the Event Management Key Process Indicators (KPI) site.
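To make the KPI definitions concrete, the sketch below computes them from a small set of records. The `Incident` fields, the sample values, and the use of minutes are assumptions for illustration. It also assumes that Event TRS is measured from ticket creation to resolution and that the Ticket/Event Ratio counts only tickets created from Events; the official KPI definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    created_at: float                           # ticket creation time, in minutes
    resolved_at: float                          # ticket resolution time, in minutes
    event_detected_at: Optional[float] = None   # None when the ticket did not come from an Event


def event_kpis(event_count, incidents):
    """Compute the five KPIs from an event count and a list of incident records."""
    from_events = [i for i in incidents if i.event_detected_at is not None]
    return {
        "event_volume": event_count,
        "non_alerted_incidents": len(incidents) - len(from_events),
        "ticket_event_ratio": len(from_events) / event_count if event_count else 0.0,
        "event_tai": mean(i.created_at - i.event_detected_at for i in from_events) if from_events else None,
        "event_trs": mean(i.resolved_at - i.created_at for i in from_events) if from_events else None,
    }


# Assumed sample data: 200 events, two tickets raised from events, one raised by a user call.
incidents = [
    Incident(created_at=10.0, resolved_at=70.0, event_detected_at=4.0),
    Incident(created_at=20.0, resolved_at=50.0, event_detected_at=18.0),
    Incident(created_at=30.0, resolved_at=90.0),
]
kpis = event_kpis(event_count=200, incidents=incidents)
```

With this sample data the Ticket/Event Ratio is 0.01, Event TAI is 4 minutes, Event TRS is 45 minutes, and there is one Non-Alerted Incident.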
{/showhide}
Reports Control Charts & KPIs Dashboard {showhide title="See More ..." changetitle="Hide ..."}
Depending on your Event Manager role needs, the following tools are used to track trends and analyze Event Management data.
- Event Control Charts
| ► Shows the Event Management KPIs at the highest level. |
 |
- Event KPI Dashboard
| ► Shows the Event Management KPIs at a second level of detail. |
 |
- MyITE (My Integrated Tool Environment)
| ► An ITSMR-provided portal for easy access to these reports. |
- OLAP (On-line Analytical Processing) cube
| ► Data source used with Excel to create pivot tables fed by Event and Incident Information - useful for detailed analysis. |
 |
{/showhide}
||||
|| Completion || ::
When you have completed the learning in this content, click the Track Completion button below.
This allows the IT Service Management team to know that you have completed this learning and are prepared for relevant next steps.
Click when you are finished.
{/magictabs}
|