Problem Management is a critical IT Service Management (ITSM) discipline. It aims to minimize the impact of incidents and prevent their recurrence by identifying and eliminating their root causes. This document outlines the objectives, process flows, roles, and best practices of Problem Management, which combines reactive fixes with proactive prevention.
1. Overview
Objectives & Scope
Objectives
Combine reactive problem control with proactive trend analysis
Reduce repeat incidents by implementing permanent solutions
Improve Service Desk first-line resolution by sharing Known Errors and workarounds
Continuous improvement via Major Problem Reviews
Benefits
Increased service availability
Higher productivity for IT and business teams
Lower costs from fewer temporary fixes
Centralized Known Error knowledge
Scope
All hardware, software, services, and processes supported by the IT Department.
2. Definitions
Incident: An unplanned interruption to an IT service, or a reduction in its quality.
Problem: The underlying cause of one or more Incidents, typically unknown when the Problem is first recorded.
Known Error: A Problem whose root cause has been diagnosed and for which a workaround or solution is documented.
Root Cause: The fundamental defect that triggers an Incident or Problem.
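The relationships among these terms can be sketched as a small data model. This is a minimal illustration only: the class and field names (`Incident`, `Problem`, `root_cause`, `workaround`, `solution`) are hypothetical and do not come from any particular ITSM tool. The `is_known_error` check encodes the definition above: a Problem becomes a Known Error once its root cause is diagnosed and a workaround or solution is documented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    """An unplanned interruption or quality reduction of an IT service."""
    incident_id: str
    service: str
    description: str


@dataclass
class Problem:
    """The underlying cause of one or more Incidents (hypothetical fields)."""
    problem_id: str
    incidents: list[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None  # unknown until diagnosis completes
    workaround: Optional[str] = None  # temporary fix, if documented
    solution: Optional[str] = None    # permanent fix, if documented

    def link(self, incident: Incident) -> None:
        """Associate another Incident with this Problem."""
        self.incidents.append(incident)

    @property
    def is_known_error(self) -> bool:
        """A Known Error is a diagnosed Problem with a documented
        workaround or solution."""
        diagnosed = self.root_cause is not None
        has_fix = self.workaround is not None or self.solution is not None
        return diagnosed and has_fix
```

Modeling Known Error as a derived state of a Problem, rather than a separate record type, keeps a single record per underlying cause; an actual tool may store it differently.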
3. Roles & Responsibilities
Problem Manager
Owns the Problem Management lifecycle
Coordinates resources, escalations, and meetings
Generates management reports and trend analyses
Support Teams / Assignees
Identify and investigate Problems
Conduct Root Cause Analysis (RCA)
Maintain the Known Error Database (KEDB) and raise Requests for Change (RFCs)
4. Process Activities
Problem Control (Reactive)
Assessment: Review the related incidents and confirm that no existing fix or Known Error already covers them.
Registration: Create and classify the Problem record.
Meetings: Convene stakeholders for Severity 1 and 2 Problems.
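The three Problem Control steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a feature of any ITSM product: the in-memory `KNOWN_ERRORS` dictionary, the `ProblemRecord` fields, the `PRB-0001` ID format, and the 1-to-4 severity scale are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical Known Error lookup: normalized symptom -> documented workaround.
KNOWN_ERRORS: dict[str, str] = {
    "mail queue full": "Purge the mail queue and restart the transport service",
}

_next_id = count(1)  # in-memory ID sequence, for illustration only


@dataclass
class ProblemRecord:
    problem_id: str
    symptom: str
    category: str
    severity: int  # assumed scale: 1 (critical) to 4 (low)
    incident_ids: list[str]


def assess(symptom: str, known_errors: dict[str, str]) -> Optional[str]:
    """Assessment: return an existing workaround if the symptom is already known."""
    return known_errors.get(symptom.strip().lower())


def register(symptom: str, category: str, severity: int,
             incident_ids: list[str]) -> ProblemRecord:
    """Registration: create and classify a new Problem record."""
    if severity not in range(1, 5):
        raise ValueError("severity must be between 1 and 4")
    problem_id = f"PRB-{next(_next_id):04d}"
    return ProblemRecord(problem_id, symptom, category, severity, list(incident_ids))


def needs_meeting(record: ProblemRecord) -> bool:
    """Meetings: Severity 1 and 2 Problems require a stakeholder meeting."""
    return record.severity <= 2
```

A caller would run `assess` first and call `register` only when it returns `None`, mirroring the order of the steps.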
Maintain a centralized library of Known Errors and workarounds
Establish SLAs, escalation paths, and review cadences
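One way to make SLAs and escalation paths concrete is to set a maximum review interval per severity and escalate a Problem one level for each review interval (or part of one) that passes after its due date. The review targets in `REVIEW_SLA` and the three-level `ESCALATION_PATH` below are hypothetical placeholders; real values would come from the organization's own SLAs.

```python
from datetime import datetime, timedelta

# Hypothetical maximum interval between Problem reviews, by severity (1 = critical).
REVIEW_SLA: dict[int, timedelta] = {
    1: timedelta(days=1),
    2: timedelta(days=3),
    3: timedelta(days=7),
    4: timedelta(days=14),
}

# Hypothetical escalation path, lowest level first.
ESCALATION_PATH: list[str] = ["Assignee", "Problem Manager", "IT Service Manager"]


def review_due(severity: int, last_review: datetime) -> datetime:
    """Return when the next review of the Problem is due."""
    return last_review + REVIEW_SLA[severity]


def escalation_target(severity: int, last_review: datetime, now: datetime) -> str:
    """Return who should own the Problem at `now`: the assignee while the
    review is on time, then one level higher for each started SLA interval
    the review is overdue, capped at the top of the path."""
    due = review_due(severity, last_review)
    if now <= due:
        return ESCALATION_PATH[0]
    # timedelta // timedelta yields an int count of whole intervals elapsed.
    level = (now - due) // REVIEW_SLA[severity] + 1
    return ESCALATION_PATH[min(level, len(ESCALATION_PATH) - 1)]
```

Tying escalation to the same interval as the review cadence keeps the two policies consistent, so a more severe Problem both gets reviewed and escalates sooner.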
6. Benefits & Challenges
Benefits
Fewer repeat incidents
Lower firefighting costs
Enhanced support capabilities
Data-driven improvement
Challenges
Resistance to process adoption
Up-front investment in tooling and training
Maintaining consistent documentation discipline
Balancing speed against thorough RCA
7. Recommendations
Start small with critical services, then scale quickly.
Embed an RCA culture that rewards root-cause discoveries.
Automate trend analysis and present the results in dashboards.
Host quarterly Major Problem Reviews.
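Automated trend analysis can start as simply as counting recent incidents that share a service and category but are not yet linked to a Problem; groups that recur often are candidates for proactive Problem records. The sketch below assumes a hypothetical `IncidentRecord` shape and illustrative defaults (a 30-day window, a threshold of 3). Its output could feed a dashboard, though no specific dashboard tool is implied.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional


class IncidentRecord(NamedTuple):
    incident_id: str
    service: str
    category: str
    opened: datetime
    problem_id: Optional[str] = None  # set once linked to a Problem


def recurring_candidates(
    incidents: Iterable[IncidentRecord],
    now: datetime,
    window: timedelta = timedelta(days=30),
    threshold: int = 3,
) -> list[tuple[str, str, int]]:
    """Return (service, category, count) for groups of unlinked incidents
    opened within `window` that reach `threshold`, most frequent first."""
    cutoff = now - window
    counts = Counter(
        (inc.service, inc.category)
        for inc in incidents
        if inc.opened >= cutoff and inc.problem_id is None
    )
    return [
        (service, category, n)
        for (service, category), n in counts.most_common()
        if n >= threshold
    ]
```

Excluding incidents already linked to a Problem keeps the report focused on recurrence that nobody is investigating yet.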
8. Conclusion
A structured Problem Management process converts reactive firefighting into strategic resilience. By coupling swift workarounds with permanent fixes and proactive trend analysis, organizations enhance availability, reduce costs, and drive continuous improvement.