IT Problem Management Process

Abstract

Problem Management is a core IT Service Management (ITSM) discipline that minimizes the impact of incidents by identifying and remedying their root causes and preventing recurrence. This document outlines its objectives, process flows, roles, and best practices, combining reactive fixes with proactive prevention.

1. Overview

Objectives & Scope

Objectives

Reactive resolution of Problems and proactive prevention through trend analysis
Reduce repeat incidents through permanent solutions
Enhance Service Desk first-line resolution
Continuous improvement via Major Problem Reviews

Benefits

Increased service availability
Higher productivity for IT and business teams
Lower costs from fewer temporary fixes
Centralized Known Error database

Scope

All hardware, software, services, and processes supported by the IT Department.

2. Definitions

Term	Definition
Incident	An unplanned interruption or quality reduction of an IT service.
Problem	The underlying cause, often not yet known, of one or more Incidents.
Known Error	A diagnosed Problem with a documented workaround or solution.
Root Cause	The fundamental defect triggering an Incident or Problem.
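The relationship between these terms can be sketched as record types: a Problem groups the Incidents it is believed to cause, and becomes a Known Error once it has a diagnosed root cause and a documented workaround or solution. This is a minimal sketch; the field names (`incident_ids`, `root_cause`, `workaround`, `solution`) and the `to_known_error` helper are hypothetical, and real ITSM tools define their own schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Problem:
    # Hypothetical fields; Incidents are referenced by ID only.
    problem_id: str
    incident_ids: list[str] = field(default_factory=list)
    root_cause: Optional[str] = None  # often unknown when the record is created
    workaround: Optional[str] = None  # interim fix
    solution: Optional[str] = None    # permanent fix


@dataclass
class KnownError:
    problem_id: str
    root_cause: str
    remedy: str  # the documented workaround or solution


def to_known_error(problem: Problem) -> KnownError:
    """Promote a diagnosed Problem that has a documented remedy to a Known Error."""
    if not problem.root_cause:
        raise ValueError(f"{problem.problem_id} has no diagnosed root cause")
    # Prefer the permanent solution when one has been recorded.
    remedy = problem.solution or problem.workaround
    if not remedy:
        raise ValueError(f"{problem.problem_id} has no workaround or solution")
    return KnownError(problem.problem_id, problem.root_cause, remedy)
```

For example, a Problem with root cause "Log volume full" and workaround "Rotate logs manually" promotes to a Known Error whose remedy is that workaround; once a permanent solution is recorded, the solution takes precedence.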

3. Roles & Responsibilities

Problem Manager

Owns the Problem Management lifecycle
Coordinates resources, escalations, and meetings
Generates management reports and trend analyses

Support Teams / Assignees

Identify and investigate Problems
Conduct Root Cause Analysis (RCA)
Maintain Known Error database and raise RFCs

4. Process Activities

Problem Control (Reactive)

Assessment: Review incidents; confirm no existing fix.
Registration: Create & classify Problem record.
Meetings: Convene stakeholders for Severity 1 and 2 Problems.
Investigation: Validate workarounds; perform RCA.
Workaround: Communicate interim fix; monitor recurrence.
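The Assessment and Registration steps can be sketched as a lookup against existing Known Errors before a new Problem is opened. This is a minimal sketch under stated assumptions: Known Errors are matched on a hypothetical `symptom` string (case-insensitive exact match, whereas real tools typically use search or categorization), severity 1 is assumed to be the most severe, and `assess`, `KnownErrorEntry`, and `ProblemRecord` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownErrorEntry:
    symptom: str      # hypothetical matching key
    workaround: str


@dataclass
class ProblemRecord:
    problem_id: str
    symptom: str
    severity: int         # assumption: 1 is the most severe
    needs_meeting: bool   # stakeholders convene for Severity 1 and 2


def assess(symptom: str, severity: int, known_errors: list[KnownErrorEntry],
           new_id: str) -> tuple[Optional[str], Optional[ProblemRecord]]:
    """Assessment followed by Registration.

    Returns (workaround, None) when a Known Error already covers the symptom,
    otherwise (None, record) for a newly registered Problem.
    """
    # Assessment: confirm whether an existing fix is already on record.
    for entry in known_errors:
        if entry.symptom.casefold() == symptom.casefold():
            return entry.workaround, None
    # Registration: create and classify the Problem record.
    record = ProblemRecord(new_id, symptom, severity, needs_meeting=severity <= 2)
    return None, record
```

Returning the existing workaround first keeps duplicate Problem records out of the system, which is the point of confirming "no existing fix" before registration.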

Error Control (Permanent)

Error Assessment: Identify permanent fixes; liaise with vendors.
Solution Proposal: Document the proposed solution in the Problem record.
Change Management: Raise an RFC, or apply the fix directly if it falls outside Change Management scope.
Deployment: Apply the solution; update the Known Error record.
Review: Validate success; close Problem.
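The Error Control steps imply a closure rule: a Problem should close only after a solution is documented, approved through an RFC (or applied directly when out of Change Management scope), deployed, reflected in the Known Error record, and validated in review. A minimal sketch of that guard follows; `ErrorControlState` and `can_close` are hypothetical names, since the process does not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorControlState:
    solution: Optional[str] = None   # Solution Proposal recorded in the Problem
    rfc_id: Optional[str] = None     # set when an RFC has been raised
    out_of_scope_fix: bool = False   # fix applied outside Change Management
    deployed: bool = False
    known_error_updated: bool = False
    review_passed: bool = False


def can_close(state: ErrorControlState) -> tuple[bool, list[str]]:
    """Return whether the Problem may close, plus any unmet conditions."""
    missing = []
    if state.solution is None:
        missing.append("solution not documented")
    if state.rfc_id is None and not state.out_of_scope_fix:
        missing.append("no RFC raised")
    if not state.deployed:
        missing.append("solution not deployed")
    if not state.known_error_updated:
        missing.append("Known Error not updated")
    if not state.review_passed:
        missing.append("review not passed")
    return (not missing, missing)
```

Returning the list of unmet conditions, rather than a bare boolean, gives the Problem Manager something concrete to chase when a closure is blocked.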

Proactive Prevention & Trend Analysis

Monthly trend reviews of incident/problem data
Preventive actions: capacity planning, patching, architecture reviews

5. Process Flow Diagram

[Incident Logged]
       ↓
[Problem Assessment] → (Existing Fix?) → Yes → [Implement Fix] → [Close Incident]
       ↓ No
[Register Problem]
       ↓
[Investigation] → [Interim Workaround] → [Notify Service Desk]
       ↓
[Root Cause Identified]
       ↓
[Error Assessment] → [Raise RFC] → [Apply Permanent Fix]
       ↓
[Post-Implementation Review]
       ↓
[Problem Resolved]
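The diagram can be encoded as a transition table and walked for either outcome of the "Existing Fix?" decision. This is a minimal sketch, assuming each horizontal chain in the diagram completes before the flow continues downward (so the interim workaround and Service Desk notification precede root cause identification, and the permanent fix precedes the review); `TRANSITIONS` and `walk` are illustrative names.

```python
# Terminal steps end a pass through the flow.
TERMINAL = {"Close Incident", "Problem Resolved"}

# Unconditional transitions; "Problem Assessment" branches and is handled in walk().
TRANSITIONS: dict[str, str] = {
    "Incident Logged": "Problem Assessment",
    "Implement Fix": "Close Incident",
    "Register Problem": "Investigation",
    "Investigation": "Interim Workaround",
    "Interim Workaround": "Notify Service Desk",
    "Notify Service Desk": "Root Cause Identified",
    "Root Cause Identified": "Error Assessment",
    "Error Assessment": "Raise RFC",
    "Raise RFC": "Apply Permanent Fix",
    "Apply Permanent Fix": "Post-Implementation Review",
    "Post-Implementation Review": "Problem Resolved",
}


def walk(existing_fix: bool) -> list[str]:
    """Return the ordered steps for one pass through the flow."""
    state = "Incident Logged"
    path = [state]
    while state not in TERMINAL:
        if state == "Problem Assessment":
            # Decision point: reuse an existing fix or register a new Problem.
            state = "Implement Fix" if existing_fix else "Register Problem"
        else:
            state = TRANSITIONS[state]
        path.append(state)
    return path
```

With an existing fix the walk is four steps ending at "Close Incident"; without one it passes through all twelve steps and ends at "Problem Resolved".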

6. Benefits & Challenges

Benefits	Challenges
Fewer repeat incidents	Process adoption resistance
Lower firefighting costs	Up-front tooling & training investment
Enhanced support capabilities	Consistent documentation discipline
Data-driven improvement	Balancing speed vs. thorough RCA

7. Conclusion

A structured Problem Management process converts reactive firefighting into strategic resilience. By coupling swift workarounds with permanent fixes and proactive trend analysis, organizations enhance availability, reduce costs, and drive continuous improvement.
