IT Problem Management Process

Abstract

Problem Management is a core IT Service Management (ITSM) discipline that minimizes the impact of incidents by identifying and remedying their root causes and preventing recurrence. This document outlines objectives, process flows, roles, and best practices, blending reactive fixes with proactive prevention.

1. Overview

Objectives & Scope

Objectives

Benefits

Scope

All hardware, software, services, and processes supported by the IT Department.

2. Definitions

Term         Definition
Incident     An unplanned interruption or quality reduction of an IT service.
Problem      The unknown underlying cause of one or more Incidents.
Known Error  A diagnosed Problem with a documented workaround or solution.
Root Cause   The fundamental defect triggering an Incident or Problem.
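
The relationship between these terms can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (Incident, Problem, is_known_error, and so on) are hypothetical and do not come from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """An unplanned interruption or quality reduction of an IT service."""
    incident_id: str
    service: str
    summary: str


@dataclass
class Problem:
    """The underlying cause of one or more Incidents, unknown until diagnosed."""
    problem_id: str
    incidents: List[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None  # filled in once RCA succeeds
    workaround: Optional[str] = None  # interim fix, if any
    solution: Optional[str] = None    # permanent fix, if any

    def is_known_error(self) -> bool:
        # Known Error: a diagnosed Problem with a documented
        # workaround or solution.
        diagnosed = self.root_cause is not None
        documented = self.workaround is not None or self.solution is not None
        return diagnosed and documented


# Hypothetical example data.
problem = Problem(
    "PRB-1",
    incidents=[Incident("INC-1", "email", "Mail queue stalls")],
)
print(problem.is_known_error())  # not yet diagnosed

problem.root_cause = "Disk full on mail relay"
problem.workaround = "Purge the spool directory"
print(problem.is_known_error())  # diagnosed and documented
```

Modelling a Known Error as a state of a Problem, rather than as a separate record type, is a design choice made here for brevity; many tools keep a distinct Known Error database.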

3. Roles & Responsibilities

Problem Manager
Support Teams / Assignees

4. Process Activities

Problem Control (Reactive)
  1. Assessment: Review incidents; confirm no existing fix.
  2. Registration: Create & classify Problem record.
  3. Meetings: Convene stakeholders for Severity 1 & 2.
  4. Investigation: Validate workarounds; perform RCA.
  5. Workaround: Communicate interim fix; monitor recurrence.
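
The first three Problem Control steps (assessment, registration, meetings) can be sketched as follows. This is a minimal sketch, not a reference implementation: matching incidents to existing records by a category string, the PRB-nnnn identifier format, and the names ProblemRegister, assess, and needs_meeting are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Step 3: stakeholders are convened for Severity 1 and 2.
MEETING_SEVERITIES = {1, 2}


@dataclass
class ProblemRecord:
    problem_id: str
    category: str
    severity: int  # assumed scale: 1 is the most severe
    workaround: Optional[str] = None


class ProblemRegister:
    """In-memory register keyed by category (an assumed matching rule)."""

    def __init__(self) -> None:
        self._by_category: Dict[str, ProblemRecord] = {}

    def assess(self, category: str, severity: int) -> Tuple[ProblemRecord, bool]:
        """Steps 1-2: reuse an existing record, else register a new one.

        Returns (record, created).
        """
        existing = self._by_category.get(category)
        if existing is not None:
            return existing, False
        record = ProblemRecord(
            problem_id=f"PRB-{len(self._by_category) + 1:04d}",
            category=category,
            severity=severity,
        )
        self._by_category[category] = record
        return record, True


def needs_meeting(record: ProblemRecord) -> bool:
    """Step 3: decide whether a stakeholder meeting is required."""
    return record.severity in MEETING_SEVERITIES


register = ProblemRegister()
record, created = register.assess("vpn-dropouts", severity=2)
print(record.problem_id, created, needs_meeting(record))
```
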
Error Control (Permanent)
  1. Error Assessment: Identify permanent fixes; liaise with vendors.
  2. Solution Proposal: Document in Problem record.
  3. Change Management: Raise an RFC, or implement the fix directly if it falls outside Change Management scope.
  4. Deployment: Apply solution; update Known Error.
  5. Review: Validate success; close Problem.
Proactive Prevention & Trend Analysis

Process Flow Diagram

[Incident Logged]
       ↓
[Problem Assessment] → (Existing Fix?) → [Implement Fix] → [Close Incident]
       ↓ No
[Register Problem]
       ↓
[Investigation] → [Interim Workaround] → [Notify Service Desk]
       ↓
[Root Cause Identified]
       ↓
[Error Assessment] → [Raise RFC] → [Apply Permanent Fix]
       ↓
[Post-Implementation Review]
       ↓
[Problem Resolved]
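
The main downward path of the diagram can be expressed as a small state machine that rejects skipped steps. This is a hedged sketch: the State enum, the TRANSITIONS table, and the advance function are hypothetical names, and the side branches (existing fix, interim workaround, RFC) are deliberately left out to keep the example short.

```python
from enum import Enum


class State(Enum):
    ASSESSMENT = "Problem Assessment"
    REGISTERED = "Register Problem"
    INVESTIGATION = "Investigation"
    ROOT_CAUSE = "Root Cause Identified"
    ERROR_ASSESSMENT = "Error Assessment"
    REVIEW = "Post-Implementation Review"
    RESOLVED = "Problem Resolved"


# Allowed moves along the main path; side branches are omitted.
TRANSITIONS = {
    State.ASSESSMENT: {State.REGISTERED},
    State.REGISTERED: {State.INVESTIGATION},
    State.INVESTIGATION: {State.ROOT_CAUSE},
    State.ROOT_CAUSE: {State.ERROR_ASSESSMENT},
    State.ERROR_ASSESSMENT: {State.REVIEW},
    State.REVIEW: {State.RESOLVED},
    State.RESOLVED: set(),
}


def advance(current: State, target: State) -> State:
    """Return target if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


state = State.ASSESSMENT
state = advance(state, State.REGISTERED)
state = advance(state, State.INVESTIGATION)
print(state.value)
```

Encoding the flow as an explicit transition table makes it easy to audit which steps a Problem record may skip (here, none).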
  

5. Implementation Considerations

6. Benefits & Challenges

Benefits                       Challenges
Fewer repeat incidents         Process adoption resistance
Lower firefighting costs       Up-front tooling & training investment
Enhanced support capabilities  Consistent documentation discipline
Data-driven improvement        Balancing speed vs. thorough RCA

7. Recommendations

  1. Start small with critical services; scale quickly.
  2. Embed an RCA culture—reward root-cause discoveries.
  3. Automate trend analysis via dashboards.
  4. Host quarterly Major Problem Reviews.
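
As an illustration of recommendation 3, automated trend analysis could start from something as simple as counting incidents per service and category and flagging the recurring ones as candidates for proactive Problem records. This is a minimal sketch: the (service, category) pair as the grouping key, the default threshold of 3, and the function name recurring_categories are assumptions, and the sample data is invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (service, category), an assumed grouping key


def recurring_categories(
    incidents: Iterable[Key], threshold: int = 3
) -> List[Tuple[Key, int]]:
    """Return keys seen at least `threshold` times, most frequent first."""
    counts = Counter(incidents)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Invented sample data for one reporting window.
window = (
    [("email", "queue-stall")] * 4
    + [("vpn", "dropout")] * 3
    + [("erp", "login-failure")]
)

for (service, category), count in recurring_categories(window):
    print(f"{service}/{category}: {count} incidents")
```

A dashboard would typically run such a count over a sliding time window; the threshold is a tuning parameter rather than a fixed rule.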

8. Conclusion

A structured Problem Management process converts reactive firefighting into strategic resilience. By coupling swift workarounds with permanent fixes and proactive trend analysis, organizations enhance availability, reduce costs, and drive continuous improvement.
