Root Cause Analysis Report

In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. It is widely used in IT operations, crime incident and disaster investigation, telecommunications, industrial process control, accident analysis (e.g., in aviation, rail transport, or nuclear plants), medicine (for medical diagnosis), the healthcare industry (e.g., for epidemiology), etc.

It can result in more effective control of hazards, improved process reliability, increased revenues, decreased production costs, and lower maintenance costs.

RCA can be decomposed into four steps:

  • Identify and describe the problem clearly.
  • Establish a timeline from the normal situation up to the time the problem occurred.
  • Distinguish between the root cause and other causal factors (e.g., using event correlation).
  • Establish a causal graph between the root cause and the problem.
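
As an illustration of the second step, the sketch below builds a timeline from a set of timestamped events. It is only a minimal example under stated assumptions: the Event type, the build_timeline helper, and the log entries are all hypothetical and not part of any standard RCA tool.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    time: datetime
    description: str


def build_timeline(events, last_normal, problem_time):
    """Return the events between the last known-normal time and the
    problem time (both inclusive), oldest first."""
    window = [e for e in events if last_normal <= e.time <= problem_time]
    return sorted(window, key=lambda e: e.time)


# Hypothetical log entries, deliberately out of order.
events = [
    Event(datetime(2024, 3, 1, 10, 45), "fuse blew, machine stopped"),
    Event(datetime(2024, 3, 1, 9, 30), "bearing temperature rising"),
    Event(datetime(2024, 2, 28, 16, 0), "routine inspection passed"),
    Event(datetime(2024, 3, 1, 11, 15), "operator replaced fuse"),
    Event(datetime(2024, 3, 1, 8, 0), "lubrication pressure low"),
]

timeline = build_timeline(
    events,
    last_normal=datetime(2024, 2, 28, 16, 0),
    problem_time=datetime(2024, 3, 1, 10, 45),
)
for event in timeline:
    print(event.time.isoformat(sep=" "), event.description)
```

Events recorded after the problem occurred (here, the fuse replacement) are left out, since the timeline is meant to cover only the period from the normal situation up to the problem.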

RCA generally serves as input to a remediation process whereby corrective actions are taken to prevent the problem from recurring. The name of this process varies from one application domain to another.

In science and engineering, there are essentially two ways of repairing faults and solving problems.

Reactive management consists in reacting quickly after the problem occurs, by treating the symptoms. This type of management is implemented by reactive systems, self-adaptive systems, self-organized systems, and complex adaptive systems.

The goal here is to react quickly and alleviate the effects of the problem as soon as possible.

Proactive management, conversely, consists in preventing problems from occurring. Many techniques can be used for this purpose, ranging from good design practices to detailed analysis of problems that have already occurred, followed by action to make sure they never recur. Speed is not as important here as the accuracy and precision of the diagnosis. The focus is on addressing the real cause of the problem rather than its effects.

Root-cause analysis is often used in proactive management to identify the root cause of a problem, that is, the factor that was the main cause of that problem.

It is customary to refer to the root cause in singular form, but one or several factors may in fact constitute the root cause(s) of the problem under study.

A factor is considered the root cause of a problem if removing it prevents the problem from recurring. A causal factor, conversely, is one that affects an event's outcome, but is not the root cause. Although removing a causal factor can benefit an outcome, it does not prevent its recurrence with certainty.
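
The distinction can be illustrated with a small removal test. The sketch below is a toy model, not a general method: the function hours_until_failure, its two factors, and the numbers 100 and 500 are all invented for illustration. Each factor is removed in turn; if the fault can no longer occur, the factor is treated as the root cause, and if the fault is merely delayed, it is treated as a causal factor.

```python
from typing import Optional


def hours_until_failure(contaminant_present: bool, heavy_load: bool) -> Optional[int]:
    """Hypothetical toy model: hours until the fault occurs, or None if it
    never occurs."""
    if not contaminant_present:
        return None  # without the contaminant the fault cannot occur
    return 100 if heavy_load else 500  # heavy load only makes it happen sooner


def classify(factor: str, baseline: dict) -> str:
    """Remove one factor from the baseline and compare the outcomes."""
    before = hours_until_failure(**baseline)
    if before is None:
        raise ValueError("the baseline scenario must exhibit the problem")
    after = hours_until_failure(**{**baseline, factor: False})
    if after is None:
        return "root cause"      # removal prevents the problem entirely
    if after > before:
        return "causal factor"   # removal helps but does not prevent it
    return "not a factor"


baseline = {"contaminant_present": True, "heavy_load": True}
for factor in baseline:
    print(factor, "->", classify(factor, baseline))
```

In this toy model, removing the contaminant prevents the fault, while removing the heavy load only postpones it. A real investigation rarely has such a clean model available, and several factors may pass the removal test together.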

Examples

Imagine an investigation into a machine that stopped because it overloaded and the fuse blew.[6] Investigation shows that the machine overloaded because it had a bearing that wasn't being sufficiently lubricated. The investigation proceeds further and finds that the automatic lubrication mechanism had a pump which was not pumping sufficiently, hence the lack of lubrication. Investigation of the pump shows that it has a worn shaft. Investigation of why the shaft was worn discovers that there isn't an adequate mechanism to prevent metal scrap getting into the pump. This enabled scrap to get into the pump, and damage it.

The apparent root cause of the problem is therefore that metal scrap can contaminate the lubrication system. Fixing this problem ought to prevent the whole sequence of events from recurring. The real root cause could be a design issue if there is no filter to prevent the metal scrap from getting into the system. Alternatively, if there is a filter that was blocked due to lack of routine inspection, then the real root cause is a maintenance issue.

Compare this with an investigation that does not find the root cause: replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. But there is a risk that the problem will simply recur, until the root cause is dealt with.
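
The investigation above can be sketched as a walk along a chain of causes. The code below is only an illustration: the CAUSE_OF mapping paraphrases the findings of the example, and the trace_to_root helper is a hypothetical name, not an established algorithm. Each finding points to the cause discovered for it, and the walk stops at the first finding with no recorded cause.

```python
# Each observed effect maps to the cause the investigation found for it.
# The wording paraphrases the machine example; the structure is an assumption.
CAUSE_OF = {
    "machine stopped": "fuse blew due to overload",
    "fuse blew due to overload": "bearing insufficiently lubricated",
    "bearing insufficiently lubricated": "lubrication pump not pumping sufficiently",
    "lubrication pump not pumping sufficiently": "pump shaft worn",
    "pump shaft worn": "metal scrap can get into the pump",
}


def trace_to_root(symptom, cause_of):
    """Follow cause links from a symptom until no further cause is recorded."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in cause_of:
        cause = cause_of[chain[-1]]
        if cause in seen:
            raise ValueError(f"circular causation at {cause!r}")
        chain.append(cause)
        seen.add(cause)
    return chain


chain = trace_to_root("machine stopped", CAUSE_OF)
for depth, finding in enumerate(chain):
    print("  " * depth + finding)
print("apparent root cause:", chain[-1])
```

The walk ends at the apparent root cause only. Deciding whether the real root cause is a design issue or a maintenance issue would require adding further findings to the chain.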

 

Root-cause analysis is used in many application domains.

Manufacturing and industrial process control

The example above illustrates how RCA can be used in manufacturing. RCA is also routinely used in industrial process control, e.g. to control the production of chemicals (quality control).

RCA is also used for failure analysis in engineering and maintenance.

IT and telecommunications

Root-cause analysis is frequently used in IT and telecommunications to detect the root causes of serious problems. For example, in the ITIL service management framework, the goal of incident management is to restore a faulty IT service as soon as possible (reactive management), whereas problem management deals with solving recurring problems for good by addressing their root causes (proactive management).

Another example is the computer security incident management process, where root-cause analysis is often used to investigate security breaches.

RCA is also used in conjunction with business activity monitoring and complex event processing to analyze faults in business processes.

 

Health and safety

In the domains of health and safety, RCA is routinely used in medicine (diagnosis), epidemiology (e.g., to identify the source of an infectious disease), environmental science (e.g., to analyze environmental disasters), accident analysis (aviation and rail industry), and occupational safety and health.

Systems analysis

RCA is also used in change management, risk management, and systems analysis.