Managing Diagnostics

This document explains how you manage diagnostics. Diagnostics provide a tool to obtain information about how often equipment or objects can fail and what corrective action can be taken to put them right.

Diagnostics can also be used to link the preventive task to the likely failure. This procedure makes it easier to calculate the cost of prevention versus the cost of failure.

M3 Maintenance supports a diagnostics type called Reliability Centered Maintenance (RCM).

Outcome

Diagnostics information can be used to develop better maintenance routines.

Diagnostics provides you with this information:

  • How critical a piece of equipment is. The equipment criticality indicates the equipment priority.
  • An error code structure defined per equipment/position. The error code structure can be designed to describe different failures and what causes them. The error code structure can also contain both the preventive action to avoid the failure in the future, as well as a corrective action to repair the failure when it has occurred.
  • The calculated cost of prevention versus the cost of repair.

Fully implemented, diagnostics (or RCM) can be expected to result in the following:

Better safety

  • Safety failures are detected before operational ones
  • Strategies to avoid safety failures are developed

Better operating performance

  • Focus is directed towards maintenance of critical equipment
  • Maintenance response time is shortened as a result of improved diagnostic information
  • Reduced overhaul frequency
  • Unreliable components can be replaced
  • Fewer burn-in problems

Reduced maintenance costs

  • The prevention of expensive failures and secondary damage leads to a reduction in maintenance costs
  • Well-defined and efficient routine maintenance procedures
  • Improved plant knowledge resulting in reduced need for expensive experts
  • Clearer policies for use of stand-by plant
  • Acquiring new technology

Better information

  • Allows quicker evaluation of maintenance needs and policies for changing circumstances
  • Provides a better basis on which to plan the work and reduce backlog
  • Provides a good foundation for diagnostic manuals, systems, and training

Personnel improvement

  • Greater operator knowledge and awareness
  • Improved skills
  • Improved inter-personnel relationships
  • Improved cross-departmental contacts

Before you start

  • Serialized item/equipment records must exist in 'Equipment/Serialized Item. Open' (MMS240).
  • Position records must exist in 'Model/Site. Connect Position' (MOS440) if diagnostics is to be created against positions.
  • Items must be defined in 'Item. Open' (MMS001) if diagnostics is to be created against items.
  • Error code 1 must be defined in 'Error Code 1. Open' (MOS572).
  • Error code 2 must be defined in 'Error Code 2. Open' (MOS568).
  • Error code 3 must be defined in 'Error Code 3. Open' (MOS569).
  • Parameter 18 in 'Settings - WO Operation Reporting' (MOS991) and 'Settings - Operation Reporting/Facility' (MOS990) should be set accordingly.

Purpose

Diagnostics is a continuous follow-up and improvement of maintenance routines for equipment. Normally, a team of RCM specialists meets at regular intervals to evaluate the most effective ways to maintain the plant and the equipment. They evaluate selected equipment in several steps, and at the end of the process, they suggest maintenance strategies to avoid future failures. The different steps in this process are described below.

Diagnostics is based on the act of answering these questions about the equipment:

  • What is the system or equipment asked to do?
  • What functional failures are likely to occur?
  • What are likely consequences of these functional failures?
  • What can be done to prevent these functional failures?

Diagnostics can be used to:

  • Evaluate the criticality of positions and equipment/components in order to identify where the focus of effort should be directed.
  • Define the main failure characteristics of critical items at the plant.
  • List the symptoms, probable causes and likely failure effects.
  • Analyze the consequence of failure.
  • Identify and recommend remedial action.
  • Reduce the impact of a breakdown by allowing fast diagnostics of the faults.
  • Simplify the technical knowledge and experience required to identify faults by building a failure knowledge database.
  • Link the failure database to the preventive maintenance tasks that are designed to avoid failures. Doing this allows the organization to close the loop between the identification of failures and the tasks designed to avoid them.
  • Link the failure to the repair service, in the event that the failure still occurs.
  • Send messages when a failure that should have been avoided by a preventive service still occurs.
  • Provide a basis for calculating cost of failure, mean time between failures (MTBF), total cost of failure, cost of repair and the cost of failure versus the cost of repair.

Diagnostics structure

The diagnostics structure is created per equipment/component or per position in the site or model individual structure.

The diagnostics can be defined to describe failures in different ways. An error may for example be broken down into smaller details. A total failure can for example be divided into its main causes. A conveyor may have a main failure called Total failure. This can be due to a power failure, a motor failure, a drive chain failure etc. Each of these can then be further described, along with preventive and corrective actions that can be taken to solve the problem.
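The conveyor breakdown described above can be pictured as a small tree. The following is a minimal sketch in Python, not a representation of how M3 stores error codes: the `ErrorNode` class, its field names, and the example preventive and corrective actions are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorNode:
    """One level of a hypothetical error code structure."""
    description: str
    preventive_action: str = ""   # action intended to avoid the failure
    corrective_action: str = ""   # action intended to repair the failure
    causes: List["ErrorNode"] = field(default_factory=list)


def outline(node, depth=0):
    """Return the structure as indented text lines, one line per node."""
    lines = ["  " * depth + node.description]
    for cause in node.causes:
        lines.extend(outline(cause, depth + 1))
    return lines


# Main failure and its main causes, taken from the conveyor example.
# The two actions on the drive chain are invented placeholders.
conveyor = ErrorNode(
    "Total failure",
    causes=[
        ErrorNode("Power failure"),
        ErrorNode("Motor failure"),
        ErrorNode(
            "Drive chain failure",
            preventive_action="(hypothetical) inspect chain tension",
            corrective_action="(hypothetical) replace drive chain",
        ),
    ],
)

print("\n".join(outline(conveyor)))
```

Each cause could itself carry further causes, which mirrors the idea of breaking an error down into smaller details.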

The equipment/component diagnostics structure is created in the following programs:

  • 'Error Code 1. Connect to Equipment' (MMS238)
  • 'Error Code 2. Connect to Equipment' (MMS237)
  • 'Error Code 3. Connect to Equipment' (MMS234)

The position diagnostics structure is created in the following programs:

  • 'Error Code 1. Connect to Position' (MOS449)
  • 'Error Code 2. Connect to Position' (MOS451)
  • 'Error Code 3. Connect to Position' (MOS458)

An error code structure can be automatically created based on error codes specified at operation reporting. The new error codes reflect actual failures and form a starting point for diagnostics work. The new error codes are created in 'Error Code 1. Connect to Equipment' (MMS238).

Criticality selection

Criticality selection is done to decide how critical a piece of equipment is. Diagnostics using RCM is useful for equipment that is critical for maintaining production. Normally, only critical equipment is included, as the method requires a large amount of effort.

M3 supports criticality selection with a user-defined criticality system. Within this system, each piece of equipment can be evaluated according to a user-definable grading system. M3 will suggest a criticality rating that should be applied to the equipment after the evaluation is done.

Basic criticality information is defined in the following programs:

  • 'Criticality Factor. Open' (MCS150)
  • 'Critical Factor. Connect Critical Levels' (MCS151)
  • 'Additional Critical Factor. Open' (MCS160)
  • 'Additional Criticality Factor. Connect Levels' (MCS161)
  • 'Criticality Class. Open' (MOS044)

Functional failures

The failures can be described as equipment failing to perform its function. An equipment function for a pump can, for example, be described as:

To feed at least 1,500 liters of water per minute at a pressure of no less than 10 bar from the cold water feed tank to the header tank.

Functional failures can be described as the different ways in which the equipment can fail to do what it is supposed to do.

  • Total failure
  • Failure to feed 1,500 liters/minute
  • Failure to feed at 10 bar
  • Failure to feed water between the cold water feed tank and the header tank
  • Partial transfer between the cold water feed tank and the header tank
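One way to read the list above is as a set of checks against the stated pump function. The sketch below assumes that flow and pressure readings are available as numbers; the function name, the constants, and the choice to treat zero flow as a total failure are illustrative assumptions and are not part of M3.

```python
# Thresholds taken from the pump function statement above.
REQUIRED_FLOW_L_PER_MIN = 1500
REQUIRED_PRESSURE_BAR = 10


def functional_failures(flow_l_per_min, pressure_bar):
    """Return the functional failures implied by one pair of readings."""
    if flow_l_per_min <= 0:
        # Assumption: no flow at all is classified as a total failure.
        return ["Total failure"]
    failures = []
    if flow_l_per_min < REQUIRED_FLOW_L_PER_MIN:
        failures.append("Failure to feed 1,500 liters/minute")
    if pressure_bar < REQUIRED_PRESSURE_BAR:
        failures.append("Failure to feed at 10 bar")
    return failures


print(functional_failures(1200, 10.5))
```

A reading that meets both thresholds returns an empty list, meaning the pump is performing its function as stated.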

Each function can then be broken down into failure modes and effects. The failure modes describe the possible causes of the failure. In the above example these might be:

  • Power failure
  • Contactor failure
  • Motor failure
  • Pump bearing failure

Failure effects can be described as the full effect of each failure. These effects can take into consideration the effects on production, plant and people. The failure effect for the contactor failure above could for example be:

The pump stops and an alarm goes off in the control room. After 12 minutes the header tank low-level alarm sounds. After this, the tank empties in 20 minutes. Time to reset trip and restart pump is 5 minutes.

The graphic below describes how functional failures can be described in an error code structure:

Consequence analysis

The consequence analysis consists of a number of questions that help you to decide what action to take either to correct the problem or to avoid the problem in the future.

A standard set of questions is delivered with M3. The questions are defined within 'Document Text. Open' (CRS940). The standard questions can be modified in 'Consequence Analysis. Open' (MCS120). This program controls which questions appear as a result of the answer to the previous question. The standard questions are designed to provide assistance when deciding on the preventive or corrective action to take. The questions take into consideration whether the failure is obvious, constitutes a safety issue, is likely to cause a production stop etc.

The consequence analysis questions in 'Consequence Analysis. Display' (MCS130) can be reached when you set up the error code structure for equipment or positions.
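The idea that each answer determines the next question can be sketched as a small lookup table. This is an illustration only: the question ids, the wording, the branching, and the suggestions are hypothetical and do not reproduce the standard questions delivered with M3.

```python
# Each question maps an answer either to the next question id or to a final
# suggestion prefixed with "END:". All content here is hypothetical.
QUESTIONS = {
    "Q1": ("Is the failure obvious to the operator?",
           {"yes": "Q2", "no": "END:consider a failure-finding task"}),
    "Q2": ("Does the failure constitute a safety issue?",
           {"yes": "END:preventive action required", "no": "Q3"}),
    "Q3": ("Is the failure likely to cause a production stop?",
           {"yes": "END:compare preventive and corrective cost",
            "no": "END:corrective action may be sufficient"}),
}


def run_analysis(answers, start="Q1"):
    """Walk the question tree; return (ids of questions asked, suggestion)."""
    asked = []
    current = start
    while not current.startswith("END:"):
        _text, branches = QUESTIONS[current]
        asked.append(current)
        current = branches[answers[current]]
    return asked, current[len("END:"):]


asked, suggestion = run_analysis({"Q1": "yes", "Q2": "no", "Q3": "yes"})
print(asked, suggestion)
```

Because the next step is looked up from the previous answer, questions that are not relevant to a given failure are never asked.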

The question structure is indicated by the diagram below:

Preventive action selection

When you have decided how critical a piece of equipment is, based on the criticality rating, and how significant the failure is, based on the consequence analysis, it should be possible to design an appropriate service to avoid the failure, or to reduce its effects to an acceptable level. The service can be linked to the equipment in 'Error Code 2. Connect to Equipment' (MMS237).

It is also possible to create and connect a corrective service to the equipment. This service will be used in 'Work Request. Quick entry' (MOS185) to repair the equipment if the failure should occur.

The service is defined in 'Service. Open' (MOS300).

Cost analysis/approval

The cost of the failure should be compared to the cost of preventing the failure using the preventive service over the same period. A preventive task that costs more than the failure itself can still be acceptable if, for example, it avoids safety or environmental issues.

The cost for prevention should be compared to the repair cost by running 'Product Costing. Calculate Service' (PCS235). Once the service costing is run, it is possible to compare the corrective and preventive cost in 'Error Code 2. Connect to Equipment' (MMS237).

The cost of failure is based on the downtime in (MMS237) times the downtime cost specified in 'Equipment/Serialized Item. Open' (MMS240) plus the repair cost in (MMS237). If a corrective service is specified in (MMS237), the system will calculate its cost based on the product structure. The downtime can also be automatically calculated by running 'Maint Stats. Create' (MCS300), provided that parameter 19 in 'Settings - Maintenance Statistics' (MCS390) is selected.
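The cost of failure described above amounts to downtime multiplied by the downtime cost, plus the repair cost. A minimal sketch of that arithmetic follows; the function name, the use of hours as the downtime unit, and the example figures are assumptions and are not taken from M3.

```python
def cost_of_failure(downtime_hours, downtime_cost_per_hour, repair_cost):
    """Downtime multiplied by the downtime cost, plus the repair cost."""
    return downtime_hours * downtime_cost_per_hour + repair_cost


# Hypothetical figures: 4 hours of downtime at 500 per hour, repair cost 1200.
print(cost_of_failure(4, 500, 1200))  # 4 * 500 + 1200 = 3200
```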

The cost of prevention is calculated from the cost of the service (labor, materials etc) over the same period as the MTBF (mean time between failures). If a failure has an MTBF of three years and the preventive task has a frequency of 6 months, the system will calculate how many times the service would be carried out in the three-year period and then multiply the cost of each service by the number of times it would be carried out. The MTBF can be calculated automatically when 'Maint Stats. Create' (MCS300) is run, provided that parameter 19 in 'Settings - Maintenance Statistics' (MCS390) is selected.
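The three-year example can be worked through as follows. This is a sketch under stated assumptions: the function names and the cost per service are hypothetical, and whole-number division is used because the text does not say how M3 treats a partial interval.

```python
def services_per_mtbf(mtbf_months, service_interval_months):
    """Number of complete service intervals that fit in the MTBF period."""
    # Assumption: partial intervals are not counted.
    return mtbf_months // service_interval_months


def cost_of_prevention(mtbf_months, service_interval_months, cost_per_service):
    """Cost of each service multiplied by the number of services in the period."""
    count = services_per_mtbf(mtbf_months, service_interval_months)
    return count * cost_per_service


# MTBF of three years (36 months), preventive service every 6 months.
print(services_per_mtbf(36, 6))            # 6 services in the period
# Hypothetical cost of 400 per service.
print(cost_of_prevention(36, 6, 400))      # 6 * 400 = 2400
```

The resulting figure is the one to set against the cost of failure for the same period when deciding whether the preventive service is worthwhile.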