A detailed review of areas where you can expect to have specific responsibility and involvement
This unit provides you with more information on some important elements of a Process Safety Management System. If you already work in a Top-Tier chemical processing plant as a supervisor, operator or maintenance technician, you will very likely have specific responsibility and involvement in the four areas on which this unit is focused.
7.1
The key attributes of an effective incident investigation system are defined in Element 19: ‘Incident Reporting and Investigation’ of the ‘Energy Institute High Level Framework for Effective Process Safety Management’ as follows:
‘An essential aspect of HS&E and process safety performance improvement is learning from incidents and 'near hits' and taking appropriate action to prevent their recurrence.’ The two CSB videos provide insight into the importance of effective incident investigation and of the resulting recommendations, follow-up and closure.
As a case study, assess your own site’s incident reporting and investigation system against the requirements of Element 19.
Click on the Resources link to access the UK Energy Institute’s High Level Framework for Process Safety Management.
Click on the downloads button for the Energy Institute - High Level Framework for Process Safety Management publication.
7.2
The point of an investigation is to reveal the causes of an incident, and to use that information to reduce the likelihood of a recurrence of similar incidents.
These causes will fall into three categories: immediate, underlying and root.
Immediate cause: the most obvious reason why an adverse event happens, for example, the tank was overfilled or the drain was open. There may be several immediate causes identified in any one adverse event.
Underlying cause: the less obvious 'system' or 'organisational' reason for an adverse event happening, for example, the maintenance procedure was inadequate, the hazard had not been adequately considered via a suitable and sufficient risk assessment, or production pressures were too great.
The Health and Safety Executive defines Root Cause as an initiating event or failing from which all other causes or failings spring. Root causes are generally management, planning or organisational failings.
CCPS defines Root Cause as a fundamental, underlying, system-related reason why an incident occurred that identifies a correctable failure in management systems. In essence, what is the fundamental failing in the company's management system?
7.3
The following quote from HSG 245 Investigating Accidents and Incidents captures an important principle in Process Safety Management:
“To get rid of weeds you must dig up the root. If you only cut off the foliage the root will grow again.”
What is being suggested here is that investigations which identify root causes allow organisations to take actions that will strengthen their underlying management system and prevent a recurrence of the incident.
For further reading click on the available resources.
7.4
A serious event may take place which doesn’t actually cause a release, so it is classed as a Near Miss.
So, for example, with the Buncefield incident, if someone had discovered that the high level switch was not functioning – in other words, there was a problem with a key protective system - and taken remedial action, that would have been classed as a Near Miss.
Typically, the majority of Near Miss reports and investigations are linked to Personal, rather than Process, Safety.
Investigations should have distinct phases:
- Reporting
- Information Gathering
- Analysis
- Making Recommendations
- Action Plan & Implementation
- Sharing Information
7.5
Reporting - in the immediate aftermath of an incident, the plant must be made safe and steps taken to prevent further injury or escalation of the incident. It is important to determine the level of investigation, which will depend on the scale of the incident. If it is serious, an independent, third-party investigator from another site may be needed. For less severe incidents, a local team will generally handle the process. It is important to assign responsibility for the reporting and investigation, and to determine the methodology. Investigations should be formally reported and recorded, in paper-based or electronic format, and the records kept in a secure place, with evidence preserved.
Information Gathering - this process explores all reasonable lines of enquiry. It should be carried out within a reasonable timescale, which allows sufficient time for thoroughness but also acknowledges that there is a need to get things done. There should be a structure to the process: capturing evidence, setting out clearly what is known and not known, recording the investigative process, and cordoning off areas under investigation, not unlike a crime scene investigation. In this way, the team will build up the foundation of an effective investigation. Only when this process is complete should operators and maintenance crews be allowed access to the area under investigation to rebuild and/or re-commission the equipment.
Analysis - this process should be objective and unbiased, and will aim to identify the sequence of events and conditions that led up to the incident. It will initially identify the risk control measures that were missing, inadequate or unused (the immediate causes, which correspond to the holes in the Swiss Cheese model) and will then identify the underlying causes and root causes.
7.6
Action Plan & Implementation - a good investigation should be followed up properly by a clear implementation strategy. An action plan should be produced, which has SMART objectives (Specific, Measurable, Agreed, Realistic and Time-bound), and deals effectively with the immediate, underlying and root causes. It is also important to provide feedback by communicating the results of the investigation and the action plan to everyone who needs to know, and to include arrangements to ensure the action plan is implemented and progress monitored. This applies to everyone, not just the Health and Safety team. Good companies with effective management teams have action-tracking systems, whereby people can be held to account if they do not complete the actions assigned to them.
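For sites that keep their action plan electronically, the idea of action tracking can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular company's system: the `Action` record, its field names and the two helper functions are assumptions made for the example, and the sample actions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Cause categories used in this unit.
CAUSE_LEVELS = ("immediate", "underlying", "root")


@dataclass
class Action:
    """One action from an investigation action plan (hypothetical record layout)."""
    description: str                      # Specific: what exactly is to be done
    cause_level: str                      # which cause category the action addresses
    owner: str                            # Agreed: the named person accountable
    due: date                             # Time-bound: target completion date
    completed_on: Optional[date] = None   # Measurable: set when the action is closed

    def is_overdue(self, today: date) -> bool:
        # An action is overdue if it is still open after its due date.
        return self.completed_on is None and today > self.due


def overdue_actions(actions: List[Action], today: date) -> List[Action]:
    """Return the open actions whose due date has passed."""
    return [a for a in actions if a.is_overdue(today)]


def uncovered_cause_levels(actions: List[Action]) -> List[str]:
    """Return the cause categories that no action in the plan addresses."""
    covered = {a.cause_level for a in actions}
    return [level for level in CAUSE_LEVELS if level not in covered]


# Hypothetical example plan.
plan = [
    Action("Replace faulty high level switch", "immediate",
           "Maintenance supervisor", date(2024, 3, 1)),
    Action("Revise proof-test procedure for level switches", "underlying",
           "Operations manager", date(2024, 4, 1),
           completed_on=date(2024, 3, 20)),
]

for action in overdue_actions(plan, today=date(2024, 3, 15)):
    print(f"OVERDUE: {action.description} (owner: {action.owner})")

print("Cause levels with no action:", uncovered_cause_levels(plan))
```

In this sketch the second check matters as much as the first: a plan that closes every action on time but contains nothing aimed at the root cause has not dealt with all three categories of cause.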
Sharing Information - this is about knowledge transfer: identifying information and lessons learned that are important to share with other parts of the company and with external organisations.
There is a particular skill in presenting information about an incident at your site in a way that is meaningful to other people and allows them to take their own appropriate action. This should be done using an appropriate mechanism, such as an industry publication. Most companies have systems that information can be fed into.
7.7
In the Energy Institute ‘High Level Framework for Effective Process Safety Management’, the essential attributes of an effective inspection and maintenance system are defined in:
Element 15: ‘Inspection and Maintenance’
Element 16: ‘Safety Critical Devices’
Read this document carefully, and then, as a case study, assess your own site’s inspection and maintenance system against the requirements of these two elements.
For further reading click on the available resources.
7.8
Other relevant material is available in ‘The Industrial Operator’s Handbook’, Chapter 16 (Overseeing Maintenance, Modification and Testing).
This chapter details the importance of ensuring that equipment has been correctly maintained and reinstalled, and explores the need for effective communication and team work between operations and maintenance departments.
It includes a case study of an incident in Pasadena, Texas, where a major explosion killed about 20 people, and shows that the contributory factors were a failure of maintenance, issues around Permits to Work, and a failure of communication between the operators and the maintenance team. It also looks at maintenance, modification and testing, the responsibilities of maintenance and, critically, the interface between operations and maintenance teams and how to achieve effective interaction between them.
7.10
The essential attributes of an effective management of change system are defined in Element 12: ‘Management of Change and Project Management’ of the Energy Institute High Level Framework for Effective Process Safety Management.
As a case study, you should assess your own site’s management of change system against the requirements of Element 12.
For further reading click on the available resources.