As discussed in lesson 1, there is a wide range of definitions of resilience related to a wide range of domains. However, a general definition provided by UNISDR seems appropriate to describe the concept, defining resilience as “the ability of a system, community or society exposed to hazards to resist, absorb, accommodate, adapt to, transform and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions through risk management”. This illustrates that resilience is a process that has to be present and enhanced before, during and after a crisis or disruption of services.
As an operator of a critical infrastructure, you would like to know whether your asset or system has this ability. The question often raised is how this can be measured. One metric often used is the performance loss function, referred to as the resilience triangle, which measures the performance of the infrastructure over time. However, an exact measure of the performance can only be made in retrospect, and hence it does not always reflect how resilient the infrastructure is when facing future disruptions.
In lesson 2 we presented two resilience management frameworks, ICI-REF and IS-REF. An important step in both resilience management frameworks is the resilience analysis, and a suitable resilience analysis methodology should be used for this purpose. Within the IMPROVER project, four different resilience analysis methodologies have been developed, aiming at covering the different resilience domains at different levels of detail. This lesson covers the four analysis methodologies: CIRI, ITRA, IORA and ISRA.
With reference to the ICI-REF framework, CIRI (Critical Infrastructure Resilience Index) can be used for resilience analysis, either as a ‘stand-alone methodology’, or as a mapping tool for further analysis using methodologies such as IORA (IMPROVER Organisational Resilience Analysis) and ITRA (IMPROVER Technological Resilience Analysis). One of the main advantages of CIRI is that it includes and integrates indicators from both the technological and organizational domains.
The Critical Infrastructure Resilience Index is inspired by the crisis management cycle (Pursiainen, 2017), breaking resilience into seven different phases (risk assessment, prevention, preparedness, warning, response, recovery and learning) that describe the temporal dimensions of resilience.
Each resilience phase (Level 1) is broken down into structures, processes, and components, referred to as Level 2 in the hierarchical structure. Each Level 2 element is further broken down into measurable indicators on Level 3. However, how these indicators are measured depends on the sector. As a consequence, data from sector-specific indicators (Level 4) are aggregated and transformed to a common scale, giving the Level 3 indicator a score between 0 and 5.
The first resilience phase analysed is Risk Assessment. According to the ICI-REF framework, the risk assessment itself has already been executed. However, as it lays the ground for the next resilience phases, it is included in the resilience analysis as the first step. The focus here is on the quality of the risk assessment and the methods and procedures used, rather than the actual results from the risk assessment. Following ISO 31000, risk assessment can be broken down into three components: risk identification, risk analysis, and risk evaluation. In general, in the context of the resilience phases, the risk assessment is used to gain knowledge of what to prevent and mitigate, prepare for, monitor, and respond to.
The next resilience phase is Prevention, which can be interpreted as “activities and measures to avoid existing and new disaster risks” (UNISDR, 2017). However, as complete avoidance of losses is not always feasible, the task also, to some extent, includes mitigation and limitation of impact. From an organizational point of view, one of the key processes here is to implement a safety and security culture, for internal and external staff, but also for the public using the service. Technologically, to prevent future disruptions, it is essential to have a system that is robust and that also has a high degree of redundancy. For instance, robustness includes using the right material for your physical structures, while redundancy relates to the capacity of the infrastructure and having available back-up systems.
As history shows, it is difficult to prevent all kinds of future disasters and infrastructure disruptions. Hence, as an operator of an infrastructure, one needs to build knowledge and capacities to raise the preparedness level. Preparedness includes, among other things, the organization’s ability to make proper plans and to organize the actions that need to be taken if something occurs. It is important to set the requirements and to agree on roles and responsibilities. Preparedness also includes cooperation, internally and externally, e.g. by making agreements. In addition, two key processes here are capacity and capability building. Capability building is about making sure that the human resources have the right skills, competence and knowledge. Capacity building is about making sure that you have the needed resources available, which could be human resources, equipment and tools, monetary resources, technical systems, and so forth.
The next resilience phase is framed under the concept of Warning. One could argue that warning or early warning is embraced by the preparedness concept. However, in this methodology, it is separated to give a more detailed analysis. This resilience phase can be broken down into three key components. First, it is important to have a monitoring and detection system in place, and to ensure that the right parameters are monitored and that the data is of good quality. Second, when an anomaly is detected, the data must be analyzed and used as input to make predictions and forecasts. Finally, alerts and warnings are distributed internally in the organization, to external partners, and to the end-users.
While preparedness is about building knowledge and capacities, Response is about the concrete action taken before, during or immediately after a disruption. After getting a warning, the organization needs to identify and frame what situation it is facing, and activate measures to limit the damage, often referred to as mitigation measures. A key to a successful response is effective communication and coordination. It is essential for an organization to be able to coordinate both internal and external resources, and to have proper communication means in place. In addition, as an operator of a critical infrastructure, it is important to communicate with the public.
While response covers the immediate actions, Recovery includes the concepts of restoring and improving the infrastructure. UNISDR defines recovery as the “restoration, and improvements where appropriate, of facilities, livelihoods and living conditions of disaster-affected communities, including efforts to reduce disaster risk factors”. The recovery phase starts when the emergency phase has ended, and the actions should be based on pre-defined plans and strategies. A key component here is repairability, including supportability and maintainability. This means, for instance, access to personnel, tools and equipment, site accessibility, and interchangeability of key components.
The last resilience phase is Learning. To be better prepared for future events, organizations and communities must be able to evaluate, learn and implement lessons from past events. As an operator of a critical infrastructure, one must evaluate the performance of the system, in addition to organizational issues such as planning and the actions executed. However, it is not enough to only do the evaluations; lessons need to be implemented as well. The implementation of lessons could be on a technical, organizational, individual and public level, or as network lessons where several institutions are able to implement common lessons.
The seven resilience phases form the basis for the resilience analysis. The key components, structures, and processes for each phase are further broken down into measurable indicators (Level 3). However, how these are measured will depend on the sector. For instance, even though the concept is the same, redundancy is measured differently for a water distribution network than for a road network. Consequently, sector-specific indicator cards (Level 4) have been developed, where a Level 3 indicator might have one or several indicator cards, depending on the sector. Figure 3.1 illustrates the structure by showing one branch under Prevention. The structure is fixed for Level 1 and Level 2, and mainly for Level 3 as well. If the operator feels that some indicator is missing, Level 3 indicators can be added.
Figure 3.1. CIRI Structure example.
For a common viewpoint, Level 4 indicators are transformed to a semi-qualitative scale, ranging from 0 to 5. At Level 3 and 4, the operator has the possibility to assign weights to the indicators according to their importance. After assessing the Level 4 indicators, results are aggregated up the hierarchy, and each Level 1-3 indicator gets a score from 0 to 5. Figure 3.2 illustrates a fictional example of the analysis under one resilience phase, identical to the example structure in Figure 3.1, namely prevention. For simplicity, only the analysis of one Level 2 component is shown. The Level 2 component Redundancy is divided into ‘system capacity’, ‘back-up facilities and equipment’, and ‘flexibility’. These Level 3 indicators are measured at Level 4 with sector-specific indicators, in this case three Level 4 indicators under each Level 3 indicator. The indicators presented in this illustration are assigned equal weight. The fictional scores at Level 4 are shown in the so-called indicator cards and then aggregated to the corresponding Level 3 indicator. The results are presented in a radar chart with all seven Level 1 indicators, as illustrated in Figure 3.3. In addition, to present a more detailed analysis, it is possible to construct charts for each Level 1 indicator over its respective components, structures and processes (Level 2).
Figure 3.2. CIRI Fictional example.
Figure 3.3. Example radar chart, level 1.
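The aggregation up the CIRI hierarchy can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that the aggregation rule is a weighted arithmetic mean, which the text above does not specify, and the class name, indicator card names and scores are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Indicator:
    """A node in the CIRI hierarchy (Level 1 down to Level 4)."""
    name: str
    weight: float = 1.0            # relative importance assigned by the operator
    score: Optional[float] = None  # set directly only on Level 4 leaves (0 to 5)
    children: List["Indicator"] = field(default_factory=list)

    def aggregate(self) -> float:
        """Return the leaf score, or the weighted mean of the children's scores.

        ASSUMPTION: a weighted arithmetic mean is used for illustration;
        the source does not state the exact aggregation rule.
        """
        if not self.children:
            if self.score is None:
                raise ValueError(f"leaf indicator {self.name!r} has no score")
            return self.score
        total_weight = sum(child.weight for child in self.children)
        weighted_sum = sum(child.weight * child.aggregate() for child in self.children)
        return weighted_sum / total_weight


def cards(scores):
    """Build equally weighted Level 4 indicator cards from fictional scores."""
    return [Indicator(f"card {i + 1}", score=s) for i, s in enumerate(scores)]


# Fictional Level 2 component with three Level 3 indicators,
# each measured by three Level 4 indicator cards (hypothetical values).
redundancy = Indicator("Redundancy", children=[
    Indicator("System capacity", children=cards([3, 4, 5])),
    Indicator("Back-up facilities and equipment", children=cards([2, 3, 4])),
    Indicator("Flexibility", children=cards([1, 2, 3])),
])

for level3 in redundancy.children:
    print(f"{level3.name}: {level3.aggregate():.1f}")
print(f"{redundancy.name}: {redundancy.aggregate():.1f}")
```

Changing a `weight` on any Level 3 or Level 4 indicator shifts the aggregated score towards that indicator, which mirrors the operator's option to weight indicators by importance.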
To address the need for proper descriptions and definitions of sector-specific indicators, indicator cards have been developed for the technological and organizational resilience indicators at the lowest CIRI level (Level 4). Each individual resilience indicator card provides a detailed description of the sector-specific indicator subject to assessment. The cards consist of the following information:
Feedback from CI operators indicates that a clear guideline is needed to execute the analysis. Hence, a guideline has been developed with a step-wise workflow description. The workflow includes, but is not limited to, the following steps:
If missing or out-of-context indicators are identified in step 7, the process is restarted at step 4. The process can thus be considered iterative.
This lesson is based on Deliverable 2.2.
Pursiainen, C. (2017). The Crisis Management Cycle: Theory and Practice. Routledge.
UNISDR. (2017). Terminology on disaster risk reduction. Retrieved from https://www.unisdr.org/we/inform/terminology
The performance loss and recovery function is meant to illustrate the technological resilience of a CI system over time (Figure 3.4). It is normal that the system performance decays slowly over time simply due to aging, reflected in Figure 3.4 by the difference between Q0 and Q1. A sudden drop in the performance represents the effect of a sudden shock from a hazard. How deep the drop goes, and how steep it is, depends on how well the system is able to resist and absorb the shock and respond to it, that is, on warning and response. This is, however, an indirect result of how well the system has earlier addressed risk assessment, prevention and preparedness. The gain in performance over time reflects how well the system recovers from the shock and eventually learns from it. The learning phase enables, and could likely lead to, increased system functionality compared to before the shock, due to the renewal that comes with restoring the system, but also from the learning experience itself.
Figure 3.4. The "resilience triangle" is shown schematically. The performance, Q, is one of the sub-categories quantifying the performance of the system. The response (1-3) and recovery (3-5) after an earthquake are shown as the time it takes to repair and restore the system.
During the first few days after a sudden big hazardous event, the response activities from society are likely focused on saving lives. Emergency response during this period is likely to be focused on rescue operations, sheltering and provision of food and water.
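The size of the resilience triangle can be estimated numerically once a performance curve is available. The sketch below is illustrative only: it assumes that performance is sampled at discrete times and that the loss is the area between a chosen baseline and the curve, approximated with the trapezoidal rule. The function name, the baseline and the sample values are hypothetical and are not taken from Figure 3.4.

```python
def performance_loss(times, performance, baseline):
    """Approximate the area between the baseline and the performance curve.

    Uses the trapezoidal rule on the gap (baseline - Q(t)). Any performance
    above the baseline is treated as zero loss, which is an approximation
    when the curve crosses the baseline between two samples.
    """
    if len(times) != len(performance) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    loss = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        gap_prev = max(baseline - performance[i - 1], 0.0)
        gap_curr = max(baseline - performance[i], 0.0)
        loss += 0.5 * (gap_prev + gap_curr) * dt
    return loss


# Hypothetical curve: full service until day 2, a sudden drop to 40 %,
# then a linear recovery to full service on day 12.
days = [0, 2, 2, 12]
service_percent = [100, 100, 40, 100]

loss = performance_loss(days, service_percent, baseline=100)
print(f"Performance loss: {loss:.0f} percent-days")
```

A smaller value corresponds to a smaller triangle, either because the drop is shallower or because recovery is faster.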
Many local governments or municipalities also set general performance goals in the case of a crisis. These targets could likely be based on typical recommendations made during “normal” conditions, for example: individuals should maintain a three-day emergency supply of drinking water. Here the societal resilience improved by information and communication will support the recovery of the technical infrastructure and the overall resilience of the system. The set targets also have to consider the time required to mobilize staff and temporary service centers for basic needs such as water, healthcare and food. During the emergency response it’s all about prioritizing activities until the situation is stabilized enough to start planning ahead. That’s when the recovery phase can start.
The recovery phase in the technical domain focuses on the actual physical system: its status after the shock, which depends on its fragility, and the ability to mobilize staff to perform the repairs.
After a major hazard event affecting a water distribution system, the population would likely have to rely on emergency supplies for the first days. Water for hospitals would be restricted. Emergency water supplies would meet only direct consumption needs. For the first one to two weeks, water would likely have to be delivered via tanks and people would have to carry the water home from distribution centers. The time required to recover water services will depend on the actual intensity of the hazard event, the size and complexity of the water network, the availability of staff and the financial and material resources needed to complete repairs. The time required will also depend on damage to other infrastructure, such as the transportation, communications, fuel, and power systems. The phase of identifying and fixing these issues will likely result in a list of actions that need to be prioritized, as in Table 3.1.
Table 3.1. List of action priorities during the emergency response phase (taken from a case study on a water network in the event of an earthquake).
Table 3.1 tells us that the recovery phase will likely be able to start on day 6. At this point the most acute actions have been taken and resources can be put into planning ahead. The repair time for a water network depends in general on the pipe diameter and material, but also varies with the depth of the pipe, the subsoil type and constraints that prevent access to the pipes. A repair strategy for the system developed before the event will likely help to refine the process of repairing many pipes following a big hazardous event and to plan for the mobilization of sufficient staff.
Table 3.1 does include some actions which in fact contribute to recovery of the service provided by the water infrastructure, for example the delivery of water to community supply points. This results in some service being provided to the community, although in limited quantity and of course with the need to boil water prior to drinking. When this information is combined with information about the recovery time of the infrastructure (obtained by analysing the performance loss and recovery function for the infrastructure in question), the combined information about the quality of the service provision, the delay in fixing pipes, etc. can result in information similar to that presented in Figure 3.5. This shows the percentage of the population that has service provision at different levels of quality over a period of time after the incident occurs.
Figure 3.5. Recovery of water infrastructure incorporating the performance loss and recovery function and the emergency response phase.
The maintenance of service and the ability to rapidly restore full service are two components of CI resilience that are captured in the resilience triangle, which evaluates the technological resilience in the event of a hazard. The smaller the triangle, the better the resilience of the system. The triangle itself, however, does not say what counts as good or bad. For actors that meet public needs, such as CI operators, the general public’s expectations and tolerance levels should be considered. The IMPROVER project has found that the general public has reasonable expectations of CI operators in times of crisis, and that these expectations could be a relevant criterion for CI resilience evaluation. It is then important to develop comparable performance measuring units for the technological resilience and the public’s expectations. For a water system, the following categories are suggested: water quality, water quantity and water delivery.
We suggest using the general public’s declared coping capacity as a criterion for CI resilience evaluation. In order to do so, the public’s perception of their own coping capacity in times of crisis must be obtained in a way that is comparable to the technical performance measures of the service in question. We propose to use a questionnaire survey to determine this. The goal of the questionnaire is to gain a comprehensive understanding of the local population’s expectations and tolerance levels in relation to both the reduced level of service and the crisis communication of the operator during disasters. It is important to remember that it is the public’s own views that are of interest here.
An example of this is shown in Figure 3.6. In this case the ability of the infrastructure to provide water is compared with the tolerance of the population to rely on water stored before a damaging incident. It can be seen that the capacity in the short term is above tolerances; however, from 12 hours after the incident and up to more than a month afterwards, the infrastructure is seen to be deficient. Note that in this case the infrastructure operator would be encouraged to propose remedial actions to develop a resilience treatment strategy. These could take the form of technical changes to the infrastructure to improve the capacity to deliver the service in the event of an incident, or they could take the form of information campaigns to encourage the population to store more water in advance of an incident.
Figure 3.6. Comparison of tolerances of water delivery versus the ability of an infrastructure network to provide water.
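A comparison of this kind can be sketched as follows. The example is a simplified illustration and does not reproduce the data behind Figure 3.6. It assumes that both curves are expressed as percentages of the population at the same sampled times: the share the network can supply, and the share whose stored water has run out according to the survey. The function name and all values are hypothetical.

```python
def deficient_intervals(hours, supplied, required):
    """Return (first, last) sampled times of each run where supply < need."""
    intervals = []
    start = None
    end = None
    for t, s, r in zip(hours, supplied, required):
        if s < r:
            if start is None:
                start = t      # a deficient run begins here
            end = t            # extend the current run
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:      # run still open at the last sample
        intervals.append((start, end))
    return intervals


# Hypothetical samples (hours after the incident).
hours = [0, 6, 12, 24, 72, 168, 720, 1440]
# Share of the population (%) the network can supply with water.
supplied = [20, 20, 20, 30, 50, 70, 90, 100]
# Share of the population (%) whose stored water has run out (survey-based).
required = [0, 10, 30, 60, 90, 100, 100, 100]

gaps = deficient_intervals(hours, supplied, required)
print(gaps)
```

Each interval marks a period in which remedial actions, technical or informational, would be needed to close the gap.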
The IMPROVER Project has defined four organisational functions that will have profound effects on the potential for resilient performance in operations. These functions are the top level of a Resilience Indicator Framework, describing how more abstract functions materialise in operations. The Design of Roles, Tasks and Processes will determine the room for flexibility and autonomous action and will affect how interactions happen in the organisational hierarchy. Artefact Design - Procedures and Tools will determine usability for all artefacts used in operations and provide opportunities for worker involvement. Strengthening Collaboration has to do with the way relations are formed within and outside of the organisation and the room for spontaneous collaboration at many organisational levels. Under the heading of Learning and Re-Design, the resilient organisation not only analyses negative events, but also attempts to understand the necessary conditions for sustained, successful operations, and ensures that information about identified problems is actually used for re-design. Finally, Underlying Concepts and Values has to do with the organisation’s perception of its employees: that people are regarded as a success factor, that a common understanding of work is sought and that a systems perspective is applied in organisational analyses. At the lowest level of the framework, a number of Organisational Resilience Processes can be found, against which an organisation can compare itself in an Organisational Resilience Evaluation.
The IMPROVER Project has developed a simple methodology and process for organisations that want to assess their organisational resilience. The method is experience-based and lets the organisation understand past events in terms of resilience. It should be pointed out, however, that the interpretation of organisational resilience (OR) capacities demands knowledge of resilience as interpreted in a safety context.
In this process an analyst works in the organisation to uncover experiences of past events, negative or positive: strong narratives that are perceived as meaningful by its employees. A number of these events are examined, structured and visualised by the analyst. Then, the organisation processes the events in workshops where the analyst introduces elements of Organisational Resilience based on the Resilience Indicator Framework, offering new ways of interpreting actions and conditions. Where the organisation sees proof of resilient action or barriers to such performance, possible helping or reinforcing solutions are discussed, again using the framework as a support. The main objective of these exercises is not to create an imagined “objective” model of the organisation. Rather, it is to introduce a blame-free, constructive discussion about working conditions and concrete solutions.
Additional information on the IORA methodology can be found in the report “Operationalising organisational resilience to critical infrastructure” (D4.6).
Assessing and enhancing resilience of critical infrastructures will not automatically result in a resilient society, since social and human dimensions have a strong influence in the achievement of a resilient society. Indeed, it is important to consider the link between physical and human systems to understand and enhance societal resilience. From a critical infrastructure’s point of view, a societal resilience analysis can provide a holistic picture of a community’s strengths and weaknesses in times of disasters, offering an understanding about what kind of society the critical infrastructure operates in. However, the concept of resilience is not yet fully operationalized, and many different approaches have been developed to achieve a measurement of resilience.
In the field of societal resilience, the concepts of coping capacity, adaptive capacity and transformative capacity are common denominators that are used to categorize the capacities needed to achieve resilience. Coping capacity refers to the ability to respond to, absorb and recover from a disruptive event and is generally related to a time frame close to the event. Adaptive capacity includes the ability to plan for and adjust to future challenges, which is related to a longer time frame both before and after an event. Resilience is not only about quick recovery and adjusting to new circumstances; the aspect of transformation must also be taken into account. Transformative capacity refers to the ability to transform the stability landscape in order to create new, better pathways for the system and is thus related to major changes in the long term. Resilience is by definition a complex and multidimensional topic, and a resilience assessment should ideally include all of these dimensions and their interdependencies.
While there are no generally agreed upon metrics for measuring societal resilience, indicators can be an effective tool to help decision makers understand where their community stands in terms of resilience and as a base for developing plans and strategies to enhance resilience. The IMPROVER project has identified six major societal resilience dimensions, called capitals, which form the basis for the indicators within the IMPROVER Societal Resilience Analysis (ISRA). The indicators are further categorized by the capacities they influence (Figure 3.7).
Figure 3.7. Structure of the IMPROVER Societal Resilience Analysis methodology.
ISRA is developed to be used as a self-assessment by rating a set of indicators on an agreement scale from strongly disagree to strongly agree. The proposed aim of using the methodology is to:
The assessment is intended to be performed by a group of people with different areas of expertise.
Additional information on the ISRA methodology can be found in the report “Report of organizational and societal resilience concepts applied to living labs” (D4.4).
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 653390