Chapter 14 - Resilience Engineering
Resilience engineering does not focus on avoiding failure but rather on accepting the reality that failures will occur.
Which of the following are resilience activities involved in the detection of and recovery from system problems?
Cybersecurity is concerned with all of an organization’s IT assets from networks through to application systems.
Fundamentally, resilience engineering is a technical rather than a sociotechnical activity.
In designing a resilient system, you have to assume that system failures or penetration by an attacker will occur, and you have to include redundant and diverse features to cope with these adverse events.
Resilience testing involves simulating possible system failures and cyberattacks to test whether the resilience plans that have been drawn up work as expected.
Resilience planning should be based on the assumption that systems will be subject to cyberattacks by malicious insiders and outsiders and that some of these attacks will be successful.
Critical services are defined as services that are essential if a system is to ensure its primary purpose.
Human error is rarely the cause of accidents in safety critical systems.
You should not design operational processes to be flexible and adaptable because operators and system managers may sometimes have to break rules and “work around” the defined process.
Explain how the complementary strategies of resistance, recognition, recovery, and reinstatement may be used to provide system resilience.
System resilience is the ability of a system to maintain the continuity of its critical services during a disruptive event such as a cyberattack or equipment failure. It relies on four complementary strategies:
- Recognition: The system or its operators recognize the early symptoms of a problem that could lead to system failure. Ideally, this recognition happens before the failure occurs.
- Resistance: When the symptoms of a problem or signs of a cyberattack are detected, resistance strategies are used to reduce the probability that the system will fail. These strategies focus on protecting the critical parts of the system.
- Recovery: If a failure occurs, the recovery strategy restores the critical services of the system quickly, which helps to maintain users’ trust in the system.
- Reinstatement: In this final activity, all of the system’s services are restored so that normal operation can continue.
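The ordering of the four strategies can be illustrated with a minimal sketch. The function name, its two boolean inputs, and the rule that a successful resistance step ends the incident are all assumptions made for illustration; real incidents rarely branch this cleanly.

```python
from enum import Enum


class Activity(Enum):
    RECOGNITION = "recognition"
    RESISTANCE = "resistance"
    RECOVERY = "recovery"
    REINSTATEMENT = "reinstatement"


def activities_for_incident(detected_early: bool, resistance_held: bool) -> list:
    """Return the resilience activities exercised by one hypothetical incident."""
    performed = []
    if detected_early:
        # Symptoms were spotted before failure, so resistance can be attempted.
        performed.append(Activity.RECOGNITION)
        performed.append(Activity.RESISTANCE)
        if resistance_held:
            # Assumption: failure was avoided, so nothing needs to be recovered.
            return performed
    # The system failed: restore critical services first, then everything else.
    performed.append(Activity.RECOVERY)
    performed.append(Activity.REINSTATEMENT)
    return performed
```

For example, `activities_for_incident(True, False)` yields all four activities in order, which corresponds to a problem that was recognized early but could not be resisted.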
What are the types of threats that have to be considered in resilience planning? Provide examples of the controls that organizations should put in place to counter those threats.
- Threats to the confidentiality of assets. Data is not damaged but it is made available to people who should not have access to it.
- Threats to the integrity of assets. These are threats where systems or data are damaged in some way by a cyberattack.
- Threats to the availability of assets. These are threats that aim to deny use of assets by authorized users.
Examples of controls
- Authentication, where users of a system have to show that they are authorized to access the system.
- Encryption, where data is algorithmically scrambled so that an unauthorized reader cannot access the information.
- Firewalls, where incoming network packets are examined then accepted or rejected according to a set of organizational rules. Firewalls can be used to ensure that only traffic from trusted sources is passed from the external Internet into the local organizational network.
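The firewall control described above can be sketched as a simple source-address filter. This is only an illustration: the trusted network ranges are hypothetical documentation addresses, and a real firewall examines far more than the source address (ports, protocols, connection state).

```python
import ipaddress

# Hypothetical organizational rules: networks treated as trusted sources.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def accept_packet(source_ip: str) -> bool:
    """Accept a packet only if its source address lies in a trusted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


# A packet from a trusted range is passed; any other packet is rejected.
trusted_result = accept_packet("203.0.113.7")
untrusted_result = accept_packet("192.0.2.1")
```

Rejecting by default and listing only the trusted sources matches the description of passing traffic from trusted sources only.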
A hospital proposes to introduce a policy that any member of clinical staff (doctors or nurses) who takes or authorizes actions that lead to a patient being injured will be subject to criminal charges. Explain why this is a bad idea, which is unlikely to improve patient safety, and why it is likely to adversely affect the resilience of the organization.
System resilience is the ability of a system to maintain the continuity of its critical services during a disruptive event such as a cyberattack or equipment failure. A resilient organization is one that is adaptable and flexible enough to keep operating through such events. A hospital is a critical system and, as in any system that depends on people, there is always some probability of human error.
Applying criminal charges to doctors and nurses whose actions lead to a patient being injured is a bad idea. Clinical staff work to save patients’ lives and do not intentionally take wrong actions, so the threat of prosecution will not prevent honest mistakes. Instead, staff will become reluctant to take or authorize any action for a patient. Patient safety is more likely to be improved by measures such as the following:
- Conflict alerts can be displayed as warnings on the doctor’s screen.
- Patient records can be formalized so that patient data is managed effectively.
- A team can be appointed to monitor and check the work of doctors and nurses collectively.
If criminal charges are applied, the resilience of the hospital is also likely to suffer, for the following reasons:
- The hospital can no longer remain adaptable and flexible, because staff will be unwilling to depart from defined procedures even when a situation demands it.
- Individuals will be unwilling to take responsibility for the care of a patient.
What is survivable systems analysis and what are the key activities in each of the four stages involved in it as shown in Figure 14.8?
- System understanding: For an existing or proposed system, review the goals of the system (sometimes called the mission objectives), the system requirements and the system architecture.
- Critical service identification: The services that must always be maintained and the components that are required to maintain these services are identified.
- Attack simulation: Scenarios or use cases for possible attacks are identified along with the system components that would be affected by these attacks.
- Survivability analysis: Components that are both essential and compromisable by an attack are identified and survivability strategies based on resistance, recognition and recovery are identified.
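The final stage combines the outputs of the two stages before it, which can be sketched as a set intersection. The service names, component names, and attack scenarios below are invented for illustration and do not come from Figure 14.8.

```python
# Hypothetical output of critical service identification:
# each critical service mapped to the components it depends on.
critical_services = {
    "place_order": {"order_db", "auth_service"},
    "take_payment": {"payment_gateway", "auth_service"},
}

# Hypothetical output of attack simulation:
# each attack scenario mapped to the components it would affect.
attack_scenarios = {
    "sql_injection": {"order_db", "report_server"},
    "credential_theft": {"auth_service"},
}


def components_to_protect(services: dict, attacks: dict) -> list:
    """Return components that are both essential and compromisable."""
    essential = set().union(*services.values())
    compromisable = set().union(*attacks.values())
    return sorted(essential & compromisable)


priority_components = components_to_protect(critical_services, attack_scenarios)
```

Here `priority_components` contains `auth_service` and `order_db`, the components for which resistance, recognition, and recovery strategies would then need to be identified. `payment_gateway` is essential but not affected by either scenario, and `report_server` is compromisable but not essential.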
Explain why process inflexibility can inhibit the ability of a sociotechnical system to resist and recover from adverse events such as cyberattacks and software failure. If you have experience of process inflexibility, illustrate your answer with examples from your experience.
Sociotechnical systems include people, processes, and technology. They are often so complex that they cannot be understood as a whole, so they are usually described as a set of layers.
Cyber-resilience planning includes decisions about how flexible or inflexible the response to a cyberattack should be. Process inflexibility can inhibit the ability of a sociotechnical system to resist and recover from a cyberattack or software failure, for the following reasons:
- Security and resilience requirements often conflict. When they do, an inflexible process offers no way to resolve the conflict in favor of keeping critical services running.
- Security policies limit what users may access. An inflexible process forces users to follow the security policy of the sociotechnical system to the letter, even when dealing with a problem calls for actions that the policy does not allow.
- An inflexible process allows only a limited set of users to make changes, so a response can be delayed when those users are not available.
- Many sociotechnical systems are used in government organizations, where strict control of user access is common, so process inflexibility can easily arise there.
Suggest how the approach to resilience engineering that is proposed in Figure 14.9 could be used in conjunction with an agile development process for the software in the system. What problems might arise in using agile development for systems where resilience is important?
A more general resilience engineering method, as shown in Figure 14.9, takes the lack of detailed requirements into account as well as explicitly designing recovery and reinstatement into the system. For the majority of components in a system, you will not have access to their source code and will not be able to make changes to them. Your strategy for resilience has to be designed with this limitation in mind.
There are five interrelated streams of work in this approach to resilience engineering:
- You identify business resilience requirements. These requirements set out how the business as a whole must maintain the services that it delivers to customers and, from this, resilience requirements for individual systems are developed. Providing resilience is expensive, and it is important not to overengineer systems with unnecessary resilience support.
- You plan how to reinstate a system or a set of systems to their normal operating state after an adverse event. This plan has to be integrated with the business’s normal backup and archiving strategy that allows recovery of information after a technical or human error. It should also be part of a wider disaster recovery strategy. You have to take account of the possibility of physical events such as fire and flooding and study how to maintain critical information in separate locations. You may decide to use cloud backups for this plan.
- You identify system failures and cyberattacks that can compromise a system, and you design recognition and resistance strategies to cope with these adverse events.
- You plan how to recover critical services quickly after they have been damaged or taken offline by a failure or cyberattack. This step usually involves providing redundant copies of the critical assets that provide these services and switching to these copies when required.
- Critically, you should test all aspects of your resilience planning. This testing involves identifying failure and attack scenarios and playing these scenarios out against your system.
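The recovery and testing streams can be illustrated together with a minimal sketch: a read that switches to a redundant copy when the first one fails, followed by a scenario that simulates losing the primary. The `Replica` class, the `read_with_failover` function, and the key name are hypothetical, and a real system would also need to keep the copies consistent.

```python
class Replica:
    """A hypothetical copy of a critical asset."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def read(self, key: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{key} served by {self.name}"


def read_with_failover(replicas: list, key: str) -> str:
    """Try each redundant copy in turn and return the first successful read."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue  # Switch to the next copy.
    raise RuntimeError("all replicas failed; critical service is offline")


# Resilience test: play out a failure scenario against the system.
primary = Replica("primary")
backup = Replica("backup")
primary.healthy = False  # Simulated failure of the primary copy.
result = read_with_failover([primary, backup], "account-42")
```

In this scenario `result` is served by the backup copy. Running the same scenario with both copies marked unhealthy raises `RuntimeError`, which is the kind of outcome a resilience test is meant to surface before a real incident does.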
A senior manager in a company is concerned about insider attacks from disaffected staff on the company’s IT assets. As part of a resilience improvement program, she proposes that a logging system and data analysis software be introduced to capture and analyze all employee actions but that employees should not be told about this system. Discuss the ethics of both introducing a logging system and doing so without telling system users.
Introducing a logging system to keep track of user activity inside the organization is a reasonable idea, because it makes insider attacks by staff members easier to identify.
The points in favor of introducing it are as follows:
- The logging system keeps track of all users who log in to the system, so if any unusual activity occurs, its source can be identified easily.
- If an employee tries to access data that they are not authorized to see, the data analysis software can identify the attempt and send an alert to an administrator.
- The overall security of the system is improved.
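As a rough sketch of the kind of analysis described in these points, the following code counts access attempts that fall outside each user’s rights and lists the users who would trigger an alert. The user names, resources, log format, and alert threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical access rights: the resources each user may access.
ACCESS_RIGHTS = {
    "alice": {"payroll", "hr"},
    "bob": {"inventory"},
}

# Hypothetical log entries, each recorded as (user, resource accessed).
LOG = [
    ("alice", "payroll"),
    ("bob", "payroll"),
    ("bob", "inventory"),
    ("bob", "hr"),
]


def unauthorized_attempts(log: list, access_rights: dict) -> Counter:
    """Count, per user, the accesses that fall outside that user's rights."""
    counts = Counter()
    for user, resource in log:
        if resource not in access_rights.get(user, set()):
            counts[user] += 1
    return counts


def users_to_alert(log: list, access_rights: dict, threshold: int = 2) -> list:
    """Return users whose unauthorized attempts reach the alert threshold."""
    counts = unauthorized_attempts(log, access_rights)
    return sorted(user for user, n in counts.items() if n >= threshold)


alerts = users_to_alert(LOG, ACCESS_RIGHTS)
```

With this sample log, only `bob` reaches the threshold. Using a threshold rather than alerting on a single attempt is a design assumption intended to reduce false alarms from honest mistakes.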
Introducing a logging system and data analysis software is therefore defensible, but doing so without telling users is highly unethical. Different users in an organization have different access rights, so each user should be aware that their actions are being monitored.
The ethical issues with covert logging are as follows:
- System users will feel that the organization does not trust them, which undermines the integrity of the organization as a whole.
- Users will not work freely, so the effectiveness of the system will decrease.
- The quality of service that users deliver to their customers will also be affected.
- Users will be unable to carry out any personal activity at work and will feel constrained.
In summary, introducing a logging system and data analysis software is acceptable, but doing so without telling the users is highly unethical.
In Section 13.4.2, (1) an unauthorized user places malicious orders to move prices and (2) an intrusion corrupts the database of transactions that have taken place. For each of these cyberattacks, identify resistance, recognition, and recovery strategies that might be used.
Recognition: The system or its operators should recognise early indications of system failure.
Resistance: If the symptoms of a problem or cyberattack are detected early, then resistance strategies may be used to reduce the probability that the system will fail.
Recovery: If a failure occurs, the recovery activity ensures that critical system services are restored quickly so that system users are not badly affected by failure.
Reinstatement: In this final activity, all of the system services are restored and normal system operation can continue.