Safety Critical Systems

A Famous Failure

One of the best-documented failures of a safety-critical system is that of the Therac-25.

The Therac-25 was a radiation therapy machine used to treat cancer. It operated in two modes, delivering either a low-power or a high-power electron beam. In the high-power mode, a tungsten shield was placed in the path of the electron beam to reduce the radiation from a lethal to a therapeutic level.

Earlier machines such as the Therac-20 used hardware interlocks to protect against errors. The Therac-25 relied on software instead to ensure that the tungsten shield was in place in the high-power mode. Under certain rare circumstances, an error in the software allowed the machine to operate in high-power mode without the shield, and several patients were killed by massive doses of radiation.
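
The kind of check the software was responsible for can be sketched as follows. This is a minimal illustration in Python, not the Therac-25's actual code; the names BeamMode, MachineState and may_fire are hypothetical, and the shield position is assumed to come from a sensor.

```python
from dataclasses import dataclass
from enum import Enum


class BeamMode(Enum):
    LOW_POWER = "low-power"
    HIGH_POWER = "high-power"


@dataclass(frozen=True)
class MachineState:
    """Hypothetical snapshot of the machine, read immediately before firing."""
    mode: BeamMode
    shield_in_place: bool  # assumed to be reported by a position sensor


def may_fire(state: MachineState) -> bool:
    """Permit the beam only if the state is safe.

    High-power mode requires the shield; low-power mode does not.
    """
    if state.mode is BeamMode.HIGH_POWER:
        return state.shield_in_place
    return True
```

A check like this is only as good as the state it reads. If the software's record of the mode or the shield position can disagree with the hardware, the check passes when it should not, which is why a hardware interlock provides an independent line of defence.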

 

 

Responsibility of the Engineer

Computer scientists are sometimes involved with projects where the reliability of the end product is a matter of life and death. If you design a heart pacemaker, you can't have it flash up the message "Unknown error #42, please refer to the operator's handbook". Equally, the designer of an instrument landing system can't put a placard on the windshield saying "If the autopilot fails, please look out of the window and avoid flying into solid objects".

The engineer has a responsibility for the products they design (or test or specify). Is this responsibility moral, ethical, professional or legal? The use of safety-critical hardware and software in today's medical, automotive, and aviation products has made these issues more important to the modern computer scientist or engineer.

At the moment, a public debate is taking place on the balance of responsibility between the consumer and the designer or manufacturer. The manufacturer tends to rely on the caveat emptor argument (let the buyer beware) and expects the end user to understand the product and its limitations and to use it in a responsible way.

The end user believes that the responsibility for the product should lie with the manufacturer. The manufacturer has privileged information that is not available to the consumer, and therefore the consumer cannot make a fully informed decision when buying or using a product.

Like most professional organizations, the IEEE has a code of ethics. The following extracts are of particular relevance to computer engineering:

We ... commit ourselves to the highest ethical and professional conduct and agree:

1. to accept responsibility in making engineering decisions consistent with the safety, health, and welfare of the public, and to disclose promptly factors that might endanger the public or the environment;

2. to avoid real or perceived conflicts of interest whenever possible, and to disclose them to affected parties when they do exist;

3. to be honest and realistic in stating claims or estimates based on available data;

The essence of this code of ethics (and similar codes produced by other societies) is that the engineer is responsible for his or her actions. It is not enough to carry out work in the minimum time with the minimum of care, checking, and thought for its eventual consequences.

Of course, this poses ethical and personal dilemmas if you are placed in a position where your employer requires you to behave in an unethical fashion.

Ethics and safety-critical systems are not simply academic debating points. In the UK there is a move to introduce a "Corporate Manslaughter" law that would make the directors of a company responsible for the actions of the company. The law regards a director as the "controlling mind and will" of the company. For example, if a train crashes due to poor line maintenance, the directors of the company that maintains the track would be liable to a charge of corporate manslaughter if it could be proved that they were negligent. At the moment, only the individuals who are directly responsible for the actions leading to death can be held liable.

 

 

 

The Computer Aided Crash

Computers in the cockpit improve aircraft reliability, navigation and aircraft control. In short, they take much of the burden from the pilot. However, it is possible for a computer to contribute to a disaster.

On January 20, 1992, an Airbus A320 crashed near Strasbourg in France. The aircraft had developed a very high rate of descent, which went unnoticed by the crew even though they had themselves selected it.

The autopilot allows the pilot to select either an angle of descent or a rate of descent. It appears that the crew intended a 3 degree angle of descent but inadvertently selected a descent rate of 3000 feet per minute; that is, they confused the "vertical-speed" and "flight-path-angle" modes of descent. The instrumentation makes little distinction between these modes and there is no warning mechanism.
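
A rough calculation shows how different the two settings are. The sketch below converts a flight-path angle into a vertical speed for a given ground speed. The ground speed of 140 knots is an assumed, illustrative approach speed, not a figure from the accident inquiry, and the function name is hypothetical.

```python
import math

FEET_PER_NAUTICAL_MILE = 6076.12


def descent_rate_ft_per_min(ground_speed_knots: float, angle_degrees: float) -> float:
    """Vertical speed (ft/min) for a descent at the given flight-path angle."""
    # Knots are nautical miles per hour; convert to horizontal feet per minute.
    horizontal_ft_per_min = ground_speed_knots * FEET_PER_NAUTICAL_MILE / 60.0
    return horizontal_ft_per_min * math.tan(math.radians(angle_degrees))


intended = descent_rate_ft_per_min(140.0, 3.0)  # flight-path-angle mode, 3 degrees
selected = 3000.0                               # vertical-speed mode, ft/min
print(f"intended: about {intended:.0f} ft/min")
print(f"selected: {selected:.0f} ft/min, {selected / intended:.1f} times the intended rate")
```

Under that assumed speed, a 3 degree descent corresponds to roughly 740 feet per minute, so the selected rate of 3000 feet per minute is about four times what the crew intended.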

The pilots were busy during the landing phase and neglected to monitor the aircraft's descent rate. In addition, the tower had requested a late change in the flight plan.

An altitude warning was received only a second before the impact with high ground, and 87 people lost their lives.

The board of inquiry blamed the accident on pilot error because the crew had failed to monitor the aircraft's altitude.

The design of the system interface is clearly a contributory factor. The difference between the two descent modes was far from obvious, and no warning was given of a dangerous rate of descent at a time when the crew was distracted by other duties.
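
One safeguard of the kind that was missing is a plausibility check on the descent rate. The sketch below is purely illustrative: the function name, the 2000 feet per minute limit and the 3000 foot height threshold are hypothetical values chosen for the example, not figures taken from the A320 or from the inquiry.

```python
from typing import Optional


def descent_warning(vertical_speed_ft_per_min: float,
                    height_above_ground_ft: float,
                    max_rate_near_ground: float = 2000.0,
                    near_ground_ft: float = 3000.0) -> Optional[str]:
    """Return a warning if the descent rate is implausibly high near the ground.

    Both thresholds are hypothetical defaults for illustration only.
    """
    near_ground = height_above_ground_ft <= near_ground_ft
    too_fast = vertical_speed_ft_per_min > max_rate_near_ground
    if near_ground and too_fast:
        return f"DESCENT RATE {vertical_speed_ft_per_min:.0f} FT/MIN"
    return None
```

With these assumed thresholds, a descent of 3000 feet per minute at 2500 feet above the ground would raise a warning, whereas the same rate at 10000 feet, or a gentle descent close to the ground, would not.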

You can argue that those who design safety-critical systems should be aware of failure modes due to operator overload or even inattention to detail.

 

 

   
Professor Alan Clements
School of Computing
University of Teesside
Middlesbrough TS1 3BA
England