|
A Famous Failure
One of the best-documented failures of a safety-critical system is the
Therac-25. The Therac-25 was a computer-controlled radiation therapy machine
used to treat cancer. It operated in two modes: low-power and high-power
electron beams. In the high-power mode, a tungsten shield was placed in the
path of the electron beam to reduce the radiation from a lethal to a
therapeutic level.
Earlier machines such as the Therac-20 used hardware
interlocks to protect against errors. The Therac-25 used software to ensure
that the tungsten shield was in place in the high-power mode. An error in
the software allowed the machine to operate in high-power mode without the
shield (under certain rare circumstances) and several patients were killed
by massive doses of radiation.
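To make the idea of a software interlock concrete, the following Python sketch shows the kind of check that has to hold before the beam fires. It is an illustration only and not the Therac-25's actual code: the names BeamMode, InterlockError, check_interlock, and fire_beam are hypothetical, and shield_in_place is assumed to come from an independent position sensor that is read immediately before firing.

```python
from enum import Enum


class BeamMode(Enum):
    LOW_POWER = "low-power"
    HIGH_POWER = "high-power"


class InterlockError(Exception):
    """Raised when the beam must not be fired."""


def check_interlock(mode: BeamMode, shield_in_place: bool) -> None:
    """Refuse high-power operation unless the shield is confirmed in place.

    Assumption: shield_in_place reflects a sensor reading taken just before
    firing, not a value remembered from earlier operator input.
    """
    if mode is BeamMode.HIGH_POWER and not shield_in_place:
        raise InterlockError("high-power beam requested without shield")


def fire_beam(mode: BeamMode, shield_in_place: bool) -> str:
    # The interlock is checked on every firing, never skipped or cached.
    check_interlock(mode, shield_in_place)
    return f"beam fired in {mode.value} mode"
```

A hardware interlock enforces the same condition physically, so it still holds when the software that is supposed to check it contains an error.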
|
Responsibility of the Engineer
Computer scientists are sometimes
involved with projects where the reliability of the end product is a
matter of life and death. If you design a heart pacemaker, you can't have
it flash up the message "Unknown error #42, please refer to the operator's
handbook". Equally, the designer of an instrument landing system can't put a
placard on the windshield saying "If the autopilot fails, please look out of
the window and avoid flying into solid objects".
The engineer has a responsibility for the
products they design (or test or specify). Is this responsibility moral,
ethical, professional, or legal? The use of safety-critical hardware and
software in medical, automotive, and aviation products has made
these issues more important to the modern computer scientist or engineer.
At the moment, a public debate is taking
place on the balance between the responsibility of the consumer and the
designer or manufacturer. The manufacturer tends to rely on the principle of
caveat emptor (let the buyer beware) and expects the end user to
understand the product and its limitations and to use it in a responsible
way.
The end user believes that the
responsibility for the product should lie with the manufacturer. The
manufacturer has privileged information that is not available to the
consumer and therefore the consumer cannot make a fully informed decision
when buying or using a product.
Like most professional organizations, the
IEEE has a code of ethics. The following clauses are of particular relevance
to computer engineering:
We ... commit ourselves to the highest ethical and professional conduct and
agree:
1. to accept responsibility in making engineering decisions consistent with
the safety, health, and welfare of the public, and to disclose promptly
factors that might endanger the public or the environment;
2. to avoid real or perceived conflicts of interest whenever possible, and to
disclose them to affected parties when they do exist;
3. to be honest and realistic in stating claims or estimates based on
available data;
The essence of this code of ethics (and similar codes produced by other
societies) is that the engineer is responsible for his or her actions. It is
not enough to carry out work in the minimum time, with the minimum of care
and checking, and without thought for the eventual consequences
of the work.
Of course, this poses ethical and personal dilemmas if you are placed in
a position where your employer requires you to behave in an unethical
fashion.
Ethics and safety-critical systems are not simply academic debating
points. In the UK there is a move to introduce a "Corporate Manslaughter"
law that would make the directors of a company responsible for the actions
of the company. The law regards a director as the "controlling mind and
will" of the company. For example, if a train crashes due to poor
line maintenance, the directors of the company that maintains the track
would be liable to a charge of corporate manslaughter if it could be proved
that they were negligent. At the moment, only the individuals who are
directly responsible for the actions leading to a death can be held liable.
|
The Computer-Aided Crash
Computers in the cockpit improve aircraft reliability, navigation, and
control. In short, they take much of the burden from the pilot. However, it
is possible for a computer to contribute to a disaster.
On January 20, 1992, an Airbus A320 crashed near Strasbourg in
France. The crash occurred because the aircraft developed a very high rate
of descent that went unnoticed by the crew, even though they had
inadvertently selected it themselves.
The autopilot allows the pilot to select either an angle
of descent or a rate of descent. It appears that the crew intended a descent
angle of about 3 degrees but inadvertently selected a descent rate of about
3000 feet per minute. They had confused the "vertical-speed" and
"flight-path-angle" modes of descent. The instrumentation made little
distinction between these modes and there was no warning mechanism.
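The size of this mode error can be estimated with a short calculation. The Python sketch below converts a flight-path angle into the equivalent vertical speed; the ground speed of 140 knots is an assumed, typical approach speed and is not taken from the accident report, and the function name vertical_speed_fpm is hypothetical.

```python
import math

FEET_PER_NAUTICAL_MILE = 6076.12


def vertical_speed_fpm(ground_speed_knots: float, flight_path_angle_deg: float) -> float:
    """Descent rate in feet per minute for a given descent angle."""
    # Knots are nautical miles per hour; convert to feet per minute.
    ground_speed_fpm = ground_speed_knots * FEET_PER_NAUTICAL_MILE / 60.0
    return ground_speed_fpm * math.tan(math.radians(flight_path_angle_deg))


# Assumed approach ground speed of 140 knots (illustrative only).
intended_fpm = vertical_speed_fpm(140.0, 3.0)
ratio = 3000.0 / intended_fpm
print(f"3 degree descent: {intended_fpm:.0f} ft/min; 3000 ft/min is {ratio:.1f} times steeper")
```

Under this assumption a 3 degree descent corresponds to well under 1000 feet per minute, so the rate actually selected was several times steeper than the one intended.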
The pilots were busy during the landing phase and
neglected to monitor the aircraft's descent rate. In addition, the tower
had requested a late change to the flight plan.
Although an altitude warning was received a second before
the impact with high ground, 87 people lost their lives.
The board of inquiry blamed the accident on pilot error
because the crew had failed to monitor the aircraft's altitude.
The design of the system interface is clearly a
contributory factor. The difference between the two descent modes was hardly
glaring and no warning was given of a dangerous rate of descent at a time
when the crew was distracted by other duties.
You can argue that those who design safety-critical
systems should be aware of failure modes due to operator overload or even
inattention to detail.
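The missing warning discussed above can be sketched as a simple monitor that does not depend on the crew noticing the problem. This is an illustration under stated assumptions and not how any real avionics system is implemented: the function descent_warning and both threshold values are hypothetical and chosen only for the example.

```python
from typing import Optional

# Hypothetical thresholds, chosen for illustration only.
LOW_ALTITUDE_FT = 5000.0
MAX_DESCENT_FPM_NEAR_GROUND = 1500.0


def descent_warning(height_above_ground_ft: float,
                    descent_rate_fpm: float) -> Optional[str]:
    """Return a warning message if the descent is too steep near the ground."""
    if (height_above_ground_ft < LOW_ALTITUDE_FT
            and descent_rate_fpm > MAX_DESCENT_FPM_NEAR_GROUND):
        # Time remaining if the current descent rate is maintained.
        seconds_to_ground = height_above_ground_ft / descent_rate_fpm * 60.0
        return (f"SINK RATE: {descent_rate_fpm:.0f} ft/min, "
                f"about {seconds_to_ground:.0f} s to ground")
    return None
```

The design point is that the check runs continuously and raises the alarm early, at the moment when an overloaded crew is least likely to spot the error unaided.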
|