Safety-Critical Systems

This article first appeared in CrossTalk, Sep/Oct 2009.

As safety-critical software moves from closed environments to open and commodity technologies, security threats will inevitably increase. Organizations dependent on mission-critical systems and networks are recognizing that the traditional “protect-detect-react” (PDR) strategy for countering intrusions and attacks is ineffective. A new information assurance and cybersecurity strategy is needed that augments PDR with the ability of systems and networks to “fight through” attacks. This article examines techniques that both security- and safety-critical software developers can leverage to increase their software’s survivability.

Software security and software safety share the need to assure that software will remain dependable under extraordinary conditions. Extraordinary conditions – those that the software was not designed to tolerate gracefully – will cause it either to behave unpredictably or to fail outright. What distinguishes software safety from software security is what constitutes an extraordinary condition for that software, and what is at stake if it fails as a result.

Extraordinary conditions that threaten software safety are termed hazards, reflecting the perception that such conditions are accidental. By contrast, extraordinary conditions that threaten software security are termed attacks or exploits, indicating their intentionality. The objective of most attacks on software is to sabotage or subvert the software’s operation by exploiting one or more weaknesses in the software’s execution environment (e.g., failure of the application firewall that blocks malicious input from entering the system), design (e.g., accepting input from unrecognized entities), implementation (e.g., accepting input in a fixed-length buffer without first validating that input’s length), operation (e.g., a failure of user interface software, thereby exposing the system’s command line), or development process (e.g., poor configuration control, peer review, and testing practices that allow a disgruntled programmer to surreptitiously embed malicious logic).
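The implementation weakness cited above (accepting input into a fixed-length buffer without first validating its length) can be illustrated with a minimal sketch. Python is used here only for readability, and the names BUFFER_SIZE and store_input are hypothetical, not taken from any real system. In a memory-unsafe language such as C, the unchecked copy would silently overwrite adjacent memory; the point of the sketch is simply that the length check comes before the copy.

```python
BUFFER_SIZE = 16  # hypothetical fixed capacity of the receiving buffer


def store_input(buffer: bytearray, data: bytes) -> int:
    """Copy data into a fixed-length buffer, rejecting oversized input.

    Returns the number of bytes stored. Raises ValueError rather than
    truncating or overflowing when the input does not fit.
    """
    if len(buffer) != BUFFER_SIZE:
        raise ValueError("buffer does not have the expected fixed capacity")
    # Validate the length *before* touching the buffer.
    if len(data) > BUFFER_SIZE:
        raise ValueError(
            f"input of {len(data)} bytes exceeds the {BUFFER_SIZE}-byte buffer"
        )
    # Equal-length slice assignment leaves the buffer size unchanged.
    buffer[:len(data)] = data
    return len(data)
```

Rejecting the oversized input outright, instead of truncating it, keeps the failure visible to the caller, which can then log or escalate it as a possible attack.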

Intentional Threats to Safety-Critical Systems

Software failures that result from safety hazards can have dire, even fatal, consequences due to the extremely strong linkage between the software and the physical system that it is supposed to control. Whether the software is a single small, closely contained embedded program that controls an automobile’s anti-lock braking system or several dozen modules dispersed throughout a distributed supervisory control and data acquisition (SCADA) system controlling an entire region’s wastewater treatment, the functions performed by the physical components are what determine whether the system (including its software) is safety-critical. If a system failure results in damage to the physical environment in which people live, physical maiming, damage to health, or death of one or more humans, the system is safety-critical. The failure of software in such a system can have catastrophic results.

Safety hazards tend to be straightforward and accidental. By contrast, security threats are intentional: the result of human creativity and perspicacity absent from safety hazards (although a hazard may introduce a vulnerability that an attacker can intentionally exploit). Because they are guided by human intelligence, security threats are usually less predictable, more complex, more numerous, and more persistent than safety hazards. The same system may be repeatedly targeted by a variety of simultaneous and sequential attacks, some aimed at the interface level, others at the application components, and still others at the execution environment level – all orchestrated to accumulate and intensify until they collectively produce the critical failure(s), enabling the attacker to achieve his objective.

Google “Ariane 5 Flight 501,” “Therac-25 accidents,” or “Toyota Prius software bug” to read about some dramatic instances of safety-critical systems that failed as a result of design flaws or implementation errors in their software. These were unintentional flaws and errors, caused by developer inadvertence, negligence, or misapprehension, but their impact was dramatic. How much more disastrous might they have been had their cause been intentional exploitation or implanted malicious logic?

Now Google “trans-Siberian gas pipeline” + “software bug.” What you’ll get are reports of the 1982 technology coup. The CIA, having learned that Soviet spies planned to secretly acquire a gas pipeline controller developed in Canada, planted a Trojan horse (logic bomb) in the controller’s software. Once installed on the trans-Siberian pipeline, the controller ran a test of the pipeline’s pressure gauges during which the logic bomb reset those gauges to double the gas pressure in the pipeline. The resulting explosion was, up to that time, the largest non-nuclear explosion ever photographed from space [1].

In the 25-plus years since that incident, attacks on safety-critical systems involving embedded malicious code or direct penetration have proliferated, and several have been perpetrated by the systems’ own disgruntled developers or administrators. This proliferation is due in part to opportunity: More safety-critical systems are built from or hosted on commodity software whose vulnerabilities are widely publicized and well understood by attackers, and those systems are then exposed on semi-open or fully open networks (including the Internet). The increasing software intensiveness of safety-critical systems means that more of their critical functions are performed by software than by hardware, and that software is necessarily larger and more complex, making its vulnerabilities harder to predict and detect.

As with safety hazards, the impact of software failures resulting from attacks and exploits depends on the nature of the targeted system. A threat to a safety-critical system can have the same dire consequences as a hazard. Even in non-safety-critical systems, the consequences of failure can be catastrophic: Insider sabotage of an intelligence database application may enable an attacker to steal the names of undercover operatives in an adversarial country and sell them to that country’s counterintelligence service, which then has the operatives captured and executed. The subversion of software in a military logistics system that calculates the number of biochemical suits required may result in a shortage of protection for forward-deployed forces during a chemical weapons attack.

Embedded, Not Isolated

Many safety-critical systems are embedded. Until recently, that meant they were small, relatively simple, and isolated from direct interaction with humans (they even lacked means for such interaction). Today’s embedded systems are different. They both benefit from and are made vulnerable by the increased power of the processors on which they are hosted. These processors enable the use of commodity operating systems, such as Microsoft Windows CE, which share security problems with non-embedded operating systems built on the same kernel code (see Note 1).

The less proprietary and more connected embedded systems become, the less specialized expertise attackers need to target them. Systems from temperature controls to medical devices to on-board automobile computers and sensors are now accessible via wireless Radio Frequency Identification (RFID), cellular, and satellite links that use standard communications protocols. Implanted medical devices are increasingly accessible via RFID [2]. A DoD telemedicine application enables surgeons in U.S. military hospitals to issue commands, via a satellite uplink, to a software-controlled robot in Iraq, thereby performing laser surgery on wounded soldiers in theater [3, 4].

But where there is a wireless network, one can almost guarantee there will be an attacker attempting to locate, intercept, and tamper with the signals transmitted between the systems at either end of the wireless link. Consider telematic systems such as GM’s OnStar, Ford’s remote emergency satellite cellular unit and vehicle emergency messaging system, Volvo’s On Call, BMW’s Assist, and Mercedes-Benz’s Tele Aid and COMAND. They all use cellular or satellite connections to allow their call center representatives to perform remote diagnostics on the onboard computers of subscribers’ vehicles. Privacy concerns about certain data collected by these telematic services are well documented, but a recent addition to OnStar is even more worrying. Owners of 1.7 million OnStar-equipped 2009 GM vehicles can allow their engines to be “remotely switched off through the OnStar mobile communications system” [5] at the behest of the police. The goal is to stop stolen GM vehicles in their tracks during high-speed police car chases, thereby reducing the number of fatal accidents associated with such chases. The implications of OnStar’s transition from a passive monitoring and diagnostics system to an active controller of a safety-critical embedded system (the engine) have been noted:

[Some] automotive communication networks have access to crucial components of the vehicle, like brakes, airbags, and the engine control. Cars that are equipped with driving aid systems allow deep interventions in the driving behavior of the vehicle .... Malicious attackers are not to be underestimated. [6]

The next logical step – remote updates to embedded software and firmware via telematic links – would create an ideal conduit for inserting malicious logic into embedded computers or for causing denial of service by injecting “garbage bits” into telematic data streams [7].
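One common defense against both risks named above (inserted malicious logic and injected garbage bits) is to authenticate every update before it is applied. The sketch below is not drawn from any telematics product. It assumes a hypothetical pre-shared key and a hypothetical packet layout (payload followed by a 32-byte HMAC-SHA256 tag), and the function names sign_update and verify_update are illustrative only. It uses Python's standard hmac and hashlib modules.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_update(key: bytes, payload: bytes) -> bytes:
    """Sender side (assumed): append an HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_update(key: bytes, packet: bytes) -> bytes:
    """Receiver side (assumed): return the payload only if its tag is valid."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short to contain a tag")
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; update rejected")
    return payload
```

A check of this kind detects tampered or corrupted packets, but it does not by itself prevent replay of an old, validly tagged update, nor does it stop an attacker from flooding the link; those would need additional measures such as version counters and rate limiting.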