Overcoming the Performance and Security Challenges of Building Highly-Distributed Fault-Tolerant Embedded Systems
Loveless, Andrew
2023
Abstract
Over the past few decades, embedded systems, like those in spacecraft and aircraft, have evolved into complex distributed systems with hundreds of nodes and dozens of network switches. With this shift come new challenges.

One challenge is performance. Embedded systems are often required to mask faults. Unfortunately, traditional fault masking approaches, like state machine replication, require nodes to coordinate their actions by exchanging messages over several communication rounds. In modern systems, these messages often need to traverse multiple switch hops and must compete with hundreds or thousands of other traffic flows, so traditional fault masking protocols can have high worst-case latencies that make it difficult or impossible to meet deadlines. For a variety of embedded systems, missing deadlines can be just as dangerous as generating incorrect outputs, potentially even causing system failure.

A second challenge is security. As embedded systems have grown, designers have looked for new ways to reduce size, weight, and power. One popular approach is to use mixed-criticality networks, which let critical and non-critical devices share a single network. These networks are designed to ensure that non-critical devices, which often come from unsecured supply chains, have no way to disrupt the critical systems. However, the existence of shared network resources provides a potential means for attackers to bypass these isolation guarantees.

To overcome the performance challenge, I introduce two new Byzantine fault-tolerant (BFT) state machine replication (SMR) protocols that exploit emerging hardware trends in embedded systems. The first, IGOR, exploits the increasing prevalence of multi-core processors. Rather than requiring nodes to agree on a single set of redundant sensor data to execute on, as in traditional protocols, IGOR lets nodes execute on multiple sets of redundant sensor data simultaneously on different cores. A coordination protocol runs in the background to determine which execution will determine the system's final state. This reduces the system's latency to the time needed for either execution or coordination, whichever takes longer. The second protocol, CrossTalk, exploits an increasingly popular network topology, in which messages travel simultaneously through redundant planes of switches. By using novel algorithms to move sensor data back and forth between the redundant planes, CrossTalk can ensure processing nodes maintain identical state without requiring any communication between the nodes, significantly reducing latency. Moreover, CrossTalk can be used on even modest single-core embedded processors.

To illustrate the security challenge, I introduce PCspooF, a new cyberattack on a popular mixed-criticality network technology called Time-Triggered Ethernet (TTE). TTE is used in a variety of critical systems, including spacecraft, aircraft, and wind turbines. PCspooF shows that TTE's switch forwarding rules can allow a malicious non-critical device to infer secret information about the TTE network that can be used to construct fake TTE synchronization messages. By using a small amount of extra circuitry, the malicious device can inject the fake synchronization messages into the network, disrupting the operation of critical systems. Moreover, an attacker can exploit a flaw in the implementation of modern TTE devices to increase the rate of successful injections. PCspooF was disclosed to multiple impacted organizations in 2021, including several large spaceflight companies. The disclosure had significant real-world impacts, with multiple organizations acknowledging the attack and implementing defenses. PCspooF has also influenced changes to the standard for the TTE synchronization protocol (SAE AS6802).
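As a rough illustration of the latency claim made for IGOR, the sketch below contrasts agreeing first and then executing (latencies add) with executing speculatively on every candidate sensor set while coordination runs in parallel (latency is the larger of the two). It is a toy model only, not the thesis's protocol: the names `control_step`, `coordinate`, and `speculative_round` are hypothetical, the "pick the lowest candidate id" rule is a placeholder for real BFT agreement, and Python threads merely stand in for processor cores.

```python
from concurrent.futures import ThreadPoolExecutor


def agree_then_execute_latency(coordination_ms, execution_ms):
    # Traditional ordering: execution cannot start until agreement finishes.
    return coordination_ms + execution_ms


def speculative_latency(coordination_ms, execution_ms):
    # Overlapped ordering: both proceed in parallel, so the slower one dominates.
    return max(coordination_ms, execution_ms)


def control_step(sensor_set):
    # Hypothetical deterministic computation on one set of sensor readings.
    return sum(sensor_set) / len(sensor_set)


def coordinate(candidate_ids):
    # Placeholder for the background coordination protocol (assumption):
    # every node deterministically picks the lowest candidate id.
    return min(candidate_ids)


def speculative_round(candidates):
    # Execute on every candidate sensor set while coordination runs alongside,
    # then keep only the result of the execution that coordination selects.
    with ThreadPoolExecutor(max_workers=len(candidates) + 1) as pool:
        executions = {
            cid: pool.submit(control_step, data)
            for cid, data in candidates.items()
        }
        chosen = pool.submit(coordinate, list(candidates))
        winner = chosen.result()
        return winner, executions[winner].result()


if __name__ == "__main__":
    print(agree_then_execute_latency(3.0, 5.0))
    print(speculative_latency(3.0, 5.0))
    print(speculative_round({2: [1.0, 3.0], 1: [2.0, 4.0]}))
```

The cost of the overlapped approach in this model is extra computation: every candidate is executed even though only one result is kept, which is why it depends on spare cores being available.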
Subjects
Byzantine fault tolerance; state machine replication; speculative execution; real-time systems; distributed systems; Time-Triggered Ethernet
Types
Thesis