Citation:
Abstract:
Hardware fault tolerance is an important and extensively researched consideration in critical distributed real-time embedded systems, where critical real-time constraints must be satisfied even in the presence of hardware component failures. Our goal is to automatically produce a fault-tolerant distributed schedule of a given algorithm onto a given distributed architecture, subject to real-time constraints. The distributed architectures we consider have bidirectional point-to-point communication links. Our solution is a list scheduling heuristic that relies on disjoint paths to tolerate a fixed number of arbitrary processor and communication link failures. Because resources are limited in embedded systems, our heuristic implements a software solution based on the active replication technique, in which each operation of the algorithm is replicated on different processors. With a detailed example, we show the techniques used to satisfy the real-time constraints and to tolerate processor and communication link failures. Simulations show the efficiency of our method compared with other heuristics from the literature.
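
To make the idea of active replication concrete, the following is a minimal sketch and not the heuristic of the paper. It assumes that tolerating `n_failures` processor failures requires `n_failures + 1` replicas of each operation on distinct processors (a fail-silent assumption), and it deliberately ignores data dependencies, communication links, and disjoint paths. The function name `schedule_with_replication`, its parameters, and the greedy earliest-finish rule are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple

# One scheduled replica: (processor, start time, end time)
Replica = Tuple[str, float, float]


def schedule_with_replication(
    ops: List[str],
    exec_time: Dict[Tuple[str, str], float],
    processors: List[str],
    n_failures: int,
) -> Dict[str, List[Replica]]:
    """Greedily place n_failures + 1 replicas of each operation.

    Illustrative assumptions (not taken from the paper):
    - `ops` is already given in list-scheduling priority order;
    - operations are independent (no precedence, no communication cost);
    - `exec_time[(op, p)]` is the duration of `op` on processor `p`.
    """
    replicas = n_failures + 1
    if replicas > len(processors):
        raise ValueError("not enough processors for the requested replication")

    free_at = {p: 0.0 for p in processors}  # time at which each processor is idle
    schedule: Dict[str, List[Replica]] = {}

    for op in ops:
        # Rank processors by the finish time a replica of `op` would have there;
        # the processor name breaks ties so the result is deterministic.
        ranked = sorted(
            processors,
            key=lambda p: (free_at[p] + exec_time[(op, p)], p),
        )
        placed: List[Replica] = []
        # The first `replicas` entries are distinct processors by construction.
        for p in ranked[:replicas]:
            start = free_at[p]
            end = start + exec_time[(op, p)]
            free_at[p] = end
            placed.append((p, start, end))
        schedule[op] = placed

    return schedule
```

Calling the function with three processors and `n_failures=1` places every operation on two different processors, so the loss of any single processor still leaves one replica of each operation. A fuller treatment would also route the data of each dependency over disjoint paths so that link failures are covered, which this sketch does not attempt.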