Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (1994)
Austin, TX, USA
June 15, 1994 to June 17, 1994
I-Ling Yen, Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
Cooperating parallel programs are being increasingly used in critical applications that require both high performance and high reliability. A promising technique for simultaneously achieving these objectives is to embed the fault tolerance within the program instead of superimposing it via external mechanisms. We develop one such approach for a group of processes that cooperate via shared data structures. The scheme uses data structures having two or more invariant assertions. When the strong invariant is true, the performance is good. When it is false, the performance may be adversely affected, but it is guaranteed that the system will operate correctly provided the weak invariant is true. The algorithms are designed to ensure that processor failures will never cause the weak invariant to be false and to restore the strong invariant within a finite number of recovery actions. We develop a robust task handling mechanism to support the approach and illustrate it for three common data structures.
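The strong/weak invariant idea can be illustrated with a minimal, single-threaded sketch. It is not taken from the paper: the class RobustList, its method names, the crash_midway flag, and the choice of a doubly linked list are all assumptions made for illustration. Here the weak invariant is assumed to be "the forward chain from the head is well formed" and the strong invariant "every back pointer agrees with the forward chain". A simulated failure between the two pointer writes leaves the weak invariant true, and a bounded recovery pass restores the strong one.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None


class RobustList:
    """Hypothetical doubly linked list with a sentinel head (illustration only)."""

    def __init__(self):
        self.head = Node(None)  # sentinel node, never removed

    def insert_front(self, value, crash_midway=False):
        node = Node(value)
        first = self.head.next
        node.next = first
        node.prev = self.head
        # One write publishes the node; the forward chain is valid before and after it.
        self.head.next = node
        if crash_midway:
            return  # simulated processor failure: the old first node keeps a stale prev
        if first is not None:
            first.prev = node

    def weak_invariant(self):
        # Assumed weak invariant: the forward chain from the head is acyclic.
        seen = set()
        node = self.head
        while node is not None:
            if id(node) in seen:
                return False
            seen.add(id(node))
            node = node.next
        return True

    def strong_invariant(self):
        # Assumed strong invariant: weak invariant plus consistent back pointers.
        if not self.weak_invariant():
            return False
        node = self.head
        while node.next is not None:
            if node.next.prev is not node:
                return False
            node = node.next
        return True

    def recover(self):
        # Walk the (always valid) forward chain and repair stale back pointers.
        # Returns the number of recovery actions, at most one per node.
        repairs = 0
        node = self.head
        while node.next is not None:
            if node.next.prev is not node:
                node.next.prev = node
                repairs += 1
            node = node.next
        return repairs

    def values(self):
        out = []
        node = self.head.next
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

In this sketch, forward traversal keeps working after the simulated failure, while operations that rely on back pointers would have to wait for recover(). A real shared-memory setting would additionally need atomic writes and a way to detect the failed process, which this sketch does not model.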
data structures, parallel programming, fault tolerant computing, software reliability
I-Ling Yen and F. Bastani, "Systematic incorporation of efficient fault tolerance in systems of cooperating parallel programs," Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (FTCS), Austin, TX, USA, 1994, pp. 154-163.