Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (1994)
Austin, TX, USA
June 15, 1994 to June 17, 1994
ISBN: 0-8186-5520-8
pp: 16-25
D. Mosse, Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
R. Melhem, Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
S. Ghosh, Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
ABSTRACT
Fault tolerance is an important aspect of real-time computer systems, since timing constraints must not be violated. When dealing with multiprocessor systems, fault tolerance becomes an even greater requirement, since there are more components that can fail. We present the analysis of a fault-tolerant scheduling algorithm for real-time applications on multiprocessors. Our algorithm is based on the principles of primary/backup tasks, backup overloading (i.e., scheduling more than a single backup in the same time interval), and backup deallocation (i.e., reclaiming the resources unused by backup tasks in case of fault-free operation). A theoretical model is developed to study a particular class of applications and certain backup and overloading strategies. The proposed scheme can tolerate a single fault of any processor at any time, be it transient or permanent. Simulation results offer evidence of little loss of schedulability due to the addition of the fault tolerance capability. Simulation is also used to study the length of time needed for the system to recover from a fault (i.e., the time when the system is again able to tolerate any fault).
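The two mechanisms named in the abstract, backup overloading and backup deallocation, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes one plausible reading of the single-fault model: two backups may share a time interval on the same processor only if their primaries run on different processors, since a single processor fault can then activate at most one of them. The names `Backup`, `can_place`, and `deallocate` are hypothetical, and the sketch ignores primary copies that may also occupy the backup processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    task: str
    primary_proc: int  # processor running the task's primary copy
    backup_proc: int   # processor on which the backup slot is reserved
    start: int
    end: int           # reservation covers the half-open interval [start, end)


def intervals_overlap(a: Backup, b: Backup) -> bool:
    """True if the two reservations share at least one time unit."""
    return a.start < b.end and b.start < a.end


def can_place(new: Backup, existing: list[Backup]) -> bool:
    """Check whether `new` may be reserved next to `existing` backups.

    Assumed rule (single-fault model): overlapping backups on one
    processor are allowed only if their primaries are on different
    processors.
    """
    if new.primary_proc == new.backup_proc:
        return False  # the backup would be lost together with its primary
    for other in existing:
        if other.backup_proc != new.backup_proc:
            continue
        if intervals_overlap(new, other) and other.primary_proc == new.primary_proc:
            return False  # one fault could require both backups at once
    return True


def deallocate(existing: list[Backup], task: str) -> list[Backup]:
    """Reclaim the slot of `task` once its primary has completed fault-free."""
    return [b for b in existing if b.task != task]
```

With a backup for task t1 (primary on processor 0) reserved on processor 2 over [10, 15), a backup for a task whose primary is on processor 1 can overload the same interval, while one whose primary is also on processor 0 cannot until t1's slot has been deallocated.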
INDEX TERMS
multiprocessing systems, fault tolerant computing, scheduling, multiprocessing programs, software reliability, real-time systems
CITATION

D. Mosse, R. Melhem and S. Ghosh, "Analysis of a fault-tolerant multiprocessor scheduling algorithm," Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (FTCS), Austin, TX, USA, 1994, pp. 16-25.
doi:10.1109/FTCS.1994.315661