Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (1994)
Austin, TX, USA
June 15, 1994 to June 17, 1994
T.R. Sarnaik, Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
We develop a technique that reduces the time needed to perform online tests and diagnose faulty units, and that reduces recovery time when a fault occurs in a system. The tester is given a clue about possible faulty locations, so only a fraction of the resources within a system needs to be tested. This is accomplished by keeping track of the resources used by an application program while it executes. We demonstrate that a significant reduction in test time can be achieved, in particular for cache and memory subsystems. This technique can improve response time and help meet more deadlines in soft real-time systems when the system employs online tests and recovery schemes. We develop this technique further and support our analysis using trace-driven simulation. We discuss ways to implement the resource utilization vector (RUV) scheme in a system, and show how the RUV scheme is used to improve the forward error recovery process.
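The core idea in the abstract, recording which resources a program touched so that only those are tested after a fault, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ResourceUtilizationVector`, the helpers `record_trace` and `diagnose`, the one-bit-per-unit layout, and the address-to-unit mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ResourceUtilizationVector:
    """Hypothetical RUV: one bit per testable unit (e.g. a cache line or page)."""
    num_units: int
    bits: int = 0  # a Python int serves as an arbitrary-width bitmap

    def mark_used(self, unit: int) -> None:
        if not 0 <= unit < self.num_units:
            raise IndexError(f"unit {unit} out of range")
        self.bits |= 1 << unit

    def used_units(self) -> List[int]:
        # Units whose bit is set, in ascending order.
        return [u for u in range(self.num_units) if (self.bits >> u) & 1]

    def clear(self) -> None:
        self.bits = 0


def record_trace(ruv: ResourceUtilizationVector,
                 addresses: Iterable[int], unit_size: int) -> None:
    """Mark every unit touched by an address trace (assumed linear mapping)."""
    for addr in addresses:
        ruv.mark_used(addr // unit_size)


def diagnose(ruv: ResourceUtilizationVector,
             test_unit: Callable[[int], bool]) -> List[int]:
    """Test only the units the program used; return those that fail."""
    return [u for u in ruv.used_units() if not test_unit(u)]


if __name__ == "__main__":
    ruv = ResourceUtilizationVector(num_units=16)
    record_trace(ruv, [0, 8, 70, 640, 650], unit_size=64)
    faulty = {3, 10}  # assumed fault set for the demonstration
    failed = diagnose(ruv, lambda u: u not in faulty)
    print(f"tested {len(ruv.used_units())} of {ruv.num_units} units, failed: {failed}")
```

In this sketch only 3 of the 16 units are tested. Unit 3 is assumed faulty but was never used by the program, so it is skipped, which reflects the premise that a fault in an unused resource cannot have affected the running application.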
system monitoring, fault location, system recovery, real-time systems, vectors, resource allocation, buffer storage, online operation, computer testing, fault tolerant computing
T. Sarnaik and A. Somani, "Effects of resource utilization monitoring in fault recovery," Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (FTCS), Austin, TX, USA, 1994, pp. 6-15.