Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (1994)
Austin, TX, USA
June 15, 1994 to June 17, 1994
Chung-Ho Chen, Dept. of Electr. Eng., Nat. Yunlin Inst. of Technol., Taiwan
We propose an error detection and recovery protocol for redundant processor systems that employ caches. The protocol allows cache-based systems to vote more often and thereby reduce the chance of losing synchronization. The scheme is based on broadcasting a dirty cache line after it is modified, and it effectively exploits the redundancy of a fault-tolerant system that uses hardware voting. It recovers from erroneous data written by a processor, which remedies the insufficiency of error-correcting codes. The protocol can also be used to speed up the resynchronization process for a temporarily failed processor in a redundant system. More than 60% of cache lines are fully covered for recovery from errors originating in the cache itself, including unrecoverable ECC errors. The performance overhead is the broadcasting of only 2-3% of the total memory references.
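The voting-and-recovery idea in the abstract can be illustrated with a minimal software sketch. It assumes that each redundant processor broadcasts its copy of a modified (dirty) cache line and that a voter compares the copies by strict majority. The function names `vote_on_line` and `vote_and_repair` are hypothetical and are not taken from the paper; the abstract does not specify the actual hardware protocol, so this shows only the general technique of majority voting with repair of the disagreeing copy.

```python
from collections import Counter
from typing import List, Optional, Sequence


def vote_on_line(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the value held by a strict majority of copies, or None if there is none.

    Assumption: each element is one processor's broadcast copy of the same cache line.
    """
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    return value if count > len(copies) // 2 else None


def vote_and_repair(copies: List[bytes]) -> List[int]:
    """Overwrite minority copies with the majority value.

    Returns the indices of the copies that disagreed and were repaired.
    Raises ValueError when no strict majority exists (error detected, not recoverable by voting).
    """
    winner = vote_on_line(copies)
    if winner is None:
        raise ValueError("no majority among broadcast copies")
    faulty = [i for i, copy in enumerate(copies) if copy != winner]
    for i in faulty:
        copies[i] = winner  # recovery: replace the erroneous copy in place
    return faulty


if __name__ == "__main__":
    # Three redundant processors; the third holds a corrupted line.
    lines = [b"\x01\x02", b"\x01\x02", b"\xff\x02"]
    repaired = vote_and_repair(lines)
    print("repaired processors:", repaired)
```

In this sketch a disagreeing copy is both detected and corrected in one step, which mirrors the abstract's claim that voting on broadcast data can recover from errors that an error-correcting code alone cannot fix.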
error detection, error correction, synchronisation, buffer storage, protocols, storage management, fault tolerant computing, redundancy
Chung-Ho Chen and A. Somani, "A cache protocol for error detection and recovery in fault-tolerant computing systems," Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing (FTCS), Austin, TX, USA, 1994, pp. 278-287.