Rapid, Tunable Error Detection with Execution Fingerprinting

Brett H. Meyer; Jonah Caplan; Georgi Z. Kostadinov; Mojing Liu

doi:10.4271/2013-01-2287

Recently, the combination of semiconductor manufacturing technology scaling and pressure to reduce semiconductor system costs and power consumption has resulted in the development of computer systems responsible for executing a mix of safety-critical and non-critical tasks. However, such systems are poorly utilized if lockstep execution forces all processor cores to execute the same task even when not executing safety-critical tasks. Execution fingerprinting has emerged as an alternative to n-modular redundancy for verifying redundant execution without requiring that all cores execute the same task or even execute redundant tasks concurrently. Fingerprinting takes a bit stream characterizing the execution of a task and compresses it into a single, fixed-width word or fingerprint.

Fingerprinting has several key advantages. First, it reduces redundancy-checking bandwidth by compressing changes to external state into a single, fixed-width word. Second, it reduces error detection latency by capturing and exposing intermediate operations on faulty data. Third, it naturally supports the design of mixed criticality systems by making dual-, triple-, and n-modular redundancy available without requiring significant architectural changes. Fourth, while it can't guarantee perfect error detection, error detection probabilities and latencies can be tuned to a particular application.

In this paper, we describe fingerprinting in safety-critical systems and explore the various trade-offs inherent in fingerprinting subsystem design, including: (a) determining what application data to compress, as a function of error detection probability and latency, and (b) identifying a corresponding fingerprinting circuit implementation. In this context, we present several case studies demonstrating how application characteristics inform fingerprinting subsystem design.