Algorithm-Based Fault Tolerance for Numerical Subroutines
TBMG-2421
11/1/2007
- Content
A software library implements a new methodology for detecting faults in numerical subroutines, enabling application programs that contain the subroutines to recover transparently from single-event upsets. The library is fault-detecting middleware that wraps the numerical subroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user.
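The brief does not say which checks the middleware applies, so the following is only a minimal sketch of the general idea, assuming a classic checksum test for matrix multiplication: the column sums of C = AB must equal the column sums of A multiplied by B, which can be verified in O(n^2) time. The names `matmul`, `checked_call`, and `faulty_matmul` are hypothetical and are not part of the library described here; the plain-Python `matmul` merely stands in for a wrapped library routine.

```python
def matmul(a, b):
    """Plain matrix product; stands in for a wrapped library routine."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def checked_call(routine, a, b, tol=1e-9):
    """Call routine(a, b) and verify a column-checksum invariant.

    Returns (c, ok); ok is False if the result fails the check.
    The tolerance is an assumption and would need tuning in practice.
    """
    c = routine(a, b)
    # Column sums of A, i.e. the row vector e^T A.
    col_sum_a = [sum(row[k] for row in a) for k in range(len(a[0]))]
    # Expected column sums of C, computed as (e^T A) B.
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(len(b)))
                for j in range(len(b[0]))]
    # Actual column sums of the returned result.
    actual = [sum(row[j] for row in c) for j in range(len(c[0]))]
    # Compare relative to the largest expected magnitude.
    scale = max(1.0, max(abs(x) for x in expected))
    ok = all(abs(e - s) <= tol * scale for e, s in zip(expected, actual))
    return c, ok


def faulty_matmul(a, b):
    """Same product with one element corrupted, simulating an upset."""
    c = matmul(a, b)
    c[0][1] += 1.0
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
_, ok_clean = checked_call(matmul, a, b)
_, ok_faulty = checked_call(faulty_matmul, a, b)
```

Because the caller passes the routine in and receives the same result type back, the application code around the call does not change. On a failed check, a wrapper of this kind could simply repeat the call, which is one plausible way to recover from a transient upset.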
- Citation
- "Algorithm-Based Fault Tolerance for Numerical Subroutines," Mobility Engineering, November 1, 2007.