Efficient Reliability and Safety Analysis for Mixed-Criticality Embedded Systems
ISSN: 0148-7191, e-ISSN: 2688-3627
Published April 12, 2011 by SAE International in United States
Due to the increasing integration of safety-critical functionality into electronic devices, safety-related system design and certification have become a major challenge. Among other things, components must react suitably to internal errors in order to prevent a function from failing and to guarantee a certain degree of reliability. In this context, a wide variety of fault tolerance mechanisms have been developed, along with analytical treatments of their error coverage and the resulting reliability. However, most of these mechanisms induce a timing overhead, which in turn may impair the real-time capabilities of the system. More concretely, even if each error is handled adequately so that no logical failure occurs, a timing failure caused by a missed deadline cannot be ruled out. Thus, there is a growing need for methods that calculate the probability of timing failures and prove that reliability and safety constraints are not violated.
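As a rough illustration of the kind of quantity involved, the following sketch computes the probability that a single job misses its deadline because of error recovery. It rests on simplifying assumptions that are not taken from the paper: errors arrive as a Poisson process, every error triggers one recovery of fixed length, and at most `max_tolerated_errors` recoveries fit into the job's time window before the deadline is missed. The function name, parameters, and numbers are all hypothetical.

```python
import math


def timing_failure_probability(error_rate: float, window: float,
                               max_tolerated_errors: int) -> float:
    """Return P(more than max_tolerated_errors errors occur within window).

    Assumes errors form a Poisson process with the given rate (hypothetical
    model, not taken from the paper). Intended for the rare-error regime,
    where error_rate * window is small.
    """
    if error_rate < 0 or window < 0 or max_tolerated_errors < 0:
        raise ValueError("arguments must be non-negative")
    mean = error_rate * window  # expected number of errors in the window
    if mean == 0.0:
        return 0.0
    k = max_tolerated_errors + 1
    # First term of the tail, exp(-mean) * mean**k / k!, evaluated in log space.
    term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    total = 0.0
    # Sum the tail directly; 1 - CDF would lose precision for tiny probabilities.
    while True:
        total += term
        k += 1
        term *= mean / k
        if term <= total * 1e-16:
            break
    return min(total, 1.0)


# Hypothetical numbers: 1e-3 errors per ms, 10 ms window, 2 recoveries fit.
p_fail = timing_failure_probability(error_rate=1e-3, window=10.0,
                                    max_tolerated_errors=2)
print(f"timing failure probability per job: {p_fail:.3e}")
```

The tail is summed term by term rather than computed as one minus the cumulative probability, since the probabilities of interest in safety analysis are typically many orders of magnitude below one.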
In this paper we present an analysis approach for networked systems as well as highly integrated multi-core architectures to calculate reliability with respect to timing failures. Simulation techniques are poorly suited to this purpose and expensive: because fault events are rare, simulation times become excessive before results are statistically significant. Therefore, formal methods have been developed to prove that the considered embedded real-time system works correctly and that failure rates are bounded according to the required safety level. Furthermore, we present an extension of the basic analysis ideas that incorporates different error models into the reliability analysis. Special emphasis is put on mixed-criticality systems, i.e. systems with applications of different safety requirements. We propose an approach to decouple the reliability analyses for these applications and to determine an individual safety integrity level for each application. This approach makes it possible to refine the conservative IEC 61508 concept of taking the most critical application as the basis for the whole system, enabling cost reduction and automated qualification. Using a prototype implementation for Symtavision's SymTA/S tool suite, we show how the presented methodologies can be integrated into a safety-related design flow. With this kind of tooling support, the presented approaches can be applied at different stages of the design process, such as design space exploration and optimization as well as verification and certification.
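The remark on simulation cost can be made concrete with a standard back-of-the-envelope estimate, which is not part of the paper itself. Estimating a failure probability p by plain Monte Carlo sampling with a relative standard error of ε requires roughly (1 − p) / (p ε²) independent runs, because the estimator of a Bernoulli probability has variance p(1 − p)/n. The function name and the example numbers below are hypothetical.

```python
import math


def required_simulation_runs(failure_probability: float,
                             relative_error: float) -> int:
    """Runs needed so that a plain Monte Carlo estimate of failure_probability
    has the given relative standard error (std. deviation / true value)."""
    if not 0.0 < failure_probability < 1.0:
        raise ValueError("failure_probability must lie strictly between 0 and 1")
    if relative_error <= 0.0:
        raise ValueError("relative_error must be positive")
    # Var[p_hat] = p (1 - p) / n  =>  n = (1 - p) / (p * eps^2)
    runs = (1.0 - failure_probability) / (failure_probability * relative_error ** 2)
    return math.ceil(runs)


# Hypothetical target: failure probability of 1e-9 per run, 10 % relative error.
runs = required_simulation_runs(failure_probability=1e-9, relative_error=0.1)
print(f"independent simulation runs needed: {runs:.2e}")
```

For failure probabilities in the range demanded by high safety levels, the run count grows inversely with the probability, which is why analytical bounds are attractive here.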
Citation: Sebastian, M., Axer, P., Ernst, R., Feiertag, N. et al., "Efficient Reliability and Safety Analysis for Mixed-Criticality Embedded Systems," SAE Technical Paper 2011-01-0445, 2011, https://doi.org/10.4271/2011-01-0445.