
Full title

Probabilistic models of the development and use of fault tolerant software

Keywords

Diversity, Reliability, Fault Tolerance, multiple-version software, failure independence, common-mode failure

Summary

The use of multiple, functionally equivalent, diverse software "versions" in a redundant configuration to achieve fault tolerance is well known [4]. The usefulness of this technique depends on the probability of coincident failures among the diverse versions, which has been the focus of a wealth of research [3]. In particular, the probabilistic, conceptual models developed by Eckhardt and Lee [2] and by Littlewood and Miller [1] provided insight into the implications of diversity for the reliability of Fault Tolerant Software (FTS) systems. For instance, they point out that independently developed software versions (software developed by perfectly isolated development teams), used as components in an FTS system, may not fail independently. In some cases the average system reliability is worse than would be expected under independence, while in others it could be better. This possible improvement is a consequence of diversity in when and how the component versions fail.
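To make these two classical results concrete, here is a minimal Python sketch. Following the usual formulation of these models, let θ(x) be the probability that a version produced by a given development process fails on demand x, averaged over all the versions that process might produce, and let X be a demand drawn from the usage profile. For two independently developed versions from the same process (the Eckhardt and Lee setting), the probability that both fail is E[θ(X)²] = E[θ(X)]² + Var[θ(X)], which is never below the independence value E[θ(X)]². With two different processes A and B (the Littlewood and Miller setting, "forced diversity"), it is E[θ_A(X)θ_B(X)] = E[θ_A(X)]E[θ_B(X)] + Cov[θ_A(X), θ_B(X)], and a negative covariance gives better than independence. The demand profile, the difficulty values and all names in the code are invented for illustration; they are not data or code from this project.

```python
# Illustrative sketch of the Eckhardt-Lee (EL) and Littlewood-Miller (LM)
# coincident-failure calculations on a small, made-up demand space.
# All numbers below are hypothetical; they are not data from this project.

# Usage profile: probability that each demand x occurs (sums to 1).
profile = [0.5, 0.3, 0.2]

# "Difficulty" functions: theta(x) = probability that a version produced by a
# given development methodology fails on demand x (hypothetical values).
theta_a = [0.001, 0.01, 0.10]   # methodology A
theta_b = [0.10, 0.01, 0.001]   # methodology B (hard where A is easy)


def expect(values, weights):
    """Expected value of a function of the demand under the usage profile."""
    return sum(v * w for v, w in zip(values, weights))


def pfd_single(theta):
    """Average probability of failure on demand of one version: E[theta(X)]."""
    return expect(theta, profile)


def pfd_pair(theta1, theta2):
    """Average probability that BOTH versions fail on a random demand,
    for independently developed versions: E[theta1(X) * theta2(X)]."""
    return expect([t1 * t2 for t1, t2 in zip(theta1, theta2)], profile)


# EL: both versions built with the same methodology.
el_pair = pfd_pair(theta_a, theta_a)
el_indep = pfd_single(theta_a) ** 2   # what failure independence would predict

# LM: forced diversity, one version from each methodology.
lm_pair = pfd_pair(theta_a, theta_b)
lm_indep = pfd_single(theta_a) * pfd_single(theta_b)

print(f"EL: both fail {el_pair:.6f} vs independence {el_indep:.6f}")
print(f"LM: both fail {lm_pair:.6f} vs independence {lm_indep:.6f}")
```

With these made-up numbers, el_pair (about 0.00203) is roughly 3.7 times el_indep (about 0.00055), while lm_pair (0.0001) is well below lm_indep (about 0.00125), because the two methodologies find different demands difficult.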

However, the assumption of independent development will seldom hold in practice. For instance, the development teams for different versions may have communicated and perhaps "propagated" mistakes from one team to another, so the teams may be considered dependent in how they produce their software versions. What, then, are the implications for the results of the aforementioned models when we deviate from the conditions on which these models are based? What new insights can be gained by relaxing some of these conditions? We extended these models to cater for the possible effects on system reliability of diversity and dependence during the development and use of FTS systems.
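One simple way to represent dependent development, chosen here for illustration and not necessarily the formulation used in this project, is a common influence S to which both teams are exposed: for example, S = 1 when a mistaken reading of the specification has been passed from one team to the other. Given S, the teams are assumed to develop independently, with difficulty functions θ_A(x | S) and θ_B(x | S). Averaging over S gives a probability that both versions fail on demand x of E_S[θ_A(x | S) θ_B(x | S)] = θ_A(x)θ_B(x) + Cov_S[θ_A(x | S), θ_B(x | S)], where θ_A(x) = E_S[θ_A(x | S)] (and likewise θ_B(x)) is what each team would have if it were exposed to S independently of the other. When both difficulty functions increase with S, as when a shared mistake makes both teams more likely to fail on the same demands, the covariance is non-negative at every demand, so this kind of dependence can only make the average system reliability worse. The variable names and numbers below are hypothetical.

```python
# Sketch of dependent development through a shared (common) influence S.
# The model structure and all numbers are hypothetical illustrations.

# Usage profile: probability of each demand x (sums to 1).
profile = [0.5, 0.3, 0.2]

# Common influence S: 0 = no mistake shared between the teams,
# 1 = a mistake was "propagated" from one team to the other.
p_s = {0: 0.8, 1: 0.2}

# Difficulty functions given S: probability that a version from team A (or B)
# fails on demand x when S = s. Both increase with s at every demand.
theta_a_given_s = {0: [0.001, 0.01, 0.05], 1: [0.01, 0.05, 0.50]}
theta_b_given_s = {0: [0.002, 0.01, 0.04], 1: [0.02, 0.04, 0.40]}


def both_fail_shared_influence():
    """P(both versions fail on a random demand) when both teams are exposed
    to the SAME value of S: sum_x p(x) * E_S[theta_a(x|S) * theta_b(x|S)]."""
    total = 0.0
    for x, px in enumerate(profile):
        joint = sum(ps * theta_a_given_s[s][x] * theta_b_given_s[s][x]
                    for s, ps in p_s.items())
        total += px * joint
    return total


def both_fail_independent_influence():
    """Same per-team difficulty, but each team is exposed to its own,
    independent draw of S: sum_x p(x) * theta_a(x) * theta_b(x)."""
    total = 0.0
    for x, px in enumerate(profile):
        marg_a = sum(ps * theta_a_given_s[s][x] for s, ps in p_s.items())
        marg_b = sum(ps * theta_b_given_s[s][x] for s, ps in p_s.items())
        total += px * marg_a * marg_b
    return total


dependent = both_fail_shared_influence()
independent = both_fail_independent_influence()
print(f"shared influence:      {dependent:.6f}")
print(f"independent influence: {independent:.6f}")
```

With these values, the shared influence raises the probability of a coincident failure from about 0.0032 to about 0.0085, although each team on its own is exactly as reliable in both cases.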

Results

Inspired by work carried out under the DISPO2 and DISPO3 projects [5], we have developed models that extend the applicability of the previous models and confirm some of their results under more general conditions. These extensions also yield new results of practical importance. Some insights include:

clarification of the extent to which the previous models are applicable;

confirmation of, and counterexamples against, commonly used arguments about the effects of interaction and communication between development teams, with implications for how to manage the development or acquisition of diverse systems;

enumeration and analysis of many forms of dependence, all of which can be captured by the same probabilistic modelling approach;

the identification of scenarios where dependence in an FTS system's development process will always result in worse average system reliability;

the identification of scenarios where the use of forced diversity is always beneficial, that is, where it cannot make the average system reliability worse than it would otherwise be;

the introduction of graphical models (Bayesian Belief Networks) for summarising, simplifying and analysing possibly complex forms of dependence and how they affect system reliability.

An important aspect of this work is that many different kinds of "common influences" on the failures of diverse, redundant components are brought together in a unified probabilistic model.

References

[1] B. Littlewood and D. R. Miller, “Conceptual modelling of coincident failures in multi-version software,” IEEE Transactions on Software Engineering, vol. SE-15, pp. 1596–1614, 1989.

[2] D. E. Eckhardt and L. D. Lee, “A theoretical basis for the analysis of multiversion software subject to coincident errors,” IEEE Transactions on Software Engineering, vol. SE-11, pp. 1511–1517, 1985.

[3] B. Littlewood, P. Popov and L. Strigini, “Modelling software design diversity: a review,” ACM Computing Surveys, vol. 33, no. 2, pp. 177–208, 2001.

[4] H. B. Diab and A. Y. Zomaya (eds.), “Dependable Computing Systems: Paradigms, Performance Issues, and Applications,” Wiley, 2005, ISBN 0-471-67422-2.

[5] DISPO2 and DISPO3 project pages


Papers

 

B. Littlewood, P. Popov and L. Strigini, “Modelling software design diversity: a review,” ACM Computing Surveys, vol. 33, no. 2, pp. 177–208, 2001.

 

Authors

Kizito Salako, Lorenzo Strigini (City) kizito at csr dot city dot ac dot uk, l.strigini at csr dot city dot ac dot uk

Page Maintainer: webmaster@dirc.org.uk. Last Modified: 10 August 2005