Probabilistic models of the development and use of fault tolerant software
Keywords: diversity, reliability, fault tolerance, multiple-version software, failure independence, common-mode failure
The use of multiple, functionally equivalent, diverse software "versions" in a redundant configuration to achieve fault tolerance is well known. The usefulness of this technique depends on the probability of coincident failures among the diverse versions, which has been the focus of a wealth of research. In particular, the probabilistic, conceptual models developed by Eckhardt et al. and Littlewood et al. provided insight into the implications of diversity for the reliability of Fault Tolerant Software (FTS) systems. For instance, they point out that independently developed software versions (software developed by perfectly isolated development teams), used as components in an FTS system, may not fail independently. In some cases the average system reliability is worse than what would be expected under independence, while in others it could be better. This possible improvement is a consequence of diversity in when and how the component versions fail.
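A small numerical sketch may help make this point concrete. It assumes the usual "difficulty function" reading of these models: for each demand x, theta(x) is the probability that a randomly developed version fails on x, and versions fail independently once the demand is fixed. The demand space, the probabilities, and all function names below are hypothetical, chosen only for illustration; they are not taken from the cited papers or from the project's own models.

```python
from fractions import Fraction

# Hypothetical demand space: four equally likely demands.
DEMAND_PROBS = [Fraction(1, 4)] * 4


def mean_failure(theta, probs=DEMAND_PROBS):
    """Probability that one randomly developed version fails on a random demand."""
    return sum(p * t for p, t in zip(probs, theta))


def joint_failure(theta_a, theta_b, probs=DEMAND_PROBS):
    """Probability that two independently developed versions both fail on a
    random demand, assuming they fail independently given the demand."""
    return sum(p * a * b for p, a, b in zip(probs, theta_a, theta_b))


# Case 1 (assumed): both teams share one difficulty function, so the same
# demands are "hard" for both of them.
theta = [Fraction(0), Fraction(0), Fraction(1, 10), Fraction(3, 10)]
same_joint = joint_failure(theta, theta)
same_indep = mean_failure(theta) ** 2  # value expected under independence

# Case 2 (assumed): two development methodologies whose hard demands differ.
theta_a = [Fraction(0), Fraction(0), Fraction(1, 10), Fraction(3, 10)]
theta_b = [Fraction(3, 10), Fraction(1, 10), Fraction(0), Fraction(0)]
diverse_joint = joint_failure(theta_a, theta_b)
diverse_indep = mean_failure(theta_a) * mean_failure(theta_b)

print("shared difficulty :", same_joint, "vs independence", same_indep)
print("diverse difficulty:", diverse_joint, "vs independence", diverse_indep)
```

In the first case the joint failure probability exceeds the product of the marginal probabilities, because difficulty varies across demands in the same way for both versions. In the second case it falls below that product, because the two difficulty functions are negatively correlated across demands.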
However, the assumption of independent development will seldom hold in practice. For instance, the development teams for different versions may have communicated and perhaps "propagated" mistakes from one team to another. The teams may therefore be dependent in how they produce their software versions. So, what are the implications for the results of the aforementioned models when we deviate from the conditions on which these models are based? What new insights can be gained by relaxing some of these conditions? We extended these models to cater for the possible effects, on system reliability, of diversity and dependence during the development and use of FTS systems.
Inspired by work carried out under the DISPO2 and DISPO3 projects, we have developed models that not only extend the applicability of the previous models but also confirm some of their results under more general conditions. These extensions also give new, useful results of practical importance. Some insights include:
An important aspect of this work is that many different kinds of "common influences" on the failures of diverse, redundant components are brought together in a unified probabilistic model.
 B. Littlewood and D. R. Miller, “Conceptual modelling of coincident failures in multi-version software,” IEEE Transactions on Software Engineering, vol. SE-15, pp. 1596–1614, 1989.
 D. E. Eckhardt and L. D. Lee, “A theoretical basis for the analysis of multiversion software subject to coincident errors,” IEEE Transactions on Software Engineering, vol. SE-11, pp. 1511–1517, 1985.
Kizito Salako, Lorenzo Strigini (City) kizito at csr dot city dot ac dot uk, l.strigini at csr dot city dot ac dot uk
Last Modified: 10 August, 2005