The most powerful tool for modeling redundancy is the reliability block diagram. In this section we use the 1-out-of-2 configuration to show how MTBF can be calculated for redundant systems.
Redundancy is a special form of failure tolerance. Redundancy basically
means operating two (or more) identical units in parallel, while only
one of them (or, more generally, only n-k of n units) is needed for
successful operation. Redundancy is the most powerful, but also the
most expensive, means when the system failure rate and/or downtime needs
to be as small as possible. With redundancy you can improve every
reliability metric by orders of magnitude. The reason for this is that
it is rather unlikely that both redundant units would fail at the same
time. In particular, it is unlikely that the second unit would fail
during the repair of the first unit, since repair times are usually
very short compared to MTBF.
We assume the constant failure rate case, not because it is simple, but
rather for the following reasons:
- The constant failure rate case is actually a strong achievement that
  every supplier and manufacturer should strive for.
- It turns out that despite the constant failure rate of the branches,
  the system failure rate will not be constant any more.
Suppose a hypothetical system consisting only of two identical redundant
branches. There shall be no further non-redundant elements, therefore
this system is fully redundant. Only one branch shall be needed in order
to keep the system operational. The failure rate of the branches shall
be the same regardless of the state of the system; in particular, it
shall make no difference for reliability and failure rate whether only
one or both branches are in an operational state.
This hypothetical system appears to be a bit idealized. However, this
does not affect the argument as a whole.
At first we want to keep it simple. We want to know how long the system
would be operational without applying repair. This means we would
operate the system and just wait until both branches have failed. So
we're interested in the failure rate vs. time of the two branches taken
together. Remember from further above that R(t) is the reliability
function, i.e. the probability of survival / success; therefore 1-R(t)
must be the probability of failure:
    F(t) = 1 - R(t)
Also remember from above that the following applies for the constant
failure rate case, with λ being the branch failure rate:
    R(t) = e^{-λt}
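Since the system fails only when both branches have failed, and the
branches fail independently of each other, the system's probability of
failure is the product of the two branch failure probabilities,
(1 - e^{-λt})^2. From this we finally obtain the failure rate
for a redundant system with two identical branches, each with constant
failure rate λ:
    λ_{System}(t) = 2λ (1 - e^{-λt}) / (2 - e^{-λt})
Unfortunately this expression cannot be simplified any further. Below is
the failure rate vs. time.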
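For reference, the intermediate steps leading to this failure rate can be written out as follows. This is a sketch assuming independent branches with constant failure rate; the symbols F_System, R_System and \lambda_System are notation chosen here for illustration and are not taken from the original text.

```latex
\begin{align*}
F(t) &= 1 - R(t) = 1 - e^{-\lambda t} \\
F_{\mathrm{System}}(t) &= F(t)^2 = \left(1 - e^{-\lambda t}\right)^2 \\
R_{\mathrm{System}}(t) &= 1 - F_{\mathrm{System}}(t) = 2e^{-\lambda t} - e^{-2\lambda t} \\
\lambda_{\mathrm{System}}(t) &= -\frac{1}{R_{\mathrm{System}}(t)}\,\frac{dR_{\mathrm{System}}(t)}{dt}
  = \frac{2\lambda\left(e^{-\lambda t} - e^{-2\lambda t}\right)}{2e^{-\lambda t} - e^{-2\lambda t}}
  = \frac{2\lambda\left(1 - e^{-\lambda t}\right)}{2 - e^{-\lambda t}}
\end{align*}
```

The last form makes both limits easy to read off: the numerator vanishes at t = 0, and for large t the fraction tends to 2\lambda / 2 = \lambda.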
Note that the system failure rate is actually zero at t=0. Here is an
easy way to understand this: While a single branch can actually fail at
t=0 (its failure rate there is just the branch failure rate), it is
extremely unlikely that both branches would fail at exactly the same
time point. The mathematical probability for both failures happening
at exactly the same time point is zero. If we want the system to fail
at t=0, there is no other way than both branches failing at the same
time point, at t=0 to be exact. For t>0, the longer we wait, the more
likely it becomes that we see both branches failing. For t → ∞ the
system failure rate asymptotically tends to the branch failure rate.
Here is an easy way to understand this: The more time passes, the more
likely it becomes that one branch has already failed. After the first
failure, with only one branch functioning, the system has a constant
failure rate, namely the branch failure rate.
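The two limits described above can be checked numerically. The following is a minimal sketch, assuming independent branches with constant failure rate and no repair; the function name `system_failure_rate` and the example value of `lam` are hypothetical and chosen only for illustration.

```python
import math


def system_failure_rate(t, lam):
    """Failure rate of two identical parallel branches without repair.

    Assumes independent branches, each with constant failure rate `lam`.
    Uses the closed form 2*lam*(1 - e) / (2 - e) with e = exp(-lam*t).
    """
    e = math.exp(-lam * t)
    one_minus_e = -math.expm1(-lam * t)  # 1 - exp(-lam*t), accurate for small lam*t
    return 2.0 * lam * one_minus_e / (2.0 - e)


if __name__ == "__main__":
    lam = 1.0e-4  # hypothetical branch failure rate in 1/h (MTBF_Branch = 10,000 h)
    for t in (0.0, 100.0, 1.0e4, 1.0e5, 1.0e6):
        ratio = system_failure_rate(t, lam) / lam
        print(f"t = {t:>9.0f} h   system rate / branch rate = {ratio:.4f}")
```

The printed ratio starts at zero for t = 0 and approaches 1 as t grows, i.e. the system failure rate climbs from zero towards the branch failure rate.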
The zero failure rate can also be seen on the reliability graph: R(t)
starts out flat, i.e. it is practically constant for small t.
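The flat start of the reliability curve can be made explicit with a small-t expansion. This is a sketch under the same assumptions as before (independent branches, constant branch failure rate \lambda, no repair); R_System and MTTF_System are notation introduced here for illustration.

```latex
\begin{align*}
R_{\mathrm{System}}(t) &= 1 - \left(1 - e^{-\lambda t}\right)^2
  \approx 1 - (\lambda t)^2 \quad \text{for } \lambda t \ll 1, \\
R_{\mathrm{Branch}}(t) &= e^{-\lambda t} \approx 1 - \lambda t \quad \text{for } \lambda t \ll 1, \\
\mathrm{MTTF}_{\mathrm{System}} &= \int_0^\infty R_{\mathrm{System}}(t)\,dt
  = \frac{2}{\lambda} - \frac{1}{2\lambda} = \frac{3}{2\lambda}
  = 1.5 \times \mathrm{MTBF}_{\mathrm{Branch}}
\end{align*}
```

The system reliability has no linear term in t, which is why the curve leaves R = 1 with zero slope, whereas a single branch drops off linearly. Without repair, the mean time until both branches have failed is only 1.5 times the MTBF of a single branch.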
Now, instead of waiting until the system fails, let's begin with repair
as soon as one of the branches fails. Let's further assume that the
system is designed in a manner that allows us to repair the failed
branch while the system continues to operate with the other branch.
This would improve the system reliability dramatically, since system
failures could then occur only during repair. The probability P1 that
the system fails during a repair, i.e. that the remaining branch fails
before the repair is finished, is simply
    P1 = MTTR / MTBF_{Branch}
where MTTR is the mean repair time. This approximation holds as long as
MTTR is very short compared to MTBF_{Branch}.
(MTBF of two identical branches = 1/2 x MTBF of a single branch)
The mean time between system failures, MTBF_{System}, is thus given by
    MTBF_{System} = 1/2 x MTBF_{Branch} x 1/P1
This scenario is one of the rare cases where it's easier to use MTBF for
calculation instead of failure rate. 1/2 x MTBF_{Branch} is the MTBF for
any of the two branches failing. 1/P1 is the number of such branch
failures it takes on average until one of them leads to a system
failure, i.e. the multiple of (1/2 x MTBF_{Branch}) that passes on
average between two system failures.
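To get a feeling for the numbers, the calculation can be sketched in a few lines. The sketch assumes that the probability of a system failure during one repair is approximately P1 = MTTR / MTBF_Branch, which is only reasonable when the repair time is very short compared to the branch MTBF. The function name `mtbf_system` and the example values are hypothetical.

```python
def mtbf_system(mtbf_branch, mttr):
    """Approximate MTBF of a 1-out-of-2 system with repair.

    Assumes two identical, independent branches with constant failure rate
    and a mean repair time `mttr` that is much shorter than `mtbf_branch`.
    """
    if mttr <= 0 or mtbf_branch <= 0:
        raise ValueError("mtbf_branch and mttr must be positive")
    p1 = mttr / mtbf_branch             # chance that the surviving branch fails during one repair
    mtbf_any_branch = mtbf_branch / 2   # mean time until either of the two branches fails
    return mtbf_any_branch / p1         # on average 1/p1 branch failures per system failure


if __name__ == "__main__":
    mtbf_branch = 10_000.0  # hours, hypothetical example value
    mttr = 8.0              # hours, hypothetical example value
    print(f"MTBF_Branch = {mtbf_branch:,.0f} h")
    print(f"MTBF_System = {mtbf_system(mtbf_branch, mttr):,.0f} h")
```

Written out, the result equals MTBF_Branch^2 / (2 x MTTR), so halving the repair time doubles the system MTBF, while doubling the branch MTBF quadruples it.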