A Novel Algorithm for Freeing Network from Points of
Failure
Rahul Gupta and Suneeta Agarwal
Department of Computer Science and Engineering, Motilal Nehru National Institute of
Technology, Allahabad, India
rahulgupta_mnnit@yahoo.co.in, suneeta@mnnit.ac.in
Abstract. A network design may have many points of failure, the failure of any of which
breaks up the network into two or more parts, thereby disrupting the communication between
the nodes. This paper presents a heuristic for making an existing network more reliable by
adding new communication links between certain nodes. The algorithm ensures the absence of
any point of failure after the addition of a minimal number of communication links determined
by the algorithm. The paper further presents theoretical proofs and results which
prove the minimality of the number of new links added to the network.
Keywords: Points of Failure, Network Management, Safe Network Component, Connected
Network, Reliable Network.
1 Introduction
A network consists of a number of interconnected nodes communicating with each
other through communication channels between them. A wired communication link
between two nodes is more reliable [1]. Various topology designs, such as bus topology,
star topology, ring topology and mesh topology, have been proposed for various network
protocols and applications [2][3]. Many of these network designs leave certain nodes
as failure points [4][5][6]. These nodes become very important and must remain
working all the time. If one of these nodes is down for any reason, it breaks the network
into segments and the communication among the nodes in different segments is
disrupted. Hence these nodes make the network unreliable.
In this paper, we have designed a heuristic which can handle the failure of a
single node. The algorithm adds a minimal number of new communication
links between the nodes so that a single node failure does not disrupt communication
among communicating nodes.
2 Basic Outline
The network designs in common use are ring topology, star topology, mesh
topology and bus topology [1][4][5]. All these topology designs have their own advantages
and disadvantages. Ring topology does not contain any points of failure. Bus
topology, on the other hand, has many points of failure. Star topology contains a single
point of failure, the failure of which disrupts communication between any pair of
communicating nodes.
E. Corchado et al. (Eds.): CISIS 2008, ASC 53, pp. 219–226, 2009.
© Springer-Verlag Berlin Heidelberg 2009
springerlink.com
Points marked P in figure 2 are the points of failure in the network design. Star topology
and bus topology are the least reliable from the point of view of failure of a single
communication node in the network. In star topology, there is always one point of
failure, the failure of which breaks the communication between all pairs of nodes, so that
no nodes can communicate further. In a network of n nodes connected by bus topology,
there are (n-2) points of failure. Ring topology is the most advantageous and has no
points of failure.
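The counts quoted above can be checked on small instances. The following is a minimal sketch, not part of the original paper: it assumes the bus is modelled as a linear chain of nodes, and the helper names (`build`, `is_connected`, `points_of_failure`) are hypothetical. A node is reported as a point of failure if removing it disconnects the remaining nodes.

```python
from collections import deque

def build(n, edges):
    """Adjacency sets for an undirected network on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def is_connected(adj, removed=None):
    """Breadth first check that all nodes except `removed` can reach each other."""
    nodes = [v for v in adj if v != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w != removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)

def points_of_failure(adj):
    """Nodes whose removal splits the rest of the network."""
    return {v for v in adj if not is_connected(adj, removed=v)}

n = 6
star = build(n, [(0, i) for i in range(1, n)])           # node 0 is the hub
bus = build(n, [(i, i + 1) for i in range(n - 1)])       # assumed: a linear chain
ring = build(n, [(i, (i + 1) % n) for i in range(n)])

print(points_of_failure(star))  # {0}
print(points_of_failure(bus))   # {1, 2, 3, 4}, i.e. n-2 nodes
print(points_of_failure(ring))  # set()
```

This brute-force check removes every node in turn, which is adequate for small illustrations only.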
For a reliable network, there must be no point of failure in the network design.
These points of failure can be made safe by adding new communication links between
nodes in the network. In this paper, we present an algorithm which finds the
points of failure in a given network design. The paper further presents a heuristic
which adds a minimal number of communication links to the network to make it reliable.
This removes the points of failure at the least possible cost.
3 Algorithm for Making Network Reliable
In this paper, we have designed an algorithm to find the points of failure in the network
and an algorithm for converting these failure points into non failure points by
the addition of a minimal number of communication links.
3.1 New Terms Coined
We have coined the following terms which aid in the algorithm development and
network design understanding.
N – Nodes of the Network
E – Links in the Network
P – Set of Points of Failure
S – Set of Safe Network Components
Pi – Point of Failure
Si – Safe Network Component
Si(a) – Safe Network Component Attached to the Failure Point ‘a’
B – Set of all Safe Network Components each having a Single Point of Failure in the
Original Network
Bi – A Safe Network Component having a Single Point of Failure in the Original
Network
|B| – Cardinality of Set B
Fi – Point of Failure Corresponding to the Original Network in the member ‘Bi’
NFi – Non Failure Point corresponding to the Original Network in the member ‘Bi’
L – Set of New Communication Links Added
Li – A New Communication Link
C – Matrix List for the Components Reachable
dfn(i) – Depth First Number of the node ‘i’
low(i) – Lowest Depth First Number of the Node Reachable from ‘i’.
The points of failure are the nodes in the network, the failure of any of which
breaks the network into isolated segments which cannot communicate with
each other. A safe network component is a maximal subset of connected
nodes of the complete network which, taken on its own, does not contain any point of
failure: it can handle a single failure occurring at any of its nodes.
We have developed an algorithm which finds the minimal number of communication
links to be added to the network to make the network capable of handling a single
failure of any node. A safe component may have more than one point of failure in
the original network. The algorithm treats the components having only a single
point of failure differently. ‘B’ is the set of all safe components having only a single
point of failure in the original network. ‘Fi’ is the point of failure of the original
network design contained in the member ‘Bi’. ‘C’ is the matrix of reachable components. Each
row of the matrix corresponds to the components reachable through one outgoing
link from the point of failure. The representatives of all the components having a single
point of failure and occurring on one outgoing link form one row. The
new communication links added to the network are collected in the set ‘L’. The set
contains the pairs of nodes between which links must be added to make the network free
of points of failure.
Fig. 1. (a) An example network design, (b) Safe components in the design
3.2 Algorithm for Finding Points of Failure and Safe Components
To find all the points of failure in the network, we use the depth first search [7][8]
technique starting from any node in the network. Nodes that are reachable through more
than one path become part of a safe component, while the ones which are connected
through only one path are vulnerable: their communication can be disrupted by the failure
of any one node on the single path available to them. The
network is represented by a matrix of nodes connected to each other, with edges
representing the communication links. Each node of the network is numbered sequentially
in depth first search order starting from 0. This forms the dfn of each node.
The unmarked nodes reached from a node are called the children of that node, and the
node itself becomes the parent of those child nodes. The algorithm finds the low of
each node, all the points of failure and all the safe components
in the network. A node other than the starting node is a point of failure if it has a
child whose low is not smaller than the node's own dfn. The starting node is a point of
failure if some unmarked nodes remain
even after fully exploring any one single path from the node.
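The description above follows the standard depth first search method for articulation points and biconnected components. The sketch below is one possible rendering under stated assumptions, not the authors' code: the network is assumed connected and simple, it is given as an adjacency list rather than a matrix, and the function name `find_failure_points` and the sample network are hypothetical. Each safe component is returned as a set of nodes that includes its copies of the failure points.

```python
def find_failure_points(adj):
    """Return (failure_points, safe_components) for a connected simple network.

    adj maps each node to a list of its neighbours.
    """
    dfn, low = {}, {}
    failure_points, components, edge_stack = set(), [], []
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        dfn[u] = low[u] = counter          # depth first number, starting from 0
        counter += 1
        children = 0
        for w in adj[u]:
            if w not in dfn:               # unmarked node: w becomes a child of u
                children += 1
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= dfn[u]:       # w's subtree cannot bypass u
                    if parent is not None:
                        failure_points.add(u)
                    comp = set()
                    while True:            # pop one safe component off the stack
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, w):
                            break
                    components.append(comp)
            elif w != parent and dfn[w] < dfn[u]:
                edge_stack.append((u, w))  # back edge to an ancestor
                low[u] = min(low[u], dfn[w])
        if parent is None and children > 1:
            failure_points.add(u)          # starting node with several subtrees

    dfs(next(iter(adj)), None)
    return failure_points, components

# Hypothetical network: two triangles joined by the link 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
failure_points, components = find_failure_points(adj)
print(failure_points)  # {2, 3}
print(components)      # [{3, 4, 5}, {2, 3}, {0, 1, 2}]
```

The recursive form keeps the sketch short; a very large network would need an explicit stack to stay within Python's recursion limit.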
3.3 Algorithm for Converting Points of Failure into Non Failure Points
In this section, we describe our algorithm for the conversion of points of failure into
non failure points by the addition of new communication links. The algorithm adds
a minimal number of new links, which ensures the least possible cost of making the network
reliable. The algorithm is based on the concept that the safe components having more
than one point of failure are necessarily connected, directly or indirectly, to a safe
component having only one point of failure. Such a component can therefore become part of
a larger safe component through more than one path, each originating from one of the
points of failure of the original network present in the component. Thus, if the components
having only one point of failure in the original network are made part of a larger
safe component, the components having more than one point of failure are made safe
as well. The algorithm finds new links to be added so that the safe component grows larger
and larger until it includes all the nodes of the network. When the maximal safe component
consists of all the nodes of the network, the whole network is safe and all points of
failure are removed.
The following steps are followed in order.
1. Initially the set L = ∅ is taken.
2. P, the set of points of failure, is found using the algorithm described in section 3.2. The
algorithm also finds all safe components of the network and adds them to the set S.
Each Si contains its own copy of the failure points attached to it. Hence, the failure
points are replicated in each component.
3. Find the subset B of safe components having only a single point of failure in the
original network by using set S and set P found in step 2. Let these component
members be B1, B2, B3, …, Bk. These Bi's are mutually disjoint with respect
to non failure points.
4. Each of the components Bi has at least one non failure point. Any one non failure point
of Bi is named NFi and taken as the representative of the component Bi.
5. The failure point present in the maximum number of safe components is chosen, i.e., the
node whose failure creates the maximum number of safe components.
Let it be named ‘s’.
6. Let S1(s), S2(s), S3(s), …, Sm(s) be all the safe components having the failure
point ‘s’. Each of these components may have one or more points of failure corresponding
to the original network. If a component has more than one point of failure, other safe
components are reachable from it through
points of failure other than ‘s’.
7. Now we create the lists of components reachable from the point of failure ‘s’. For each
Sj(s), j = 1, 2, …, m, if the component contains only one point of failure, add the
representative of this component to the list C as the next row element; if the component
contains more than one point of failure, take the reachable safe components
having only one point of failure and add their representatives to the
list C. These components are found by a depth first search starting from this component.
All the components that are reachable from the same component are considered for the
same row, and their representatives are added to the
same row of the matrix C. Note that only the components having
exactly one point of failure are considered by the algorithm. We now have a row for
each Sj(s), j = 1, 2, …, m. Thus the number of rows in matrix C is m.
8. The number of elements in each row of matrix C corresponds to the number of
components that are reachable from the point of failure ‘s’ through that one outgoing
link. Note that each component is represented just by a non failure
point representative. Arrange the rows of the matrix in non decreasing order of
row size, i.e., of the number of elements in each row.
9. If all Ci(s)'s are of size 1, pair the only member of each row with the only member
of the next row. Here pairing means adding a communication link between the non
failure point members acting as representatives of their corresponding components.
This gives (k-1) new communication links to be added to the network for ‘k’
members. Add all these edges to set L, the set of all new communication links, and
exit from the algorithm. If the size of some Ci(s)'s is greater than 1, start with the
last list Ci (the list of maximum size). For every k >= 2, pair the kth element of
this row with the (k-1)th element of the preceding row (if it exists). Here again
pairing means the addition of a communication link between the representative nodes.
Remove these paired up elements from the lists and contract the lists.
10. Now if more than one element is left in the second last list, shift the last element
of this list to the last list by appending it there.
11. If the number of non empty lists is greater than one, go to step 8 for further
processing. If the size of the last and only remaining row is one, pair its only member
with any of the non failure points in the network and exit from the algorithm. If the
last and only remaining row has exactly two elements, pair the two representatives
and exit from the algorithm. If the size of the last and only remaining row is
greater than two, add the edges from set L to the network design and repeat the
algorithm from step 2 on the updated network design.
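Steps 3 to 5 and the simplest branch of step 9 can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the safe components and points of failure are taken as already computed (step 2), all function names are hypothetical, the smallest non failure point is picked as representative purely for determinism, and the row construction shown is valid only when every component attached to ‘s’ has a single point of failure. The general pairing of rows larger than one (steps 9 to 11) is not covered.

```python
def leaf_components(components, failure_points):
    """Step 3: safe components containing exactly one point of failure (set B)."""
    return [c for c in components if len(c & failure_points) == 1]

def representative(component, failure_points):
    """Step 4: any non failure point NFi; the smallest one is an arbitrary choice."""
    return min(component - failure_points)

def busiest_failure_point(components, failure_points):
    """Step 5: the failure point 's' present in the most safe components."""
    return max(sorted(failure_points),
               key=lambda p: sum(p in c for c in components))

def chain_links(rows):
    """Step 9, simple case only: every row of C has size 1, so each
    representative is paired with the representative of the next row."""
    assert all(len(row) == 1 for row in rows)
    reps = [row[0] for row in rows]
    return list(zip(reps, reps[1:]))

# Hypothetical star network: node 0 is the hub, nodes 1..4 are leaves.
components = [{0, 1}, {0, 2}, {0, 3}, {0, 4}]
failure_points = {0}

B = leaf_components(components, failure_points)
s = busiest_failure_point(components, failure_points)
# Every component attached to s has one failure point, so each row of C
# holds just that component's representative.
rows = [[representative(c, failure_points)] for c in B if s in c]
L = chain_links(rows)
print(s, rows, L)  # 0 [[1], [2], [3], [4]] [(1, 2), (2, 3), (3, 4)]
```

For this star the sketch yields k-1 = 3 links for k = 4 members, in line with the count given in step 9.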
Since at least one communication link is added to set L in every iteration of the
algorithm and only a finite number of edges can be added, the algorithm terminates in a
finite number of steps. The algorithm ensures that there are at least two paths between any
pair of nodes in the network. Because of these multiple paths of communication,
the failure of any one node does not affect the communication between any other
pair of nodes in the network. Thus the algorithm makes
the points of failure in the original network safe by adding a minimal number of
communication links.
4 Theoretical Results and Proofs
In this section, we describe the theoretical proofs for the correctness of the algorithm
and sufficiency of the number of the new communication links added. Further, the
lower and upper bounds on the number of links added to the network are proved.
Theorem 1. If | B | = k, i.e., there are only k safe network components having only
one point of failure in the original network, then the number of new edges necessary
to make all points of failure safe varies between ⌈k/2⌉ and (k-1), both inclusive.
Proof: Each safe component Bi has only one point of failure corresponding to the original
network. Failure of this node will separate the whole component Bi from the remaining
part of the network. Thus, for any node of this component Bi to communicate with any
node outside Bi, at least one extra communication link must be added to this component.
This argument is valid for each Bi. Thus at
least one extra edge is to be added from each of the components Bi. Since one link is
incident on at most two distinct Bi's, this needs at least
⌈k/2⌉ extra links to be added. This forms
the lower bound on the number of links to be added to make the points of failure safe
in the network design.
Fig. 2. (a) and (b) Two Sample Network Designs
In figure 2(a), there are k = 6 safe components each having only one point of failure
and thus requiring k/2 = 3 new links to be added to make all the points of failure safe.
It is easy to see that k/2 = 3 new links are sufficient to make the network failure free.
Now, we consider the upper bound on the number of new communication links to
be added to the network. This occurs when | B | = | S | = k, i.e., when each safe
component in the network contains only one point of failure. In this case there is no safe
component which can become safe through more than one path, so all the safe
components are to be considered by the algorithm. Thus, it requires the addition of
(k-1) new communication links to join the ‘k’ safe components.
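Both bounds can be observed on small networks by exhaustive search. The sketch below is an illustration under stated assumptions and is not the paper's algorithm: it tries every set of candidate links in increasing size until no single node failure disconnects the network. The two sample networks are hypothetical (they are not the designs of figure 2), each has k = 6 safe components with a single point of failure, and the names `failure_free` and `min_new_links` are invented for this example.

```python
from itertools import combinations

def failure_free(n, edges):
    """True if nodes 0..n-1 stay connected after removing any single node."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    for removed in range(n):
        start = 0 if removed != 0 else 1
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w != removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) != n - 1:
            return False
    return True

def min_new_links(n, edges):
    """Smallest number of extra links that removes every point of failure,
    found by brute force over all candidate link sets (small n only)."""
    existing = {frozenset(e) for e in edges}
    candidates = [p for p in combinations(range(n), 2)
                  if frozenset(p) not in existing]
    for r in range(len(candidates) + 1):
        for extra in combinations(candidates, r):
            if failure_free(n, list(edges) + list(extra)):
                return r

k = 6
# Star: hub 0 with six leaves, so |B| = |S| = 6 (upper bound case).
star = [(0, i) for i in range(1, 7)]
# Tree: chain 0-1-2 with two leaves on each chain node, so |B| = 6 and |S| = 8.
tree = [(0, 1), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6), (2, 7), (2, 8)]

print(min_new_links(7, star), k - 1)        # 5 5
print(min_new_links(9, tree), -(-k // 2))   # 3 3
```

The search is exponential in the number of candidate links, so it serves only as a cross-check on small examples.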
Theorem 2. If the edges determined by the algorithm are added to the network, the
nodes will keep on communicating even after the failure of any single node in the
network.
Proof: We arbitrarily take 2 nodes ‘x’ and ‘y’ from the set ‘N’ of the network. Now
we show that ‘x’ and ‘y’ can communicate even after the failure of any single node
from the network.
CASE 1: If the node that fails is not a point of failure, ‘x’ and ‘y’ can continue to
communicate with each other.
CASE 2: If the node that fails is a point of failure and both ‘x’ and ‘y’ are in the
same safe component of the network, then by the definition of safe component ‘x’ and
‘y’ can still communicate because the failure of this node has no effect on the nodes
that are in the same safe component.
CASE 3: The node that fails is a point of failure, ‘x’ and ‘y’ are in different
safe components, and both ‘x’ and ‘y’ are members of safe components in set ‘B’. We
know that the algorithm makes all members of set ‘B’ safe by using only non failure
points of each component, so the failure of any point of failure will not affect the
communication of any node of the safe component formed. This is because
the algorithm has already created an alternate path for each node in any of the
safe members.
CASE 4: The node that fails is a point of failure, ‘x’ and ‘y’ are in different
safe components, ‘x’ is a member of a component belonging to set ‘B’ and ‘y’ is a
member of a component belonging to set ‘(S-B)’. We know that any node occurring
in any member of set ‘(S-B)’ is connected to at least 2 points of failure in its safe
component, and through each of these points of failure we can reach a member of
set ‘B’. So even after the deletion of any point of failure, ‘y’ will remain connected with
at least one member of set ‘B’. The algorithm has already connected all the members of
set ‘B’ by finding new communication links, hence ‘x’ and ‘y’ can still communicate
with each other.
CASE 5: The node that fails is a point of failure, ‘x’ and ‘y’ are in different
safe components, and both ‘x’ and ‘y’ belong to components that are members of set
‘(S-B)’. Each member of set ‘(S-B)’ has at least 2 points of failure. So after the
failure of any one of the failure points, ‘x’ can send a message to at least one component
that is a member of set ‘B’. Similarly, ‘y’ can send a message to at least one component
that is a member of set ‘B’. The algorithm has already connected all the components
belonging to set ‘B’, so ‘x’ and ‘y’ can continue to communicate with each
other after the failure of any one node.
After the addition of links determined by the algorithm, there exist multiple paths of
communication between any pair of communicating nodes. Thus, no node is dependent on just one path.
Theorem 3. The algorithm provides the minimal number of new communication links
to be added to the network to make it capable of handling any single failure.
Proof: The algorithm considers only the components having a single point of failure
corresponding to the original network. Since | B | = k, at least ⌈k/2⌉
new communication links must be added to pair up these k components and make
them capable of handling a single failure of any node in the network. Thus adding fewer
than ⌈k/2⌉ new communication links can never result in a safe network. Thus the
algorithm finds a minimal number of new communication links, as shown by the example
discussed in theorem 1. In all the steps of the algorithm except the last, only one link
is added to join 2 members of set ‘B’, and these members are not considered further
by the algorithm and hence do not generate any further edge in set ‘L’. In the last
step, when only one vertical column of x rows with each row having a single member is
left, (x-1) new links are added. These members have the property that the single
point of failure ‘s’ can separate them into x disjoint groups, hence the addition of (x-1)
links is justified. When only a single row of just one element is left, it can only be
made safe by joining it with any one of the non failure nodes. Hence, the algorithm
adds a minimal number of new communication links to make the network capable of
handling any single failure.
5 Conclusion and Future Research
This paper described an algorithm for making points of failure safe in the network.
The new communication links determined by the algorithm are minimal in number and
guarantee that the network can handle a single failure of any node. The algorithm
guarantees at least two paths of communication between any pair of nodes in
the network.
References
1. Tanenbaum, A.S.: Computer Networks, 4th edn. Pearson Education, London (2004)
2. Perlman, R.: Interconnections: Bridges, Routers, Switches, and Internetworking Protocols,
2nd edn. Pearson Education, London (2006)
3. Kamiyana, N.: Network Topology Design Using Data Envelopment Analysis. In: IEEE
Global Telecommunications Conference (2007)
4. Dengiz, B., Altiparmak, F., Smith, A.E.: Efficient optimization of all-terminal reliable networks, using an evolutionary approach. IEEE Transactions on Reliability 46(1), 18–26
(1997)
5. Mandal, S., Saha, D., Mukherjee, R., Roy, A.: An efficient algorithm for designing optimal
backbone topology for a communication network. In: International Conference on Communication Technology, vol. 1, pp. 103–106 (2003)
6. Ray, G.A., Dunsmore, J.J.: Reliability of network topologies. In: IEEE INFOCOM 1988
Networks, pp. 842–850 (1988)
7. Horowitz, E., Sahni, S., Anderson-Freed, S.: Fundamentals of Data Structures in C, 8th edn.
Computer Science Press (1998)
8. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn.
Prentice-Hall, India (2004)