
Sparse Partitions
(mini project report)
Iddo Rachlewski
037218849
Eli Bar-lev
031470362
1. Introduction
Our mini project is based on the article "Sparse Partitions" by Baruch
Awerbuch and David Peleg [ABPL90].
We decided to focus our research on the sparse coarsening covers algorithm
for constructing a cover with low maximum degree.
2. The Algorithm (Based on the description in [ABPL90])
"Given a graph G=(V,E), |V|=n, a cover S and an integer k≥1, the algorithm
constructs a coarsening cover T that satisfies the following properties:
a. Rad(T) ≤ (2k-1)·Rad(S)
b. ∆(T) ≤ 2k·|S|^(1/k)
The algorithm which Peleg and Awerbuch propose is called MAX_COVER.
The coarsening problem is handled by reducing it to the subproblem of
constructing a partial cover. The input of this problem is a graph G=(V,E),
|V|=n, a collection of (possibly overlapping) clusters R and an integer k≥1.
The output consists of a collection of disjoint clusters, DT, that subsume a
subset DR ⊆ R of the original clusters.
The goal is to subsume "many" clusters of R while maintaining the radii of the
output clusters in DT relatively small.
The algorithm to achieve that goal is called Cover(R).
Procedure Cover(R) starts by setting U, the collection of unprocessed clusters,
to equal R. We arbitrarily chose to construct R in the following manner:
for each v ∈ V, we created a cluster r = (V', E'), where V' = {u | u ∈ V, (v,u) ∈ E}
and E' = {(u,w) ∈ E | u ∈ V', w ∈ V'}.
The procedure operates in iterations. Each iteration constructs one output
cluster YDT, by merging together some clusters of U. The iteration begins by
arbitrarily picking a cluster S in U and designating it as a kernel of a cluster to
be constructed next. The cluster is then repeatedly merged with intersecting
clusters from U. This is done in a layered fashion, adding one layer at a time.
At each stage, the original cluster is viewed as the internal kernel Y of the
resulting cluster Z. The merging process is carried out repeatedly until a
certain sparsity condition is reached (specifically, until the next iteration
increases the number of clusters merged into Z by a factor of less than |R|^(1/k)).
The procedure then adds the kernel Y of the resulting cluster Z to a collection
DT. It is important to note that the newly formed cluster consists of only the
kernel Y, and not the entire cluster Z, which contains an additional "external
layer" of R clusters. The role of this external layer is to act as a "protective
barrier" shielding the generated cluster Y and providing the desired
disjointness between the different clusters Y added to DT.
Throughout the process, the procedure also keeps the "unmerged" collections
y and z, containing the original R clusters merged into Y and Z, respectively.
At the end of the iterative process when Y is complete, every cluster in the
collection y is added to DR, and every cluster in the collection z is removed
from U. Then a new iteration is started. These iterations proceed until U is
exhausted. The procedure then outputs the sets DR, DT.
Note that each of the original clusters in DR is covered by some cluster Y ∈ DT
constructed during the execution of the procedure. However, some original R
clusters are thrown out of consideration without being subsumed by any
cluster in DT; these are precisely the clusters merged into some external layer
z \ y.
Procedure Cover(R) is formally described in figure 2. The collections DT, DR
constructed satisfy the following properties:
a. DT coarsens DR
b. YY' =  for every Y, Y'DT
c. |DR|≥|R|1-1/k
d. Rad(DT)≤(2k-1)Rad(R)
We will now present the algorithm MAX_COVER. Its task is to construct a
cover which satisfies the properties described above.
The output collection of cover clusters, T, is initially empty. The algorithm
maintains the set of "remaining" clusters R. These are the clusters not yet
subsumed by the constructed cover. Initially, R=S, and the algorithm
terminates once R = ∅. The algorithm operates in phases. Each phase consists
of the activation of the procedure Cover(R) which adds a sub collection of
output clusters DT to T and removes the set of subsumed original clusters DR
from R. Algorithm MAX_COVER is formally described in figure 1."
3. Our Implementation
We chose to implement the algorithm in Java.
The code can be separated into two main modules: I/O and Algorithm Core.
I/O:
We decided to represent our Input and Output as XML documents.
We found XML to be human-readable and powerful at the same time.
The XML format can easily be generated or read by external utilities, which
allows adding a GUI for creating input or displaying output in a generic way.
It also allows simple input validation using an XML Schema (XSD).
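Such validation can be done with the JDK's standard javax.xml.validation API.
The following is a minimal sketch; the file names graph.xsd and input.xml are
placeholders, not files from our project:

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class ValidateInput {
    public static void main(String[] args) throws Exception {
        // Compile the schema once, then validate the input document against it.
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("graph.xsd"));
        Validator validator = schema.newValidator();
        // validate() throws a SAXException if the document does not match.
        validator.validate(new StreamSource(new File("input.xml")));
        System.out.println("input.xml is valid");
    }
}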
In order to parse the input, we used and modified an open source package
called qdxml. We created a class called XmlParser which uses the qdxml
parser to parse our input XML.
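We do not reproduce the qdxml callback API here. As a rough illustration only,
the same input format could also be read with the JDK's built-in DOM parser
into a plain adjacency map; the class and method names below are ours for this
sketch, not the project's:

import java.io.File;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomGraphReader {
    // Reads a <graph> document into a map: vertex id -> list of neighbour ids.
    public static Map readGraph(File file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(file);
        Map adjacency = new TreeMap();
        NodeList vertices = doc.getElementsByTagName("vertex");
        for (int i = 0; i < vertices.getLength(); i++) {
            Element vertex = (Element) vertices.item(i);
            Integer id = Integer.valueOf(vertex.getAttribute("id"));
            List neighbours = new ArrayList();
            NodeList edges = vertex.getElementsByTagName("edge");
            for (int j = 0; j < edges.getLength(); j++)
                neighbours.add(Integer.valueOf(edges.item(j).getTextContent().trim()));
            adjacency.put(id, neighbours);
        }
        return adjacency;
    }
}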
The input of our algorithm is a graph G(V,E), represented by the following
schema:
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
            elementFormDefault = "qualified">
  <xsd:element name = "graph">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "vertex" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name = "edge" type = "xsd:integer"
                           minOccurs = "0" maxOccurs = "unbounded"/>
            </xsd:sequence>
            <xsd:attribute name = "id" use = "required" type = "xsd:integer"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
A sample input:
<?xml version = "1.0" encoding = "UTF-8"?>
<graph>
  <vertex id="1">
    <edge>2</edge>
    <edge>3</edge>
  </vertex>
  <vertex id="2">
    <edge>1</edge>
    <edge>3</edge>
  </vertex>
  <vertex id="3">
    <edge>1</edge>
    <edge>2</edge>
  </vertex>
</graph>
A graphic representation of the sample input XML:
[Figure: vertices 1, 2 and 3 forming a triangle]
The output of our algorithm is a collection of clusters.
The output XML is rendered by a class called ClustersToXml.
The output XML is represented by the following schema:
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
            elementFormDefault = "qualified">
  <xsd:element name = "Clusters">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "Cluster" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name = "Vertex" maxOccurs = "unbounded">
                <xsd:complexType>
                  <xsd:attribute name = "id" use = "required" type = "xsd:integer"/>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
A sample output:
<?xml version = "1.0" encoding = "UTF-8"?>
<Clusters>
  <Cluster>
    <Vertex id="1"/>
    <Vertex id="2"/>
  </Cluster>
  <Cluster>
    <Vertex id="1"/>
    <Vertex id="3"/>
  </Cluster>
</Clusters>
A graphic representation of the sample output XML:
[Figure: two overlapping clusters, {1, 2} and {1, 3}]
Algorithm Core:
The algorithm uses several objects to contain the processed data:
• Edge – a representation of an edge in a graph.
• Vertex – a representation of a vertex in a graph. A Vertex contains a
collection of Edge objects.
• Graph – a representation of a graph. A Graph contains a collection of
Vertex objects.
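Our code below uses only a few members of these classes directly (for example
getId on Graph). The following is a minimal sketch of how the data objects
might look; the accessors getVertices, getEdges, getFrom and getTo are our
assumptions, chosen to match how the sketches later in this report use them:

import java.util.Vector;

class Edge {
    private final int from, to;
    Edge(int from, int to) { this.from = from; this.to = to; }
    int getFrom() { return from; }
    int getTo() { return to; }
}

class Vertex {
    private final int id;
    private final Vector edges = new Vector(); // collection of Edge objects
    Vertex(int id) { this.id = id; }
    int getId() { return id; }
    void addEdge(Edge e) { edges.add(e); }
    Vector getEdges() { return edges; }
}

class Graph {
    private final int id;                         // graph/cluster identifier
    private final Vector vertices = new Vector(); // collection of Vertex objects
    Graph(int id) { this.id = id; }
    int getId() { return id; }
    void addVertex(Vertex v) { vertices.add(v); }
    Vector getVertices() { return vertices; }
}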
The algorithm itself is executed by the class CoverExecutor.
The class CoverExecutor contains the following methods:
• MaxCover – receives a Graph and an integer k as input and generates a
collection of clusters S. The method then performs the MAX_COVER
algorithm described in figure 1 above.
• ClustersFactory – receives a Graph and generates a collection of
clusters as described above.
• Cover – receives a collection of clusters and an integer k. This method
performs the algorithm Cover(R) described above in figure 2.
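The code of ClustersFactory itself is not listed in this report. Based on the
cluster construction described in section 2 and on the class sketches above,
it might look roughly as follows; findVertex is a hypothetical lookup helper,
and the TreeMap is keyed by cluster id, as our MaxCover below expects:

// A sketch only (uses java.util.TreeMap, java.util.Iterator and the
// Graph/Vertex/Edge skeletons above).
private TreeMap clustersFactory(Graph graph) {
    TreeMap clusters = new TreeMap();
    for (Iterator it = graph.getVertices().iterator(); it.hasNext();) {
        Vertex v = (Vertex) it.next();
        // The cluster of v: its neighbourhood V' = {u | (v,u) in E}.
        Graph cluster = new Graph(v.getId());
        for (Iterator es = v.getEdges().iterator(); es.hasNext();) {
            Edge e = (Edge) es.next();
            cluster.addVertex(findVertex(graph, e.getTo()));
        }
        clusters.put(new Integer(cluster.getId()), cluster);
    }
    return clusters;
}

// Hypothetical helper: locate a vertex of the graph by its id.
private Vertex findVertex(Graph graph, int id) {
    for (Iterator it = graph.getVertices().iterator(); it.hasNext();) {
        Vertex v = (Vertex) it.next();
        if (v.getId() == id) return v;
    }
    return null;
}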
Our MaxCover:
public Vector MaxCover(Graph graph, int k) {
    TreeMap S = clustersFactory(graph); // the initial collection of clusters
    Vector result;                      // the pair (DT, DR) returned by cover()
    Vector DT;                          // output clusters added this phase
    TreeMap DR;                         // original clusters subsumed this phase
    Vector T = new Vector();            // the output cover
    TreeMap R = (TreeMap) S.clone();    // the "remaining" clusters; initially R = S

    while (R.size() > 0) {
        result = cover(S, k);
        DT = (Vector) result.elementAt(0);
        DR = (TreeMap) result.elementAt(1);
        T.addAll(DT);                      // T <- T U DT
        R.values().removeAll(DR.values()); // R <- R \ DR
        S = (TreeMap) R.clone();           // the next phase works on the rest
    }
    return T;
}
Our method MaxCover(graph, k) receives a Graph and an integer as input.
It uses clustersFactory to generate the collection of clusters S.
It initializes DR, DT and T, and creates R (a clone of S).
The main loop runs until R is exhausted. Each pass of the loop performs the
following operations:
Call cover(S, k) and assign the result to DT, DR,
T ← T ∪ DT,
R ← R \ DR,
S ← R.
When the loop ends, the method returns T.
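For completeness, this is how the pieces might be wired together. XmlParser
and ClustersToXml are our project classes, but their exact signatures are not
given in this report, so the method names parse and render used here are
assumptions:

import java.io.File;
import java.util.Vector;

public class Main {
    public static void main(String[] args) throws Exception {
        // args[0] - the input XML file, args[1] - the value of k.
        Graph graph = new XmlParser().parse(new File(args[0])); // assumed signature
        int k = Integer.parseInt(args[1]);
        Vector T = new CoverExecutor().MaxCover(graph, k);
        System.out.println(new ClustersToXml().render(T));      // assumed signature
    }
}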
Our Cover(R):
private Vector cover(TreeMap R, int k) {
    // R - the collection of unprocessed clusters.
    Vector ans = new Vector();
    Vector DT = new Vector();   // the collection of the kernels
    TreeMap DR = new TreeMap(); // the original clusters subsumed so far
    TreeMap Y = new TreeMap();  // one output cluster (kernel) per iteration
    TreeMap y = null, Z;        // y, z - the "unmerged" collections
    boolean shouldContinue;
    Graph S;
    int rSize = R.size();
    double size;

    while (R.size() > 0) {
        shouldContinue = true;
        S = (Graph) R.get(R.firstKey());  // select an arbitrary cluster S from U
        Z = new TreeMap();                // Y is the kernel of Z
        Z.put(new Integer(S.getId()), S); // Z <- {S}
        while (shouldContinue) {          // repeat
            shouldContinue = R.size() > Z.size();
            y = (TreeMap) Z.clone();      // y <- Z
            Y = prepareY(y);              // Y <- the union of the clusters in y
            shouldContinue &= (addCluster(Z, R, Y) > 1);
            size = Math.pow(rSize, (double) 1 / k) * y.size();
            // ... until |Z| <= |R|^(1/k) * |y|, or no more clusters remain.
            shouldContinue &= !(Z.size() <= size);
        }
        removeClusters(R, Z); // R <- R \ Z
        DT.add(Y);
        DR.putAll(y);
    }
    ans.add(DT);
    ans.add(DR);
    return ans;
}
Our method cover(R, k) receives a collection of unprocessed clusters R and an
integer k.
It initializes DT, DR, Y, y and Z. These variables (with the same names) serve
the same purpose as in the pseudo-code of figure 2; we used R directly, without
copying it to U as done in the pseudo-code.
The method then loops until R is exhausted. The loop performs the following:
1. Selects an arbitrary cluster S from R.
2. Initializes Z to be a collection of clusters containing S.
3. Loops until |Z| ≤ |R|^(1/k)·|y| or no unprocessed clusters are left in R:
3.1. y ← Z
3.2. Y ← ∪_{S ∈ y} S (done by the method prepareY)
3.3. Z ← {S | S ∈ R, S ∩ Y ≠ ∅} (done by the method addCluster)
4. R ← R \ Z (done by the method removeClusters)
5. DT ← DT ∪ {Y}
6. DR ← DR ∪ y
The method returns (DT, DR) in a Vector.
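The helper methods prepareY, addCluster and removeClusters are not listed in
this report. Under the representations used above (clusters keyed by their id
in a TreeMap, and Y mapping vertex ids to Vertex objects), they might look
like the following sketch; in particular, the convention that addCluster
returns the current size of Z is our guess, based on the comparison
addCluster(Z,R,Y) > 1 in cover:

// Sketches only (use java.util.TreeMap and java.util.Iterator).

// Y <- the union of the vertices of all clusters currently in y.
private TreeMap prepareY(TreeMap y) {
    TreeMap union = new TreeMap();
    for (Iterator it = y.values().iterator(); it.hasNext();) {
        Graph cluster = (Graph) it.next();
        for (Iterator vs = cluster.getVertices().iterator(); vs.hasNext();) {
            Vertex v = (Vertex) vs.next();
            union.put(new Integer(v.getId()), v);
        }
    }
    return union;
}

// Merge into Z every cluster of R that intersects the kernel Y, and
// report how many clusters Z now contains.
private int addCluster(TreeMap Z, TreeMap R, TreeMap Y) {
    for (Iterator it = R.values().iterator(); it.hasNext();) {
        Graph cluster = (Graph) it.next();
        for (Iterator vs = cluster.getVertices().iterator(); vs.hasNext();) {
            Vertex v = (Vertex) vs.next();
            if (Y.containsKey(new Integer(v.getId()))) {
                Z.put(new Integer(cluster.getId()), cluster);
                break;
            }
        }
    }
    return Z.size();
}

// R <- R \ Z.
private void removeClusters(TreeMap R, TreeMap Z) {
    R.keySet().removeAll(Z.keySet());
}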
4. Possible applications of the algorithm
The algorithm can be used for computer network routing.
The algorithm allows a network administrator to manage the tradeoff between
the cost and performance of a dense network and those of a sparse one.
In this scenario, the nodes of the graph represent workstations. A cluster
represents a routing table.
The larger the clusters (i.e., the higher the value of k), the bigger the
routing table, so more memory is needed; but the path between two given
workstations will be more efficient (probably shorter, and faster to find).
In real life this is not always applicable, since network routing rules and
clusters are defined under the physical limitations of the network span. The
physical location of the workstations will determine their network cluster.
This may not be the case in "logical networks". A logical network exists
within a distributed application: for example, file-sharing applications,
where all users connect to a logical network in which each node knows the
location of some of the files shared over the network, or distributed grid
computing such as the SETI (Search for Extraterrestrial Intelligence)
project, where users connect and donate their free computing resources to
the search for life in outer space. The logical network is created by all the
users connected from different locations over the Internet, regardless of
their physical location.
5. Experiments
We chose to study the effect of a graph's degree on the number of clusters in
the result T given different values of k.
If we look at a graph as a network, where vertices are nodes, the degree of a
vertex represents the size of a node's routing table.
We ran the algorithm on several graphs of different degrees.
Each graph contained 20 nodes, all of the same degree, and every node was
reachable from every other node (the graphs were connected); a possible
generator for such inputs is sketched below.
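A simple construction that yields connected d-regular graphs is a circulant
graph: vertex i is joined to i±1, ..., i±d/2 (mod n), and for odd d the chord
i + n/2 is added as well (n must then be even). The generator below is our
illustration of this idea, not necessarily the tool used for the experiments:

import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

public class RegularGraphXml {
    public static void main(String[] args) {
        int n = 20, d = 6; // 20 nodes, all of degree 6
        System.out.println("<?xml version = \"1.0\" encoding = \"UTF-8\"?>");
        System.out.println("<graph>");
        for (int i = 0; i < n; i++) {
            // Collect the neighbours of vertex i (ids are 1-based, as in our samples).
            SortedSet neighbours = new TreeSet();
            for (int j = 1; j <= d / 2; j++) {
                neighbours.add(new Integer((i + j) % n + 1));
                neighbours.add(new Integer(((i - j) % n + n) % n + 1));
            }
            if (d % 2 == 1)
                neighbours.add(new Integer((i + n / 2) % n + 1));
            System.out.println("  <vertex id=\"" + (i + 1) + "\">");
            for (Iterator it = neighbours.iterator(); it.hasNext();)
                System.out.println("    <edge>" + it.next() + "</edge>");
            System.out.println("  </vertex>");
        }
        System.out.println("</graph>");
    }
}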
We present the results for three of them (degrees 3, 6 and 10).
The X axis is the value of k; the Y axis is the number of clusters in the
result T.
[Chart: the number of clusters in T (Y axis) as a function of k (X axis), for
the degree-3, degree-6 and degree-10 graphs]

The measured values:

k        Degree 10   Degree 6   Degree 3
1            20          20         20
2             1           2          4
3             1           2          4
4             1           2          4
5             1           2          3
6             1           2          3
7..58         1           2          2
59+           1           1          1
6. Conclusions
First of all, all our conclusions depend on the way we chose to build the
collection of clusters R from a given graph G as the algorithm's input.
We expected to find a correlation between the degree of the input graph and
the value of k at which the number of output clusters decreases. Fewer output
clusters mean a more coarsened result: the result contains larger clusters,
which means that every vertex is "aware" of the locations of a larger number
of other vertices (in the case of network routing, a larger routing table).
As we expected, our experiments show a direct connection between the
degree of the graph and the value of k at which the number of clusters in T
decreases.
Every collection of clusters can be coarsened into one cluster given a large
enough value of k.
The higher the degree of the graph, the smaller the value of k needed to
merge the clusters.
We can see that the clusters of a graph of degree 10+ are merged into one
cluster already at k=2.
For smaller degrees we see a more moderate descent in the number of
clusters as the value of k rises.
For technical reasons, we found it hard to generate graphs of more than 20
nodes. Had we been able to do so, the tendencies would have shown more
clearly.
The higher the degree of the graph, the cheaper (the lower the value of k) it
is to obtain a shorter and more efficient routing path.
References
[ABPL90] B. Awerbuch and D. Peleg, Sparse partitions, Proceedings of the
31st Annual IEEE Symposium on Foundations of Computer Science, 1990,
pp. 503-513.