csse2011_submission_16.pdf

An instrumental network routing algorithm for wireless networks based on
reinforcement-learning algorithms and network traffic
Robabeh Chanpa
Young researcher club of Azad Islamic university branch
Salmas
Salmas, Iran.
Email: Chanpa.robab @yahoo.com
Abstract- Reinforcement learning methods are widely used in routing problems. Because these methods adapt to changes in the network, they are called adaptive routing methods. Q-learning algorithms maintain quantities called Q-values. In routing methods that apply this algorithm, these values are added to the headers of packets. As a result, combining forward exploration with backward exploration increases the size of the headers. In this paper we reduce the packet headers by modifying the above-mentioned algorithms, which yields a new algorithm with increased throughput. The throughput is evaluated under different network criteria and compared with that of current methods.
Keywords- reinforcement learning; Q-learning; throughput
I. INTRODUCTION
Routing strategy is the most important of the problems in ad hoc networks. Routing is the process of transmitting a packet from its source node "s" to its destination node "d" in a network. A packet usually traverses many hops on its route from source to destination. At each node a received packet is stored and then forwarded to the next hop until it reaches its destination [2].
Routing protocols reside in the network layer. A router's goal is to route a message (packet) through the subnetworks attached to it [3].
In recent years, agent-based systems and reinforcement learning have been widely applied to routing problems. Q-Routing is one form of reinforcement learning and is also an adaptive routing algorithm [5] that sends packets on the basis of route information learned from neighboring nodes.
In routing algorithms based on reinforcement learning, the routing protocol relies on tables: each router has a table with an entry for each subnetwork. An entry stores the name of the destination subnetwork and the neighboring subnetwork to which the router forwards the message, and it may also include additional information such as the cost of reaching the destination. The distance to a destination is measured by the number of subnetworks on the route, their speed and the time spent along the route [6]. In Q-Routing algorithms, quantities called Q-values are carried in the packet headers and are used to find the best route. Whenever more than one Q-value has to be carried, the headers grow, which decreases the throughput of the algorithm.

Jamshid Bagherzadeh
Assistant Professor, Computer Science and Engineering
Department, Urmia University.
Urmia, Iran.
Email: j.bagherzadeh@urmia.ac.ir

In this paper we increase the transmission throughput of the network by controlling the size of these headers. The paper is organized as follows. Section 2 reviews related work in this area: we first introduce the Bellman-Ford method and then consider other algorithms such as Q-Routing, DRQ-Routing, PQ-Routing, CQ-Routing and CDRQ-Routing. Section 3 presents the proposed method. Sections 4 and 5 describe the experiments and discuss the results and the comparison among methods. Section 6 presents the conclusion and suggestions for future work.
II. RELATED WORK
Traditional routing algorithms are usually based on the Bellman-Ford and Dijkstra algorithms. The Bellman-Ford algorithm solves the single-source shortest-path problem for weighted graphs in which edge weights may be negative. Dijkstra's algorithm solves the same problem in less time, but it requires all weights to be non-negative. The main structure of the Bellman-Ford algorithm is the same as that of Dijkstra's algorithm. An estimate d is maintained such that, for each vertex v, d_v is the weight of the shortest path to v found so far. At the end of step |V|-1, d_v is equal to the weight of the shortest route from the source to v. (Since we assume that there is no cycle with negative weight, the shortest path with at most |V|-1 edges from the source to v is the shortest route from the source to v in G.) Routing algorithms try to adapt to dynamic traffic conditions, so they may choose routes that are longer but carry less traffic. One of the most important algorithms, used in most of these methods, is Q-learning. Some of the methods are discussed in the following sections.
A. Bellman-Ford algorithm
The Bellman-Ford algorithm solves the single-source shortest-path problem in the general case, in which the edges of a given digraph G may have negative weights as long as G contains no negative cycles. Like Dijkstra's algorithm, it uses the notion of edge relaxation, but it does not use a greedy approach. It maintains d[v] as an upper bound on the shortest-path distance from the source node s to the node v.
The algorithm progressively decreases the estimate d[v] of the weight of the shortest path from the source vertex s to each vertex v in V until it reaches the actual shortest-path weight. The algorithm returns Boolean TRUE if the given digraph contains no negative cycles that are reachable from the source vertex s; otherwise it returns Boolean FALSE.
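The relaxation procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the paper; the function name `bellman_ford`, the edge-list representation and the small example graph are assumptions made for the example.

```python
import math


def bellman_ford(num_vertices, edges, source):
    """Return (ok, d): ok is False if a negative cycle is reachable from source.

    edges is a list of (u, v, weight) tuples; vertices are 0..num_vertices-1.
    """
    d = [math.inf] * num_vertices  # d[v] is an upper bound on the distance s -> v
    d[source] = 0
    # |V|-1 rounds of relaxing every edge are enough when no negative cycle exists.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                changed = True
        if not changed:  # estimates are stable, stop early
            break
    # If any edge can still be relaxed, a negative cycle is reachable.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return False, d
    return True, d


# Hypothetical example graph with one negative (but non-cyclic) edge.
example_edges = [(0, 1, 4), (0, 2, 5), (2, 1, -2), (1, 3, 3)]
ok, dist = bellman_ford(4, example_edges, source=0)
print(ok, dist)  # True [0, 3, 5, 6]
```

The early exit is an optimization and does not change the result; the final pass over the edges is what produces the TRUE/FALSE answer described in the text.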
B. Q – Routing algorithm
This algorithm is a method for network routing. It was proposed by Littman and Boyan in 1993 [6, 9] and adapts its routing policy automatically. Each node runs a reinforcement learning module and keeps a lookup table of Q-values that holds its current estimates. The overall routing policy is the set of all local decisions made by the individual nodes from their Q-tables.
In this algorithm, a node x that receives a packet P sends it to the neighbor y for which x estimates the shortest time to deliver the packet to the destination d:
$$y = \arg\min_{y} Q_x(d, y)$$
In addition, by sending P to y, node x obtains y's estimate of the shortest time to d. Because y is nearer to d, its estimate is more accurate than that of x and can be used to correct x's estimate for d. Node y returns the value $t_y$ given by the following formula [2, 4 and 5]:
$$t_y = \min_{z \text{ neighbor of } y} Q_y(d, z)$$
Knowing that P spent $q_x$ waiting in the queue of x and $\mathrm{trans}_{xy}$ in transmission between x and y, node x revises its estimate as follows [13, 16]:
$$\Delta Q_x(d, y) = \eta \left( q_x + \mathrm{trans}_{xy} + t_y - Q_x(d, y) \right)$$
$$Q'_x(d, y) = Q_x(d, y) + \Delta Q_x(d, y)$$
In these formulas, $\eta$ is the learning rate parameter, $\Delta Q_x(d, y)$ is the change in Q relative to the previous Q, and $Q'_x(d, y)$ is the new estimate of the delivery time from x to d.
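A single Q-Routing hop, following the three formulas above, might look like the sketch below. The dictionary layout `Q[(node, destination, neighbor)]`, the function name `q_routing_hop` and the three-node example network are assumptions for illustration; treating the remaining time as 0 when y is the destination itself is also an assumption of this sketch.

```python
def q_routing_hop(Q, neighbors, x, d, q_x, trans_xy, eta=0.5):
    """Forward one packet bound for d from node x and update Q in place.

    Q maps (node, destination, neighbor) to an estimated delivery time.
    Returns the chosen next hop and the applied change delta.
    """
    # Greedy choice: the neighbor with the smallest estimated delivery time.
    y = min(neighbors[x], key=lambda n: Q[(x, d, n)])
    # y reports its own best estimate of the remaining time to d
    # (assumed to be 0 when y already is the destination).
    t_y = 0.0 if y == d else min(Q[(y, d, z)] for z in neighbors[y])
    # delta = eta * (q_x + trans_xy + t_y - Q_x(d, y))
    delta = eta * (q_x + trans_xy + t_y - Q[(x, d, y)])
    Q[(x, d, y)] += delta
    return y, delta


# Hypothetical line network a - b - c with a packet at "a" bound for "c".
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
Q = {("a", "c", "b"): 10.0, ("b", "c", "a"): 8.0, ("b", "c", "c"): 1.0}
next_hop, delta = q_routing_hop(Q, neighbors, "a", "c", q_x=2.0, trans_xy=1.0)
print(next_hop, delta, Q[("a", "c", "b")])  # b -3.0 7.0
```

Here node a overestimated the delivery time (10.0), so the feedback from b pulls the estimate down halfway toward the observed value 2 + 1 + 1 = 4.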
C. DRQ – Routing algorithm
In DRQ-Routing, Q-Routing is improved by updating two Q-values per packet hop. The additional update is known as backward exploration (Fig. 1) [6, 8].
When a node x sends a packet to node y, it can attach its own Q-values for y to use (backward exploration), while node y responds by sending back its own estimate (forward exploration). This technique improves performance [2, 9 and 12]. The backward update is:
$$Q_x(s, z') = \min_{z \text{ neighbor of } x} Q_x(s, z)$$
$$\Delta Q_y(s, x) = \eta_b \left( q_y + Q_x(s, z') - Q_y(s, x) \right)$$
$$Q'_y(s, x) = Q_y(s, x) + \Delta Q_y(s, x)$$
In Figure 1, a packet that originated at source node s is sent from node x to node y. It also carries the estimated time it takes to get from node x back to s, $Q_x(s, z')$. With this information node y updates its own estimate $Q_y(s, x)$ for reaching s through its neighbor x. Therefore, in DRQ both backward and forward exploration are used to update the Q entries. However, this adds overhead to the packet and to the algorithm [11]. The forward update is:
$$Q_y(d, z') = \min_{z \text{ neighbor of } y} Q_y(d, z)$$
$$\Delta Q_x(d, y) = \eta_f \left( q_x + Q_y(d, z') - Q_x(d, y) \right)$$
$$Q'_x(d, y) = Q_x(d, y) + \Delta Q_x(d, y)$$
Figure 1. Forward exploration.
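The two updates performed on one hop in DRQ-Routing can be sketched as below. The table layout `Q[(node, destination, neighbor)]`, the function name `drq_hop`, the learning rates of 0.5 and the four-node chain are assumptions chosen for the example, as is the convention that a node's remaining time to itself is 0.

```python
def drq_hop(Q, neighbors, x, y, s, d, q_x, q_y, eta_f=0.5, eta_b=0.5):
    """Apply the forward and backward DRQ updates for one hop x -> y.

    The packet travels from source s to destination d. q_x and q_y are the
    waiting times used in the forward and backward formulas respectively.
    """
    # Forward exploration: y's best estimate of the remaining time to d.
    fwd = 0.0 if y == d else min(Q[(y, d, z)] for z in neighbors[y])
    # Backward exploration: the packet carries x's best estimate back to s.
    bwd = 0.0 if x == s else min(Q[(x, s, z)] for z in neighbors[x])
    # Both estimates are read before either table entry is modified.
    Q[(x, d, y)] += eta_f * (q_x + fwd - Q[(x, d, y)])
    Q[(y, s, x)] += eta_b * (q_y + bwd - Q[(y, s, x)])


# Hypothetical chain s - x - y - d; the packet is on the hop x -> y.
neighbors = {"s": ["x"], "x": ["s", "y"], "y": ["x", "d"], "d": ["y"]}
Q = {
    ("x", "d", "y"): 10.0,                      # updated by forward exploration
    ("y", "d", "x"): 9.0, ("y", "d", "d"): 2.0,
    ("y", "s", "x"): 8.0,                       # updated by backward exploration
    ("x", "s", "s"): 1.0, ("x", "s", "y"): 7.0,
}
drq_hop(Q, neighbors, "x", "y", "s", "d", q_x=1.0, q_y=3.0)
print(Q[("x", "d", "y")], Q[("y", "s", "x")])  # 6.5 6.0
```

One hop therefore improves two table entries, at the cost of carrying one extra value in the packet.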
D. PQ – routing algorithm
The predictive Q-routing (PQ-routing) algorithm keeps the best experiences (best Q-values) learned and reuses them according to a probabilistic prediction. Under low network load, the optimal policy is simply the shortest path. However, if congested paths are not used for a period of time, they recover and become "good candidates" again. Hence the algorithm should "probe" those paths at a rate that matches the recovery speed of the path [8, 14 and 16].
E. CQ –Routing algorithm
CQ-Routing uses confidence values to improve Q-Routing. Q-values become stale when they are not updated for a long time and then no longer represent the true state of the network [1]. Therefore a confidence value C is associated with every Q-value. A C-value close to 1 indicates that the associated Q-value is reliable and accurately represents the state of the network, while a C-value of 0 indicates that the Q-value is almost random [7, 9 and 16].
In Q-Routing there is no way of quantifying the reliability of a Q-value. Moreover, the learning rate is constant for all updates, although it should depend on how reliable the updated Q-value $Q_x(d,y)$ and the estimated Q-value $Q_y(d,z)$ are. These issues are addressed in CQ-Routing. In CQ-Routing, the accuracy or reliability of each Q-value $Q_x(d,y)$ is quantified by an associated confidence value (C-value), $C_x(d,y) \in [0,1]$. $C_x(d,y)$ close to 1 indicates that $Q_x(d,y)$ represents the network state accurately, while $C_x(d,y)$ close to 0 indicates that $Q_x(d,y)$ is almost random. The base-case Q-values and the corresponding C-values for any node y in the network are Q(y, y)=0 and C(y, y)=1. The C-values of all other Q-values, which are initialized randomly, are initially set to 0.
In CQ-Routing, the learning rate depends on the C-values of the Q-values involved in the update. More specifically, when node x sends a packet P(s, d) to its neighbor y, it gets back not only y's best estimate for the remaining part of P(s, d)'s journey, but also the confidence value associated with this Q-value, $C_y(d,z)$. When node x updates its $Q_x(d,y)$ value, it first computes the learning rate $\eta_f = \eta(C_{old}, C_{est})$, which depends on both $C_x(d,y)$ ($= C_{old}$) and $C_y(d,z)$ ($= C_{est}$). The learning rate function $\eta(C_{old}, C_{est})$ is chosen such that it is high if $C_{old}$ is low or $C_{est}$ is high (or both) [9, 10, 15 and 16].
The C-values are updated automatically. If a Q-value was not updated in the last time step, its C-value decays by a constant $\lambda \in (0, 1)$. If a Q-value was updated in the last time step, its C-value is updated based on $C_{old}$, $C_{est}$ and $\eta_f$:
$$C_{upd} = \begin{cases} \lambda \, C_{old} & \text{(Q-value not updated)} \\ C_{old} + \eta_f \, (C_{est} - C_{old}) & \text{(Q-value updated)} \end{cases}$$
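The confidence-driven update can be illustrated with the short sketch below. The text only requires the learning rate to be high when C_old is low or C_est is high; the particular choice max(C_est, 1 - C_old) used here is one function with that property and is an assumption of this example, as are the function names and the numeric values.

```python
def learning_rate(c_old, c_est):
    """Assumed rate function: high if c_old is low or c_est is high."""
    return max(c_est, 1.0 - c_old)


def cq_update(q_old, c_old, q_est, c_est, q_x, trans_xy):
    """Return the updated (Q, C) pair for an entry that received feedback."""
    eta_f = learning_rate(c_old, c_est)
    q_new = q_old + eta_f * (q_x + trans_xy + q_est - q_old)
    c_new = c_old + eta_f * (c_est - c_old)  # C_upd, "Q-value updated" case
    return q_new, c_new


def cq_decay(c_old, lam=0.9):
    """C_upd for an entry that was not updated in the last time step."""
    return lam * c_old


# A fairly trusted entry (C = 0.75) receives a poorly trusted estimate (C = 0.25),
# so the learning rate is low and the entry moves only a little.
q_new, c_new = cq_update(q_old=10.0, c_old=0.75, q_est=4.0, c_est=0.25,
                         q_x=1.0, trans_xy=1.0)
print(q_new, c_new)  # 9.0 0.625
```

With a fully reliable estimate (C_est = 1) the same function returns a rate of 1, so the old Q-value would be replaced outright.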
F. CDRQ – Routing algorithm
CDRQ-Routing combines the concepts of CQ-Routing with those of DRQ-Routing (Fig. 2). Experiments showed that this algorithm gives the fastest performance. On each packet hop, not only the two Q-values but also their confidence values are exchanged [7, 8].
In this algorithm, on each hop of a packet P(s,d) from node x to node y, the relevant Q-values $Q_x(d,y)$ and $Q_y(s,x)$ are updated. The corresponding C-values $C_x(d,y)$ and $C_y(s,x)$ are updated too. The learning rates for these updates, $\eta_f$ and $\eta_b$, are computed from the C-values as in CQ-Routing.
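The combined hop can be sketched as follows: each table entry holds a (Q, C) pair, and both the forward and the backward entry are refreshed with a confidence-dependent learning rate. The table layout, the helper names `best` and `confident_update`, the rate function max(C_est, 1 - C_old) and the example values are assumptions for illustration only; transmission delay is left out, as in the DRQ formulas.

```python
def eta(c_old, c_est):
    """Assumed learning-rate function: high if c_old is low or c_est is high."""
    return max(c_est, 1.0 - c_old)


def best(table, neighbors, node, dest):
    """Best (Q, C) estimate of node for dest; base case Q = 0, C = 1 at dest."""
    if node == dest:
        return 0.0, 1.0
    return min((table[(node, dest, z)] for z in neighbors[node]),
               key=lambda entry: entry[0])


def confident_update(entry, estimate, wait):
    """Update one (Q, C) entry from a received (Q, C) estimate."""
    (q_old, c_old), (q_est, c_est) = entry, estimate
    rate = eta(c_old, c_est)
    return (q_old + rate * (wait + q_est - q_old),
            c_old + rate * (c_est - c_old))


def cdrq_hop(table, neighbors, x, y, s, d, q_x, q_y):
    """Forward and backward update of Q- and C-values for one hop x -> y."""
    fwd = best(table, neighbors, y, d)  # sent back by y
    bwd = best(table, neighbors, x, s)  # carried by the packet
    table[(x, d, y)] = confident_update(table[(x, d, y)], fwd, q_x)
    table[(y, s, x)] = confident_update(table[(y, s, x)], bwd, q_y)


# Hypothetical chain s - x - y - d; the packet is on the hop x -> y.
neighbors = {"s": ["x"], "x": ["s", "y"], "y": ["x", "d"], "d": ["y"]}
table = {
    ("x", "d", "y"): (10.0, 0.75),
    ("y", "d", "x"): (9.0, 0.5), ("y", "d", "d"): (2.0, 0.5),
    ("y", "s", "x"): (8.0, 0.25),
    ("x", "s", "s"): (1.0, 1.0), ("x", "s", "y"): (7.0, 0.5),
}
cdrq_hop(table, neighbors, "x", "y", "s", "d", q_x=1.0, q_y=3.0)
print(table[("x", "d", "y")], table[("y", "s", "x")])  # (6.5, 0.625) (4.0, 1.0)
```

The backward entry is replaced outright because the estimate it received is fully trusted (C = 1), while the forward entry moves only halfway toward a half-trusted estimate.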
Figure 2. CDRQ algorithm.
III. PROPOSED METHOD
The DRQ and CDRQ algorithms use two-sided exploration: two Q-values are updated per hop, and in CDRQ two C-values are updated as well. These values are carried in the packet header. In addition to forward exploration, backward exploration therefore increases the header size in these algorithms. A larger header increases the delivery time to the destination and decreases throughput. In most routing methods, the extra information required by the routing algorithm lengthens the delivery time, and an increase in delivery time decreases the efficiency of the algorithm. Consequently, reducing the header, and with it the header's influence on throughput, can be an effective step towards improving the algorithm's efficiency. In the proposed method we try to reduce the header overhead of the DRQ and CDRQ algorithms. To this end, instead of performing exploration for every packet, we perform it once for every K packets, where the K value changes dynamically according to the network traffic condition. At low loads the proposed algorithm operates like CDRQ, and the K value is increased as the queues fill up and the network load increases. We tied the K value to the packet drop rate. Whenever this parameter increases, the K value increases too, which reduces exploration and therefore reduces the header overhead and the network traffic. The higher the network traffic, the more packets are dropped and the fuller the node queues become, which increases traffic further. The K value is thus a function of the packet drop rate.
The algorithm operates as follows. Initially, while the Q-values in each node are still random and unstable, k is set to one, so Q-values are sent and received with every packet transferred from node x to node y. Once the Q-values have become more stable, the K value is set by a function chosen according to the network's traffic condition, which ties K to the traffic rate. If the traffic in the queue from x to y (or its counter) is high, k is enlarged by this function, and if the traffic is lower, k is reduced. Under high traffic this lowers the routing header overhead, and under low traffic changes in the network are still detected quickly.
Suppose that when a packet is sent from the current node x to its neighbor y, the values $Q_y(s,x)$, $Q_x(d,y)$, $C_y(s,x)$ and $C_x(d,y)$ are exchanged between x and y as well. For the next K-1 packets the C- and Q-values are not sent and no exploration is performed, but with the K-th packet the C- and Q-values are transferred again. We call this algorithm K-CDRQ. The proposed algorithm is given as pseudocode below:
ALGORITHM K-CDRQ Routing (p,s,d).
p ← the packet to be delivered;
s ← source of p;
d ← destination of p;
k ← 1;
step_y ← 0 for every node y;
While (not terminated) do
    If p includes the Q values then update Q and C:
        Backward exploration:
            ∆Q_y(s,x) ← η_b (q_y + trans_yx + t_b − Q_y(s,x));
            Q'_y(s,x) ← Q_y(s,x) + ∆Q_y(s,x);
        Forward exploration:
            ∆Q_x(d,y) ← η (q_x + trans_xy + t_y − Q_x(d,y));
            Q'_x(d,y) ← Q_x(d,y) + ∆Q_x(d,y);
        Do for forward and backward exploration:
            C_old ← C_x(d,y), C_est ← C_y(d,z);
            η_f ← η(C_old, C_est);
            C_upd ← λ · C_old                       (Q value not updated)
            C_upd ← C_old + η_f (C_est − C_old)     (Q value updated)
    End if;
    y ← the next hop found from the Q values;
    step_y ← the number of packets sent to y without Q values;
    If step_y < k − 1 then
        step_y++;
        Send p to y without Q values;
    Else
        Send p to y with Q values;
        step_y ← 0;
    End if;
    k ← Update(k, D);
End of while;

Update(k, D):
    D is the packet drop rate (traffic parameter);
    If (D > γ)
        k ← µD + λ;    // k is a function of D.
    Return k;
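The two pieces that distinguish K-CDRQ from CDRQ, the per-neighbor counter that decides whether a packet carries Q- and C-values and the traffic-dependent update of K, can be sketched in Python as below. The sketch follows the textual description that the values travel with every K-th packet. The class name `ExplorationGate`, the rounding of K to an integer, and the values of gamma, mu and lam are assumptions; the paper does not give concrete constants.

```python
def update_k(k, drop_rate, gamma=0.1, mu=20.0, lam=1.0):
    """Return the new K: a linear function of the drop rate D once D > gamma.

    Below the threshold K is left unchanged, as in the pseudocode.
    Rounding to an integer of at least 1 is an assumption of this sketch.
    """
    if drop_rate > gamma:
        return max(1, round(mu * drop_rate + lam))
    return k


class ExplorationGate:
    """Counts, per neighbor, the packets sent without Q- and C-values."""

    def __init__(self):
        self.step = {}

    def should_attach(self, y, k):
        """True if the packet to neighbor y should carry Q- and C-values."""
        sent_without = self.step.get(y, 0)
        if sent_without < k - 1:
            self.step[y] = sent_without + 1
            return False
        self.step[y] = 0  # the K-th packet carries the values again
        return True


gate = ExplorationGate()
pattern = [gate.should_attach("y", 3) for _ in range(6)]
print(pattern)                      # [False, False, True, False, False, True]
print(update_k(1, drop_rate=0.2))   # 5
print(update_k(4, drop_rate=0.05))  # 4
```

With K = 1 the gate returns True for every packet, which matches the initial phase in which the algorithm behaves like CDRQ.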
IV. EXPERIMENTS
The experiments described in this paper are based on a simulated communication network arranged as a 6×6 grid (Fig. 3). Packets destined for random nodes are introduced into this network at random nodes. The network load is the number of packets in all transmission queues divided by the sum of the maximum sizes of all queues [9].
Multiple packets at a node are stored in its FIFO queue. In each time step, each node removes the packet at the head of its queue, examines the destination of the packet and uses its routing decision maker to send the packet to one of its neighboring nodes. The delivery time of a packet is defined as the average time packets take to reach their destination, computed over the last several packets (e.g. the last 100 packets). Delivery time is measured in simulation time steps. Another parameter is throughput, which is obtained by dividing the delivery time by the network load [9].
In this paper we compare the algorithms by their throughput.
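The two measured quantities defined above, network load and the windowed average delivery time, can be computed as in the following sketch. The function and class names and the sample numbers are assumptions for illustration and do not come from the simulator used in the paper.

```python
from collections import deque


def network_load(queue_lengths, max_queue_sizes):
    """Packets in all queues divided by the sum of all maximum queue sizes."""
    return sum(queue_lengths) / sum(max_queue_sizes)


class DeliveryTimeMeter:
    """Average delivery time over the most recently delivered packets."""

    def __init__(self, window=100):
        # A bounded deque discards the oldest sample automatically.
        self.times = deque(maxlen=window)

    def record(self, time_steps):
        self.times.append(time_steps)

    def average(self):
        return sum(self.times) / len(self.times)


load = network_load([2, 0, 3], [10, 10, 5])
meter = DeliveryTimeMeter(window=3)
for t in (4, 6, 8, 10):
    meter.record(t)
print(load, meter.average())  # 0.2 8.0
```

In the example the window holds only the last three samples (6, 8, 10), so the first delivery time no longer influences the average.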
Figure 3. The network: a 6×6 grid.
V. RESULTS
Figure 4 shows that DRQ performs better than Q-Routing. This can be attributed to DRQ's exploration in both directions (backward and forward exploration).
At medium loads the Bellman-Ford algorithm has better throughput than Q and DRQ, but as the load increases further its throughput decreases, because the learning algorithms process information about the state of the network, whereas Bellman-Ford only forwards packets without doing so. The PQ algorithm uses the recovery rate of a path to estimate its Q-value, so the traffic on that path can be predicted. Moreover, if the algorithm predicts that another path has recovered, it probes that path. Running the exploration phase only once every K packets leads to an increase in throughput. Because the CDRQ algorithm also updates C-values, these values are added to the Q-values in the header; nevertheless, its characteristics should make it more efficient than the other algorithms.
The K-CDRQ algorithm removes this overhead problem while still increasing confidence through C-values and using both backward and forward exploration. In this paper, reducing the exploration phase does not reduce the algorithm's intelligence, because the K value dynamically determines whether exploration is carried out in full. At low loads the algorithm therefore updates Q-values just as DRQ and CDRQ do (Figures 6, 7 and 9).
Figure 4. Simulation of all introduced algorithms.
Figure 5. Comparison of K_DRQ and DRQ.
Figure 6. Comparison of CDRQ and K_CDRQ.
Figure 7. Comparison of Bellman-Ford, Q, K_DRQ, CQ, PQ and K_CDRQ.
VI. CONCLUSION AND SUGGESTIONS
Although combining reinforcement learning algorithms improves routing, it also creates large headers, which reduce performance. This overhead can be removed with effective methods such as the one presented in this article. Finally, we suggest that further combinations of reinforcement learning methods be used to create other new methods. An example is shown below:
Figure 9. New suggested methods.
REFERENCES
[1] Analoui, M., and A. Esfahani. 2009. Decrement delay with widest K-shortest paths Q-routing algorithm. Journal of Communication and
Computer, ISSN 1548-7709: USA.
[2] Andrei, S. 2000. A Survey on Network Routing and Reinforcement
Learning. http://www.math.tau.ac.il/~mansour.
[3] Boyan, J. A., and M. L. Littman. 1994. Packet Routing in Dynamically
Changing Network: A Reinforcement Learning Approach, Advances in
Neural Information Processing Systems 6. MIT Press, Cambridge.
[4] Ducatelle, F. 2007. Adaptive Routing in Ad Hoc Wireless Multi-hop
Networks. Doctoral Dissertation Submitted to the Faculty of Informatics of
the University of Lugano in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy.218 Pages.
[5] Edwill, N., and C. W. Omlin. 2004. Machine Learning Algorithms for
Packet Routing in Telecommunication Networks. Department of Computer
Science, University of the Western Cape. 7535 Bellville, South Africa.
[6] Jin, Y., and M. Tsai. 2001. Temporal Network Analysis for Predictive
Routing Table Optimization {yujia,mtsai}@eecs.Berkeley.edu .CS268
Final Project.
[7] Kelly, D. 2005. Reinforcement Learning with Application to Adaptive
Network Routing. Journal of Theoretical and Applied Information
Technology.
[8] Kulkarni, S. A., and G. R. Rao. 2009. Formal Modeling of Reinforcement
Learning Algorithms Applied for Mobile Ad Hoc Network.
International Journal of Recent Trends in Engineering, Vol. 2, No. 3.
[9] Kumar, S., and R. Miikkulainen. 1998. Confidence Based
Reinforcement Q-Routing: An Adaptive Online Network Routing
Algorithm. The University of Texas at Austin. National Science
Foundation. Artificial Intelligence Laboratory.
[10] Kumar, S., and R. Miikkulainen. 1999. Confidence Based Dual
Reinforcement Q-Routing: An Adaptive Online Network Routing
Algorithm. Artificial Neural Networks in Engineering. The University of
Texas at Austin. Artificial Intelligence Laboratory.
[11] Mellouk, A., S. Larynouna and S. Hoceini. 2006. Adaptive
Probabilistic Routing Schemes for Real Time Traffic in High Speed
Dynamic Networks. IJCSNS International Journal of Computer Science
and Network Security, VOL.6 No.5B.
[12] Shanbhagh, S. R., Y. V. Potdar and M. S. Phatak. 2006.
Reinforcement Learning Algorithms in Routing: ”Q_Routing”
Implementation and Analysis in NS-2. Fr. Conceicao Rodrigues College of
Engineering. 70 pages.
[13] Tekiner, F., Z. Ghassemlooy and T. R. Srikanth. 2004. Comparison of
the Q-Routing and Shortest Path Routing Algorithms. School of
Engineering & Technology, Northumbria University, Newcastle upon
Tyne,UK. ftekiner@ieee.org.
[14] Valdivia, Y. T., M. M. Vellasco and M. A. Pacheco. 2001. An
Adaptive Network Routing Strategy With Temporal Differences.
Asociacion Espanola Para La Inteligencia Artificial Valencia, Espana. pp:
85-91.
[15] Yalamanchi, H. J. 2007. Reinforcement Learning for Network
Routing. Oregon State University.
[16] Yap, S. T., and M. Othman. 2004. An Adaptive Routing Algorithm:
Enhanced Confidence Based Q Routing Algorithm in Network Traffic.
Department of Communication Technology & Network University Putra
Malaysia. Malaysian Journal Of Computer, vol. 17No.2. Pp 21-29.