An instrumental network routing algorithm for wireless networks based on reinforcement-learning algorithms and network traffic

Robabeh Chanpa, Young Researchers Club, Islamic Azad University, Salmas Branch, Salmas, Iran. Email: Chanpa.robab@yahoo.com
Jamshid Bagherzadeh, Assistant Professor, Computer Science and Engineering Department, Urmia University, Urmia, Iran. Email: j.bagherzadeh@urmia.ac.ir

Abstract— Reinforcement-learning methods are widely used in routing problems. Because they adapt to network changes, they are known as adaptive routing methods. Q-learning algorithms maintain quantities labeled Q; in routing methods that apply this algorithm, these values are added to the headers of packets. As a result, adding forward exploration to backward exploration increases the header size. In this paper we try to reduce the packet headers by modifying the above algorithms, and we present a new algorithm that increases throughput. The throughput is evaluated under different network criteria and compared with existing methods.

Keywords— reinforcement learning; Q-learning; throughput

I. INTRODUCTION

Routing is the most important problem in ad hoc networks. The process of transmitting a packet from its source node s to its destination node d in a network is called routing. A packet usually traverses many hops on its route from source to destination; at each node the received packet is stored and forwarded to the next hop until it reaches the destination [2]. The routing protocol resides in the network layer; a router's goal is to route a message (packet) through the sub-networks attached to it [3].

In recent years, agent-based systems and reinforcement learning have been widely applied to routing problems. Q-Routing is a form of reinforcement learning and an adaptive routing algorithm [5] that forwards packets based on route information learned from neighbors. In routing algorithms based on reinforcement learning, the routing protocol maintains tables: each router has a table with an entry for each sub-network. The table stores the name of the destination sub-network and the neighboring sub-network to which the router forwards the message, and it may also include additional information such as the cost of reaching the destination. The distance to the destination is measured by the number of sub-networks on a route, their speed, and the time spent along the route [6]. In Q-Routing algorithms, quantities called Q-values are carried in packet headers and used to find the best route. When more than one Q-value is carried, the header grows, which decreases the algorithm's throughput. In this paper we increase network throughput by controlling these headers.

This paper is organized as follows. Section II reviews related work: first the Bellman-Ford method, then algorithms such as Q-Routing, DRQ-Routing, PQ-Routing, CQ-Routing and CDRQ-Routing. Section III states the proposed method. Section IV describes the experiments, Section V discusses the results and compares the methods, and Section VI presents conclusions and suggestions for future work.

II. RELATED WORK

Traditional routing algorithms are usually based on the Bellman-Ford algorithm and Dijkstra's method. The Bellman-Ford algorithm solves the single-source shortest-path problem for weighted graphs in which the weights may be negative.
Dijkstra's algorithm solves the same problem in less time, but it requires all edge weights to be non-negative. The main structure of the Bellman-Ford algorithm is the same as Dijkstra's. For each vertex v, d_v is defined as the weight of the shortest path found so far to v; at the end of the (|V|-1)-th step, d_v equals the weight of the shortest route from the source to v. (In fact, since we assume there is no negative-weight cycle, the shortest walk of at most |V|-1 edges from the source to v is the shortest route from the source to v in G.) Routing algorithms try to adapt to dynamic traffic conditions; they may use routes that are longer but less congested. One of the most important algorithms used in most of these methods is Q-learning. Some of these methods are discussed in the following sections.

A. Bellman-Ford algorithm

The Bellman-Ford algorithm solves the single-source shortest-path problem in the general case, in which the edges of a given digraph can have negative weights, as long as G contains no negative cycles. Like Dijkstra's algorithm, it uses the notion of edge relaxation, but it does not use a greedy approach. It uses d[v] as an upper bound on the distance d(s, v) from the source node s to the node v. The algorithm progressively decreases the estimate d[v] of the weight of the shortest path from the source vertex s to each vertex v in V until it reaches the actual shortest-path weight. The algorithm returns TRUE if the given digraph contains no negative cycles reachable from the source vertex s; otherwise it returns FALSE.

B. Q-Routing algorithm

This algorithm is a method for network routing suggested by Littman and Boyan in 1993 [6, 9]; it adjusts automatically for routing. In each node there is a reinforcement-learning scheme, and each node maintains a Q lookup table for keeping its current estimates. The overall routing policy is the set of all local decisions made by the individual nodes and their Q-tables. In this algorithm, when node x receives a packet P, it sends it to the neighbor node y for which x estimates the shortest delivery time to the destination:

y = argmin_{y ∈ neighbors(x)} Q_x(d, y)

In addition, by sending P to y, node x obtains y's estimate of the shortest time to d. Since y is closer to d, its estimate is more accurate than that of x and can be used to update x's estimate for d. Node y sends back the value t_y obtained from the following formula [2, 4, 5]:

t_y = min_{z ∈ neighbors(y)} Q_y(d, z)

Knowing that P spent the time q_x in node x's queue and trans_xy in transmission from x to y, the estimate of x is updated as [13, 16]:

ΔQ_x(d, y) = η (q_x + trans_xy + t_y − Q_x(d, y))
Q'_x(d, y) = Q_x(d, y) + ΔQ_x(d, y)

In this formula, η is the learning-rate parameter, ΔQ_x(d, y) is the change relative to the previous Q-value, and Q'_x(d, y) is the new estimate of the delivery time from x to d.
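To make this update concrete, the following is a minimal Python sketch of the forward Q-Routing update described above. It is an illustration only: the table layout (a dictionary keyed by (destination, neighbor) pairs), the function names and the value of η are assumptions, not part of the algorithm as published.

ETA = 0.5  # learning-rate parameter eta (assumed value)

def select_next_hop(Q_x, d, neighbors_of_x):
    # y = argmin over neighbors y of Q_x(d, y): pick the neighbor with the
    # smallest estimated delivery time to destination d.
    return min(neighbors_of_x, key=lambda y: Q_x[(d, y)])

def forward_update(Q_x, d, y, t_y, q_x, trans_xy):
    # t_y = min over z in neighbors(y) of Q_y(d, z), reported back by y;
    # q_x is the time the packet waited at x, trans_xy the x -> y transmission time.
    delta = ETA * (q_x + trans_xy + t_y - Q_x[(d, y)])  # ΔQ_x(d, y)
    Q_x[(d, y)] += delta                                # Q'_x(d, y)
    return Q_x[(d, y)]

For example, with η = 0.5, Q_x(d, y) = 10, q_x = 2, trans_xy = 1 and t_y = 6, the estimate moves halfway toward 9, giving a new value of 9.5.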
C. DRQ-Routing algorithm

In DRQ-Routing, Q-Routing is improved by updating two Q-values per packet hop. This idea is also known as backward exploration (fig. 1) [6, 8]. When a node x sends a packet to node y, it may send its own Q-values to y (backward exploration), while node y responds by sending its own information (forward exploration). This technique results in improved performance [2, 9, 12]. The backward update is:

Q_x(s, z') = min_{z ∈ neighbors(x)} Q_x(s, z)
ΔQ_y(s, x) = η_b (q_y + Q_x(s, z') − Q_y(s, x))
Q'_y(s, x) = Q_y(s, x) + ΔQ_y(s, x)

In figure 1, a packet of node x that arrived from source node s is sent to node y. It also carries the estimated time from node x back to s, Q_x(s, z'). With this information node y updates its own estimate Q_y(s, x) for the entry node x associated with source s. Therefore, in DRQ both backward and forward exploration can be used to update the Q entries. However, this adds an overhead to the packet and to the algorithm [11]. The forward update is:

Q_y(d, z') = min_{z ∈ neighbors(y)} Q_y(d, z)
ΔQ_x(d, y) = η_f (q_x + Q_y(d, z') − Q_x(d, y))
Q'_x(d, y) = Q_x(d, y) + ΔQ_x(d, y)

Figure 1. Forward exploration.

D. PQ-Routing algorithm

The predictive Q-Routing algorithm keeps the best experiences (best Q-values) learned and reuses them according to a probabilistic prediction. Under low network load, the optimal policy is the shortest path. However, if congested paths are not used for a period of time, they recover and become "good candidates" again. Hence the algorithm should "probe" those paths at the recovery speed of the path [8, 14, 16].

E. CQ-Routing algorithm

CQ-Routing uses confidence values to improve Q-Routing. Q-values become stale when they are not updated for a long time and thus no longer represent the true state of the network [1]. Therefore a confidence value (C-value) is associated with every Q-value: a C-value close to 1 indicates that the associated Q-value is reliable and accurately represents the state of the network, while a C-value of 0 indicates that the Q-value is almost random [7, 9, 16]. In Q-Routing there is no way of quantifying and measuring the reliability of a Q-value. Moreover, the learning rate is constant for all updates, although it should depend on how reliable the updated Q_x(d, y) and the estimate Q_y(d, z) are. These issues are addressed in CQ-Routing.

In CQ-Routing, the accuracy or reliability of each Q-value Q_x(d, y) is quantified by an associated confidence value (C-value), C_x(d, y) ∈ [0, 1]. C_x(d, y) close to 1 indicates that Q_x(d, y) represents the network state accurately, while C_x(d, y) close to 0 indicates that Q_x(d, y) is almost random. The base-case C-values, corresponding to the base-case Q-values for any node y in the network, are Q(y, y) = 0 and C(y, y) = 1. The C-values corresponding to all other Q-values, which are initialized randomly, are initially set to 0.

In CQ-Routing, the learning rate depends on the C-values of the Q-values that take part in the update. More specifically, when node x sends a packet P(s, d) to its neighbor y, it gets back not only node y's best estimate for the remaining part of P(s, d)'s journey, but also the confidence value associated with this Q-value, C_y(d, z). When node x updates its Q_x(d, y) value, it first computes the learning rate η_f = η(C_old, C_est), which depends on both C_x(d, y) (= C_old) and C_y(d, z) (= C_est). The learning-rate function η(C_old, C_est) is chosen such that it is high if either (or both) C_old is low or C_est is high [9, 10, 15, 16]. The C-value update is:

C_upd = λ · C_old                      (Q-value not updated)
C_upd = C_old + η_f (C_est − C_old)    (Q-value updated)

The C-values are updated automatically: if a Q-value was not updated in the last step, its C-value decays with a constant λ ∈ (0, 1); if a Q-value was updated in the last time step, its C-value is updated based on C_old, C_est and η_f.
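To illustrate the confidence mechanism, the following Python sketch follows the update rules above. The concrete shape of the learning-rate function η(C_old, C_est) is not fixed by the description, so the form used here (high when C_old is low or C_est is high) and the value of λ are assumptions.

LAMBDA_DECAY = 0.95  # decay constant lambda in (0, 1), assumed value

def learning_rate(c_old, c_est):
    # One possible eta(C_old, C_est): high when the local confidence is low
    # or the reported confidence is high, as the text requires.
    return max(c_est, 1.0 - c_old)

def update_confidence(c_old, c_est, q_value_updated):
    if not q_value_updated:
        # Q-value not updated in this step: its confidence simply decays.
        return LAMBDA_DECAY * c_old
    eta_f = learning_rate(c_old, c_est)
    # Q-value updated: move the confidence toward the reported confidence.
    return c_old + eta_f * (c_est - c_old)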
F. CDRQ-Routing algorithm

CDRQ-Routing combines the CQ-Routing concepts with the DRQ-Routing ones. Experiments have shown that this algorithm gives the best performance. Not only are the dual Q-values exchanged in one packet hop, but their confidence values as well [7, 8]. CDRQ-Routing combines both the CQ-Routing and DRQ-Routing components (figure 2). In this algorithm, at each hop of packet P(s, d) from node x to y, the relevant Q-values Q_x(d, y) and Q_y(s, x) are updated, and the corresponding C-values C_x(d, y) and C_y(s, x) are updated too. The learning rates for these updates, η_f and η_b, are computed from the C-values.

Figure 2. The CDRQ algorithm.

III. PROPOSED METHOD

The DRQ and CDRQ algorithms use two-sided exploration, that is, two Q-values are updated in them, and in CDRQ two C-values are updated as well. These values are carried in the packet headers. In addition to forward exploration, the presence of backward exploration increases the headers in these algorithms. Large headers increase the delivery time to the destination and decrease the throughput. In most routing methods, the extra information required by the routing algorithm makes the delivery time longer, and increasing this parameter decreases the efficiency and capability of the algorithm. Consequently, decreasing the headers, and thereby improving the throughput, can be an effective step towards improving the algorithm's efficiency.

In the proposed method, we try to decrease the header present in the DRQ and CDRQ algorithms. For this purpose, instead of performing exploration for every packet, we do it once every K packets, where the K-value changes dynamically according to the network traffic condition. Under low loads, the proposed algorithm operates like the CDRQ algorithm; as the queues fill and the network load increases, the K-values are increased. We tie the K-value to the packet drop rate: whenever this parameter increases, the K-value increases too, which reduces exploration and therefore the header overhead and network traffic. The higher the network traffic, the more packets are dropped and the more the node queues fill, so traffic increases further. In effect, the K-value is a function of the packet drop rate.

The algorithm operates as follows. Initially the Q-values at each node are random and unstable, so we set K equal to one, and for every packet transferred from node x to node y the Q-values are sent and received. Once the Q-values have reached a more stable condition, K is set by a function chosen according to the network traffic condition, which ties the K-value to the traffic rate. If the traffic from x to y (measured by the queue or a counter) is high, this function enlarges K; if the traffic is low, K is reduced. In high traffic this lowers the routing header rate, while in low traffic changes in the network are discovered quickly. When the current node x sends a packet to its neighbor y, the values Q_y(s, x), Q_x(d, y), C_y(s, x) and C_x(d, y) are transferred between x and y as well. For the next K−1 packets no C- or Q-values are sent and no exploration is performed for routing, but on the K-th packet the C- and Q-values are transferred again. We call this algorithm K-CDRQ.
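Before the full pseudocode below, the following Python sketch illustrates one reading of the header-reduction rule and of the drop-rate-driven K update: only every K-th packet sent to a neighbor carries Q- and C-values, and K follows K = µD + λ once the drop rate D exceeds the threshold γ. The parameter values, the per-neighbor counter layout and the rounding of K are assumptions.

GAMMA = 0.1   # drop-rate threshold (gamma), assumed value
MU = 50.0     # slope of the K function (mu), assumed value
LAM = 1.0     # offset of the K function (lambda), assumed value

step = {}     # step[y]: packets sent to neighbour y without Q/C values
k = 1         # exploration on every packet while Q-values are still unstable

def send_packet(p, y, with_values):
    # Placeholder for the actual transmission; with_values=True means the
    # header also carries Q_y(s, x), Q_x(d, y), C_y(s, x) and C_x(d, y).
    pass

def forward(p, y):
    # K-1 packets to y go out without Q/C values; the K-th carries them.
    if step.get(y, 0) < k - 1:
        step[y] = step.get(y, 0) + 1
        send_packet(p, y, with_values=False)   # no exploration, small header
    else:
        send_packet(p, y, with_values=True)    # exploration packet
        step[y] = 0

def update_k(drop_rate):
    # K grows with the packet drop rate D, so exploration (and header
    # overhead) shrinks as the network becomes congested.
    global k
    if drop_rate > GAMMA:
        k = max(1, round(MU * drop_rate + LAM))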
The proposed algorithm is presented in pseudocode below.

ALGORITHM K-CDRQ Routing(p, s, d)
  p ← the packet to be delivered; s ← source of p; d ← destination of p;
  k ← 1; step_y ← 0 for every node y;
  while (not terminated) do
    if p includes Q-values then update Q and C:
      Backward exploration:
        ΔQ_y(s, x) ← η_b (q_y + trans_yx + t_b − Q_y(s, x));
        Q'_y(s, x) ← Q_y(s, x) + ΔQ_y(s, x);
      Forward exploration:
        ΔQ_x(d, y) ← η_f (q_x + trans_xy + t_y − Q_x(d, y));
        Q'_x(d, y) ← Q_x(d, y) + ΔQ_x(d, y);
      For both forward and backward exploration:
        C_old ← C_x(d, y); C_est ← C_y(d, z);
        η_f ← η(C_old, C_est);
        C_upd ← λ · C_old                      (Q-value not updated)
        C_upd ← C_old + η_f (C_est − C_old)    (Q-value updated)
    end if;
    y ← the next hop found from the Q-values;
    step_y ← the number of packets sent to y without Q-values;
    if step_y < k then
      step_y++;
      send p to y without Q-values;
    else
      send p to y with Q-values;
      step_y ← 0;
    end if;
    k ← Update(k, D);
  end while;

  Update(k, D)    // D is the packet drop rate (traffic parameter)
    if (D > γ) then k ← µD + λ;    // k is a function of D

IV. EXPERIMENTS

The experiments described in this paper are based on a simulated communication network with a 6×6 grid topology (figure 3). Packets destined for random nodes are introduced into the network at random nodes. The number of packets in all transmission queues, divided by the sum of the maximum sizes of all the queues, is called the network load [9]. Multiple packets at a node are stored in its FIFO queue. At each step, a node removes the packet at the head of its queue, examines the destination of the packet and uses its routing decision maker to send the packet to one of its neighboring nodes. The average time a packet takes to reach its destination, computed over the last several packets (e.g. the last 100 packets), is defined as the delivery time of a packet; it is measured in simulation time steps. Another parameter is throughput: the delivery time divided by the network load gives the throughput [9]. In this paper we compare the algorithms by their throughputs.

Figure 3. The network: a 6×6 grid.

V. RESULTS

Figure 4 shows that DRQ is better than Q. This can be attributed to DRQ's exploration in both directions (backward and forward exploration). At medium loads the Bellman-Ford algorithm has better throughput than Q and DRQ, because while those algorithms spend effort processing network information, Bellman-Ford only forwards the packets without processing the network; at higher loads, however, its throughput decreases. The PQ algorithm uses the recovery rate of a path to estimate its Q-value, so the traffic on that path can be predicted; moreover, if the algorithm predicts that another path has recovered, it probes that path. Reducing the exploration phase to once every K packets leads to an increase in throughput. Since the CDRQ algorithm updates C-values, these values are added to the Q-values in the header, although the algorithm should be highly efficient compared with the other algorithms because of its characteristics. The K-CDRQ algorithm removes this problem while still increasing confidence through the C-values and using both backward and forward exploration. In this paper, reducing the exploration phase does not reduce the algorithm's intelligence, because the K-values determine dynamically whether exploration is performed or not; thus, under low loads it updates the Q-values like DRQ and CDRQ (figures 6, 7 and 9).

Figure 4. Simulation of all introduced algorithms.
Figure 5. Comparison of K-DRQ and DRQ.
Figure 6. Comparison of CDRQ and K-CDRQ.
Figure 7. Comparison of Bellman-Ford, Q, K-DRQ, CQ, PQ and K-CDRQ.
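To make the measurements from Section IV concrete, the following Python sketch shows how the network load and the delivery time described there could be computed in a simulator. The queue capacity, data structures and function names are assumptions; only the definitions of the two quantities come from the text.

from collections import deque

MAX_QUEUE = 50                      # assumed per-node queue capacity
queues = {}                         # node -> deque of packets waiting there
recent_delays = deque(maxlen=100)   # delivery times of the last 100 packets

def network_load():
    # Packets in all transmission queues divided by the sum of the
    # maximum sizes of all queues [9].
    total_packets = sum(len(q) for q in queues.values())
    total_capacity = MAX_QUEUE * len(queues)
    return total_packets / total_capacity if total_capacity else 0.0

def record_delivery(send_step, arrival_step):
    # Delivery time is measured in simulation time steps.
    recent_delays.append(arrival_step - send_step)

def delivery_time():
    # Average delivery time over the last 100 delivered packets.
    return sum(recent_delays) / len(recent_delays) if recent_delays else 0.0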
VI. CONCLUSION AND SUGGESTIONS

Although the combined use of reinforcement-learning algorithms improves routing, it also creates large headers that reduce the achievable optimum; effective methods such as the one presented in this article can reduce this overhead. Finally, we suggest that further combinations of reinforcement-learning methods be used to create other new methods. An example is shown below.

Figure 9. New suggested methods.

REFERENCES

[1] Analoui, M., and A. Esfahani. 2009. Decrement delay with widest K-shortest paths Q-routing algorithm. Journal of Communication and Computer, ISSN 1548-7709, USA.
[2] Andrei, S. 2000. A Survey on Network Routing and Reinforcement Learning. http://www.math.tau.ac.il/~mansour.
[3] Boyan, J. A., and M. L. Littman. 1994. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach. Advances in Neural Information Processing Systems 6. MIT Press, Cambridge.
[4] Ducatelle, F. 2007. Adaptive Routing in Ad Hoc Wireless Multi-hop Networks. Doctoral dissertation, Faculty of Informatics, University of Lugano. 218 pages.
[5] Edwill, N., and C. W. Omlin. 2004. Machine Learning Algorithms for Packet Routing in Telecommunication Networks. Department of Computer Science, University of the Western Cape, 7535 Bellville, South Africa.
[6] Jin, Y., and M. Tsai. 2001. Temporal Network Analysis for Predictive Routing Table Optimization. CS268 Final Project. {yujia,mtsai}@eecs.berkeley.edu.
[7] Kelly, D. 2005. Reinforcement Learning with Application to Adaptive Network Routing. Journal of Theoretical and Applied Information Technology.
[8] Kulkarni, S. A., and G. R. Rao. 2009. Formal Modeling of Reinforcement Learning Algorithms Applied for Mobile Ad Hoc Network. International Journal of Recent Trends in Engineering, Vol. 2, No. 3.
[9] Kumar, S., and R. Miikkulainen. 1998. Confidence Based Reinforcement Q-Routing: An Adaptive Online Network Routing Algorithm. Artificial Intelligence Laboratory, The University of Texas at Austin. National Science Foundation.
[10] Kumar, S., and R. Miikkulainen. 1999. Confidence Based Dual Reinforcement Q-Routing: An Adaptive Online Network Routing Algorithm. Artificial Neural Networks in Engineering. Artificial Intelligence Laboratory, The University of Texas at Austin.
[11] Mellouk, A., S. Larynouna and S. Hoceini. 2006. Adaptive Probabilistic Routing Schemes for Real Time Traffic in High Speed Dynamic Networks. IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 5B.
[12] Shanbhagh, S. R., Y. V. Potdar and M. S. Phatak. 2006. Reinforcement Learning Algorithms in Routing: "Q_Routing" Implementation and Analysis in NS-2. Fr. Conceicao Rodrigues College of Engineering. 70 pages.
[13] Tekiner, F., Z. Ghassemlooy and T. R. Srikanth. 2004. Comparison of the Q-Routing and Shortest Path Routing Algorithms. School of Engineering & Technology, Northumbria University, Newcastle upon Tyne, UK. ftekiner@ieee.org.
[14] Valdivia, Y. T., M. M. Vellasco and M. A. Pacheco. 2001. An Adaptive Network Routing Strategy with Temporal Differences. Asociacion Espanola Para La Inteligencia Artificial, Valencia, Espana, pp. 85-91.
[15] Yalamanchi, H. J. 2007. Reinforcement Learning for Network Routing. Oregon State University.
[16] Yap, S. T., and M. Othman. 2004. An Adaptive Routing Algorithm: Enhanced Confidence Based Q Routing Algorithm in Network Traffic. Department of Communication Technology & Network, University Putra Malaysia. Malaysian Journal of Computer Science, Vol. 17, No. 2, pp. 21-29.