Figure 9.1 A parallel compare-exchange operation. Processes Pi and Pj send their elements to each other. Process Pi keeps min{ai, aj}, and Pj keeps max{ai, aj}.
Figure 9.2 A compare-split operation. Each process sends its block of size n/p to the other process. Each process merges the received block with its own block and retains only the appropriate half of the merged block. In this example, process Pi retains the smaller elements and process Pj retains the larger elements.
Figure 9.3 A schematic representation of comparators: (a) an increasing comparator (outputs x′ = min{x, y} and y′ = max{x, y}), and (b) a decreasing comparator (outputs x′ = max{x, y} and y′ = min{x, y}).
Figure 9.4 A typical sorting network. Every sorting network is made up of a series of columns, and each column contains a number of comparators connected in parallel.
Figure 9.5 Merging a 16-element bitonic sequence through a series of log 16 bitonic splits.
Figure 9.6 A bitonic merging network for n = 16. The input wires are numbered 0, 1, ..., n − 1, and the binary representation of these numbers is shown. Each column of comparators is drawn separately; the entire figure represents a ⊕BM[16] bitonic merging network. The network takes a bitonic sequence and outputs it in sorted order.
Figure 9.7 A schematic representation of a network that converts an input sequence into a bitonic sequence. In this example, ⊕BM[k] and ⊖BM[k] denote bitonic merging networks of input size k that use ⊕ and ⊖ comparators, respectively. The last merging network (⊕BM[16]) sorts the input. In this example, n = 16.
Figure 9.8 The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence. In contrast to Figure 9.6, the columns of comparators in each bitonic merging network are drawn in a single box, separated by a dashed line.
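The bitonic merge illustrated in Figures 9.5 and 9.6 is easy to express sequentially. The sketch below is a minimal illustration in Python, not the parallel formulation from the text; the function name bitonic_merge and the copy-then-recurse structure are choices made here. It assumes the input is already bitonic and that its length is a power of two: one bitonic split compare-exchanges elements that are n/2 positions apart, and the procedure then recurses on the two bitonic halves.

```python
def bitonic_merge(seq, ascending=True):
    """Sort a bitonic sequence by a series of bitonic splits (cf. Figure 9.5)."""
    seq = list(seq)               # work on a copy; assumes len(seq) is a power of two
    n = len(seq)
    if n == 1:
        return seq
    half = n // 2
    # One bitonic split: compare-exchange elements that are n/2 apart.
    for i in range(half):
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    # Each half is again bitonic, and (for an ascending merge) no element of the
    # first half exceeds any element of the second; recurse on both halves.
    return (bitonic_merge(seq[:half], ascending) +
            bitonic_merge(seq[half:], ascending))

# The 16-element bitonic sequence from Figure 9.5:
bitonic = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
print(bitonic_merge(bitonic))     # the same values in sorted order
```

Calling the function with ascending=False reverses every comparison, which corresponds to the ⊖BM[k] networks of Figure 9.7.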
Figure 9.9 Communication during the last stage of bitonic sort. Each wire is mapped to a hypercube process; each connection represents a compare-exchange between processes.
Figure 9.10 Communication characteristics of bitonic sort on a hypercube. During each stage of the algorithm, processes communicate along the dimensions shown.
Figure 9.11 Different ways of mapping the input wires of the bitonic sorting network to a mesh of processes: (a) row-major mapping, (b) row-major snakelike mapping, and (c) row-major shuffled mapping.
Figure 9.12 The last stage of the bitonic sort algorithm for n = 16 on a mesh, using the row-major shuffled mapping. During each step, process pairs compare-exchange their elements. Arrows indicate the pairs of processes that perform compare-exchange operations.
Figure 9.13 Sorting n = 8 elements, using the odd-even transposition sort algorithm. During each phase, n = 8 elements are compared.
Figure 9.14 An example of the first phase of parallel shellsort on an eight-process array.
Figure 9.15 Example of the quicksort algorithm sorting a sequence of size n = 8.
Figure 9.16 A binary tree generated by the execution of the quicksort algorithm. Each level of the tree represents a different array-partitioning iteration. If pivot selection is optimal, then the height of the tree is Θ(log n), which is also the number of iterations.
Figure 9.17 The execution of the PRAM algorithm on the array shown in (a). The arrays leftchild and rightchild are shown in (c), (d), and (e) as the algorithm progresses. Figure (f) shows the binary tree constructed by the algorithm. Each node is labeled by the process (in square brackets) and the element stored at that process (in curly brackets); this element serves as the pivot. In each node, processes whose elements are smaller than the pivot are grouped on the left side of the node, and those with larger elements are grouped on the right side. These two groups form the two partitions of the original array. For each partition, a pivot element is selected at random from the two groups that form the children of the node.
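Odd-even transposition sort (Figure 9.13) is equally compact to state sequentially. The sketch below is a Python illustration rather than the parallel formulation; the function name and 0-based indexing are choices made here. It runs n phases, pairing (a1, a2), (a3, a4), ... in odd phases and (a2, a3), (a4, a5), ... in even phases; a parallel formulation would perform each phase's compare-exchanges concurrently, one pair per neighboring process pair.

```python
def odd_even_transposition_sort(a):
    """Sort a list in n phases of alternating odd/even compare-exchanges (cf. Figure 9.13)."""
    n = len(a)
    for phase in range(1, n + 1):
        # Odd phases compare pairs (a1, a2), (a3, a4), ...;
        # even phases compare pairs (a2, a3), (a4, a5), ...
        start = 0 if phase % 2 == 1 else 1
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

# The sequence used in Figure 9.13:
print(odd_even_transposition_sort([3, 2, 3, 8, 5, 6, 4, 1]))   # [1, 2, 3, 3, 4, 5, 6, 8]
```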
Figure 9.18 An example of the execution of an efficient shared-address-space quicksort algorithm.
Figure 9.19 Efficient global rearrangement of the array.
Figure 9.20 An example of the execution of sample sort on an array with 24 elements on three processes.
Figure 9.21 The execution of the hypercube formulation of quicksort for d = 3. The three splits, one along each communication link, are shown in (a), (b), and (c): (a) the split along the third dimension partitions the sequence into two big blocks, one smaller and one larger than the pivot; (b) the split along the second dimension partitions each subblock into two smaller subblocks; (c) after the split along the first dimension, the elements are sorted according to the global ordering imposed by the processors' labels onto the hypercube. The second column represents the partitioning of the n-element sequence into subcubes. The arrows between subcubes indicate the movement of larger elements. Each box is marked by the binary representation of the process labels in that subcube. A ∗ denotes that all the binary combinations are included.
Figure 9.22 (a) An arbitrary portion of a mesh that holds part of the sequence to be sorted at some point during the execution of quicksort, and (b) a binary tree embedded into the same portion of the mesh.
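The splitter-based idea behind Figure 9.20 can also be sketched sequentially. In the Python sketch below, the function name sample_sort, the sample spacing, and the splitter positions are choices made here for illustration and need not match the figure: each of the p blocks is sorted locally and contributes p - 1 regularly spaced samples, p - 1 global splitters are drawn from the combined sample, and every element is assigned to the bucket its splitters define. It assumes n is at least on the order of p*p so that every block can supply p - 1 samples.

```python
import bisect

def sample_sort(elements, p):
    """A sequential sketch of sample sort with p blocks (cf. Figure 9.20)."""
    n = len(elements)                     # assumes n is at least about p * p
    size = (n + p - 1) // p
    # Local sort of each block of (roughly) n/p elements.
    blocks = [sorted(elements[i:i + size]) for i in range(0, n, size)]

    # Each block contributes p - 1 regularly spaced samples.
    samples = []
    for b in blocks:
        step = max(len(b) // p, 1)
        samples.extend(b[step - 1::step][:p - 1])
    samples.sort()

    # Choose p - 1 global splitters from the combined sample.
    splitters = [samples[(i + 1) * (p - 1) - 1] for i in range(p - 1)]

    # Bucket every element by the splitter interval it falls into, then sort each bucket.
    buckets = [[] for _ in range(p)]
    for x in elements:
        buckets[bisect.bisect_left(splitters, x)].append(x)
    return [sorted(b) for b in buckets]

# Concatenating the returned buckets yields the sorted array.
print(sample_sort(list(range(24, 0, -1)), p=3))
```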