balance between descriptivity and prescriptivity

Algorithms and Concurrency
• Introduction to Parallel Algorithms
• Tasks and Decomposition
• Processes and Mapping
• Processes Versus Processors
• Decomposition Techniques
• Recursive Decomposition
• Data Decomposition
• Exploratory Decomposition
• Hybrid Decomposition
• Characteristics of Tasks and Interactions
• Task Generation, Granularity, and Context
• Characteristics of Task Interactions

Concurrency and Mapping
• Mapping Techniques for Load Balancing
• Static and Dynamic Mapping
• Methods for Minimizing Interaction Overheads
• Maximizing Data Locality
• Minimizing Contention and Hot Spots
• Overlapping Communication and Computations
• Replication vs. Communication
• Group Communications vs. Point-to-Point Communication
• Parallel Algorithm Design Models
• Data-Parallel, Work Pool, Task Graph, Master-Slave, Pipeline, and Hybrid Models

Preliminaries: Decomposition, Tasks, and Dependency Graphs
• The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.
• A given problem may be decomposed into tasks in many different ways.
• Tasks may be of the same, different, or even indeterminate sizes.
• A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.

Example: Multiplying a Dense Matrix with a Vector
Computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.
Observations:
• While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other.
• All tasks are of the same size in terms of the number of operations.
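The row-wise decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the names `row_task` and `matvec_by_rows`, the use of a thread pool, and the `max_workers` parameter are all assumptions made for the example. It shows n independent tasks that share the vector b and each produce one element of y.

```python
from concurrent.futures import ThreadPoolExecutor


def row_task(row, b):
    """Task i (hypothetical name): compute y[i] as the dot product of row i of A with b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))


def matvec_by_rows(A, b, max_workers=4):
    """Decompose y = A b into len(A) independent tasks, one per output element.

    Every task reads the shared vector b but produces a distinct y[i],
    so no task has to wait for any other.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so y[i] comes from row i.
        return list(pool.map(lambda row: row_task(row, b), A))


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, -1]
y = matvec_by_rows(A, b)
print(y)  # [-2, -2, -2]
```

The thread pool here only illustrates the structure of the decomposition (independent tasks, shared read-only data). It is not meant as a performance recipe, since CPython threads do not run pure-Python arithmetic in parallel.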
Is this the maximum number of tasks we could decompose this problem into?

Example: Database Query Processing
Consider the execution of the query:
MODEL = "CIVIC" AND YEAR = 2021 AND (COLOR = "GREEN" OR COLOR = "WHITE")
on the following database:

ID    Model    Year  Color  Dealer  Price
4523  Civic    2021  Blue   MN      $18,000
3476  Corolla  1999  White  IL      $15,000
7623  Camry    2021  Green  NY      $21,000
9834  Prius    2021  Green  CA      $18,000
6734  Civic    2021  White  OR      $17,000
5342  Altima   2021  Green  FL      $19,000
3845  Maxima   2021  Blue   NY      $22,000
8354  Accord   2021  Green  VT      $18,000
4395  Civic    2021  Red    CA      $17,000
7352  Civic    2021  Red    WA      $18,000

The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.
Figure caption: Decomposing the given query into a number of tasks. Edges in this graph denote that the output of one task is needed to accomplish the next.
Note that the same problem can be decomposed into subtasks in other ways as well.
Figure caption: An alternate decomposition of the given problem into subtasks, along with their data dependencies.
Different task decompositions may lead to significant differences with respect to their eventual parallel performance.

Granularity of Task Decompositions
• The number of tasks into which a problem is decomposed determines its granularity.
• Decomposition into a large number of tasks results in a fine-grained decomposition, and decomposition into a small number of tasks results in a coarse-grained decomposition.
Figure caption: A coarse-grained counterpart to the dense matrix-vector product example. Each task in this example corresponds to the computation of three elements of the result vector.

Degree of Concurrency
• The number of tasks that can be executed in parallel is the degree of concurrency of a decomposition.
• Since the number of tasks that can be executed in parallel may change over program execution, the maximum degree of concurrency is the maximum number of such tasks at any point during execution. What is the maximum degree of concurrency of the database query examples?
• The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program. Assuming that each task in the database example takes identical processing time, what is the average degree of concurrency in each decomposition?
• The degree of concurrency increases as the decomposition becomes finer in granularity, and vice versa.

Critical Path Length
• A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.
• The longest such path determines the shortest time in which the program can be executed in parallel.
• The length of the longest path in a task dependency graph is called the critical path length.
Consider the task dependency graphs of the two database query decompositions:
• What are the critical path lengths for the two task dependency graphs?
• If each task takes 10 time units, what is the shortest parallel execution time for each decomposition?
• How many processors are needed in each case to achieve this minimum parallel execution time?
• What is the maximum degree of concurrency?

Limits on Parallel Performance
• It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.
• There is an inherent bound on how fine the granularity of a computation can be. For example, in the case of multiplying a dense matrix with a vector, there can be no more than (n^2) concurrent tas