during execution. What is the maximum degree of concurrency of the database query examples?
• The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program. Assuming that each task in the database example takes identical processing time, what is the average degree of concurrency in each decomposition?
• The degree of concurrency increases as the decomposition becomes finer in granularity, and vice versa.

Critical Path Length
• A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.
• The longest such path determines the shortest time in which the program can be executed in parallel.
• The length of the longest path in a task dependency graph is called the critical path length.

Critical Path Length
Consider the task dependency graphs of the two database query decompositions:
• What are the critical path lengths for the two task dependency graphs?
• If each task takes 10 time units, what is the shortest parallel execution time for each decomposition?
• How many processors are needed in each case to achieve this minimum parallel execution time?
• What is the maximum degree of concurrency?

Limits on Parallel Performance
• It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.
• There is an inherent bound on how fine the granularity of a computation can be. For example, in the case of multiplying a dense matrix with a vector, there can be no more than O(n²) concurrent tasks.
• Concurrent tasks may also have to exchange data with other tasks. This results in communication overhead. The tradeoff between the granularity of a decomposition and the associated overheads often determines performance bounds.

Task Interaction Graphs
• Subtasks generally exchange data with others in a decomposition. For example, even in the trivial decomposition of the dense matrix-vector product, if the vector is not replicated across all tasks, they will have to communicate elements of the vector.
• The graph of tasks (nodes) and their interactions/data exchanges (edges) is referred to as a task interaction graph.
• Note that task interaction graphs represent data dependencies, whereas task dependency graphs represent control dependencies.

Task Interaction Graphs: An Example
Consider the problem of multiplying a sparse matrix A with a vector b. The following observations can be made:
• As before, the computation of each element of the result vector can be viewed as an independent task.
• Unlike a dense matrix-vector product, though, only the non-zero elements of matrix A participate in the computation.
• If, for memory optimality, we also partition b across tasks, then one can see that the task interaction graph of the computation is identical to the graph of the matrix A (the graph for which A represents the adjacency structure), as the sketch below illustrates.
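To make the correspondence concrete, here is a minimal Python sketch (illustrative only, not from the slides; all names are ours) that derives the task interaction graph for the sparse matrix-vector product y = Ab, assuming task i computes y[i] and owns b[i]:

def interaction_graph(nonzeros):
    """nonzeros: set of (i, j) index pairs where A[i][j] != 0.
    Task i needs b[j] from task j for every off-diagonal nonzero A[i][j],
    so collapsing directions gives the undirected graph of A's
    nonzero structure."""
    edges = set()
    for i, j in nonzeros:
        if i != j:                                 # diagonal entries need no communication
            edges.add((min(i, j), max(i, j)))      # undirected edge {i, j}
    return edges

# A small 5x5 sparse matrix given by its nonzero positions (hypothetical example):
A_nonzeros = {(0, 0), (0, 1), (0, 4),
              (1, 1), (1, 2),
              (2, 2), (2, 3),
              (3, 3), (3, 4),
              (4, 0), (4, 4)}
print(sorted(interaction_graph(A_nonzeros)))
# -> [(0, 1), (0, 4), (1, 2), (2, 3), (3, 4)]  == the graph of A

The edge set that comes out is exactly the nonzero structure of A, which is the observation made in the last bullet above.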
Task Interaction Graphs, Granularity, and Communication
In general, if the granularity of a decomposition is finer, the associated overhead (as a ratio of the useful work associated with a task) increases.
Example: Consider the sparse matrix-vector product example from the previous foil. Assume that each node takes unit time to process and each interaction (edge) causes an overhead of a unit time. Viewing node 0 as an independent task involves useful computation of one time unit and overhead (communication) of three time units, a computation-to-communication ratio of 1:3. Now, if we consider nodes 0, 4, and 5 as one task, then the task has useful computation totaling three time units and communication corresponding to four time units (four edges), a ratio of 3:4. Clearly, this is a more favorable ratio than the former case.

Processes and Mapping
• In general, the number of tasks in a decomposition exceeds the number of processing elements available.
• For this reason, a parallel algorithm must also provide a mapping of tasks to processes.
Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs, as we shall see, do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use "process" not in the UNIX sense, but simply as a collection of tasks and associated data.

Processes and Mapping
• Appropriate mapping of tasks to processes is critical to the parallel performance of an algorithm.
• Mappings are determined by both the task dependency and task interaction graphs.
• Task dependency graphs can be used to ensure that work is equally spread across all processes at any point (minimum idling and optimal load balance).
• Task interaction graphs can be used to make sure that processes need minimum interaction with other processes (minimum communication).

Processes and Mapping
An appropriate mapping must minimize parallel execution time by:
• Mapping independent tasks to different processes.
• Assigning tasks on the critical path to processes as soon as they become available.
• Minimizing interaction between processes by mapping tasks with dense interactions to the same process.
Note: These criteria often conflict with each other. For example, a decomposition into one task (or no decomposition at all) minimizes interaction but does not result in any speedup at all!

Processes and Mapping: Example
Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes, as the sketch below shows.
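As a rough illustration of this level-by-level strategy, here is a minimal Python sketch (our own construction, not the slides' exact procedure) that groups the tasks of a dependency graph into topological levels and deals each level's tasks out to distinct processes:

def level_map(num_tasks, edges, num_procs):
    """edges: list of (u, v) pairs meaning task u must finish before task v.
    Returns {task: process}, assigning level by level so that tasks within
    a level land on different processes (round-robin if a level is large)."""
    succ = [[] for _ in range(num_tasks)]
    indeg = [0] * num_tasks
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    mapping = {}
    level = [t for t in range(num_tasks) if indeg[t] == 0]  # tasks with no predecessors
    while level:
        for slot, task in enumerate(level):      # spread one level's tasks
            mapping[task] = slot % num_procs     # across distinct processes
        nxt = []
        for task in level:                       # peel the level off the graph
            for v in succ[task]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return mapping

# Hypothetical graph shaped like the query decomposition: four leaf tasks
# feeding two intermediate joins, which feed a single root task.
edges = [(0, 4), (1, 4), (2, 5), (3, 5), (4, 6), (5, 6)]
print(level_map(7, edges, num_procs=4))
# Level {0,1,2,3} -> processes 0..3, level {4,5} -> processes 0,1, level {6} -> process 0

Note that this only captures the load-balance criterion; a fuller mapping heuristic would also consult the task interaction graph to co-locate heavily interacting tasks, as the preceding slides point out.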