Compiling parallel loops for high performance computers: Partitioning, data assignment, and remapping.
dc.contributor.author | Hudak, David Edward | en_US |
dc.contributor.advisor | Abraham, Santosh G. | en_US |
dc.date.accessioned | 2014-02-24T16:31:06Z | |
dc.date.available | 2014-02-24T16:31:06Z | |
dc.date.issued | 1992 | en_US |
dc.identifier.other | (UMI)AAI9226921 | en_US |
dc.identifier.uri | http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:9226921 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/105912 | |
dc.description.abstract | Communication overhead in multiprocessor systems, as exemplified by cache coherency traffic and global memory access, has a substantial impact on multiprocessor performance. This thesis develops compile-time techniques to reduce the overhead of interprocessor communication for iterative data-parallel loops. These techniques exploit machine-specific information to minimize communication overhead, thus eliminating the need for a user to tune a program for each new multiprocessor. Such techniques are a necessary step toward developing software to support portable parallel programs. Adaptive Data Partitioning (ADP) reduces the execution time of parallel programs by minimizing interprocessor communication for iterative data-parallel loops with near-neighbor communication. On many multiprocessors, the location of data in memory may be specified independently of the loop partition. Data placement schemes are presented that minimize communication time. Under the loop partition specified by ADP, global data is partitioned into classes for each processor. Each processor is able to cache certain global data based on its classification. Compilers must frequently evaluate machine-specific tradeoffs between load imbalance and communication. Optimum cyclic partitions are generated for loops with either a linearly varying or uniform computational structure and either neighborhood or dimensional multicast communication patterns. The CPR (Collective Partitioning and Remapping) algorithm partitions a collection of loops with various computational structures and communication patterns. Experiments that demonstrate the advantage of ADP, data placement, cyclic partitioning and CPR were conducted on the Encore Multimax and BBN TC2000 multiprocessors using the ADAPT system, a program partitioner which automatically restructures iterative parallel loops. | en_US |
dc.format.extent | 166 p. | en_US |
dc.subject | Computer Science | en_US |
dc.title | Compiling parallel loops for high performance computers: Partitioning, data assignment, and remapping. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science and Engineering | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/105912/1/9226921.pdf | |
dc.description.filedescription | Description of 9226921.pdf : Restricted to UM users only. | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) | |