Modulo scheduling loops onto coarse-grained reconfigurable architectures

Gnanaolivu, Rani (2013) Modulo scheduling loops onto coarse-grained reconfigurable architectures. Doctoral (PhD) thesis, Memorial University of Newfoundland.

[img] [English] PDF - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Download (8Mb)


Reconfigurable systems have drawn increasing attention from both academic researchers and creators of commercial applications in the past few years because they could combine flexibility with efficiency. There are two main types of reconfigurable architectures - fine-grained and coarse-grained. The functionality of fine-grained architecture hardware is specified at the bit level while the functionality of the coarse-grained architecture hardware is specified at the word level. Coarse-grained reconfigurable architectures (CGRAs) have gained currency in recent years due to their abundant parallelism, high computational intensity and flexibility. A CGRA normally is comprised of an array of basic computational and storage resources, which are capable of processing a large volume of applications simultaneously. To exploit the inherent parallelism in the applications to enhance performance, CGRAs have been structured for accelerating computation intensive parts such as loops, that require large amounts of execution time. The loop body is essentially drawn onto the CGRA mesh, subject to modulo resource usage constraints. Much research has been done to exploit the potential parallelism of CGRAs to increase the performance of time-consuming loops. However, sparse connectivity and distributed register files present difficult challenges to the scheduling phase of the CGRA compilation framework. While traditional schedulers do not take routability into consideration, software pipelining can improve the scheduling of instructions in loops by overlapping instructions from different iterations. Modulo scheduling is an approach for constructing software pipelines that focuses on minimizing the time between the initiations of iterations - the so-called initiation interval (II). For example, if a new iteration is started every II cycles, the time to complete n iterations will approach II x n, for large n loops, thereby maximizing performance. -- The problems of scheduling (deciding when an operation should happen), placing (deciding where an operation should happen), and routing (the problem of how information travels through space and time between operations) can be unified if they are modelled by a graph embedding problem. The data flow graph of the loop is embedded in a routing resource graph representing the hardware across a number of cycles equal to the initiation interval. -- Particle swarm optimization (PSO) has shown to be successful in many applications in continuous optimization problems. In this thesis, we have proposed algorithms to solve scheduling, placing, and routing of loop operations simultaneously by using PSO. We call this approach modulo-constrained hybrid particle swarm optimization (MCHPSO). There are many constraints and one optimization objective, which is the II that needs to be considered during the mapping and scheduling procedure. The scheduling algorithm tries to minimize the initiation interval to start the next iteration of the loop under the resource and modulo constraints for the architecture being used. -- When conditional branches such as if-then-else statements are present in the loop, they create multiple execution paths. Exploiting conditional branches through our predicated exclusivity, the MCHPSO algorithm reuses the resources which are in the exclusive execution paths and which may allow the loop to be scheduled with a lower II. Finally, a priority scheme algorithm along with recurrence aware modulo scheduling is proposed to map inter-iteration dependencies onto CGRAs, which is able to save resources for all recurrences cycles and to map remaining operations.

Item Type: Thesis (Doctoral (PhD))
Item ID: 10555
Additional Information: Includes bibliographical references (leaves 162-177).
Department(s): Engineering and Applied Science, Faculty of
Date: 2013
Date Type: Submission
Library of Congress Subject Heading: Computer architecture; Adaptive computing systems--Design and construction; Parallel processing (Electronic computers)

Actions (login required)

View Item View Item


Downloads per month over the past year

View more statistics