Chen, Shi (2008) A vector floating point processing unit design. Masters thesis, Memorial University of Newfoundland.
- Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
The main contribution of this thesis is the successful development of a vector floating point processing unit for high accuracy science computing. For these numerically-intensive applications, vector processing offers simple and straightforward parallelism by executing mathematical operations on multiple data elements simultaneously. The simple control and datapath structures of vector processing enable the embedded computing system to attain high performance at low power. -- This vector floating point processing unit includes: a vector register file, vector floating point arithmetic units, and vector memory units. The central module, a vector register file, is divided into twelve lanes. One lane contains 16 vector registers, each including 32x32-bit elements, and is connected to a floating point adder and a floating point multiplier. By modeling the multi-port register file using configurable block RAM on Field Programmable Gate Arrays (FPGA) target, vector register files can efficiently obtain data from external memory and feed data to different arithmetic units simultaneously. Utilizing the quick carry out path and embedded multiplier macro unit, the vector floating point arithmetic units can run at over 200 MHz. A flag register is used to indicate the calculation sequence for the specific computing model. Moreover, the embedded Power PC processor not only can easily control the calculation flow, but also can support an embedded operating system to extend a broad range of applications. The prototype is implemented on Xilinx Virtex II Pro devices, and a peak performance of 4.530 GFLOPS at 188.768 MHz has been achieved. -- First, we present a brief introduction to the floating point arithmetic operations, including addition, multiplication, and multiplier-adder-fused. Second, the architecture of the vector processing unit and a detailed description of vector function units are introduced. Moreover, for a specific computing application, the appropriate overlap execution scheme is discussed. In the end, the performance of each component is analyzed, and the time and area analysis of whole system is provided.
|Item Type:||Thesis (Masters)|
|Additional Information:||Includes bibliographical references (leaves 101-104)|
|Department(s):||Engineering and Applied Science, Faculty of|
|Library of Congress Subject Heading:||Field programmable gate arrays; Floating-point arithmetic; Vector processing (Computer science)|
Actions (login required)