Abstract: This paper presents ideas on the implementation of a compiler that exploits knowledge of the finite geometry underlying the novel parallel hardware architecture for sparse matrix computations [KAR90]. The compiler takes a data flow graph as input, rearranges it to avoid memory, switch, and processor conflicts, and schedules operations to maximize the efficiency of the parallel hardware. The action of the compiler can be viewed as a discrete-time-driven simulation of the execution of the data flow graph on the parallel machine, with the simulation capturing the state of the hardware at each time instant. The paper presents extensive simulation results for matrix-vector multiply routines on the parallel hardware. The test matrices are drawn from diverse linear programming (LP) applications as well as other scientific computations. The results indicate that uniformly high efficiency (above 90%) is achievable on problems with regular as well as arbitrary structure.
- Performance Analysis of a Proposed Parallel Architecture on Matrix Vector Multiply Like Routines (pdf)
I. Dhillon, N. Karmarkar, K. Ramakrishnan.
Technical Memorandum 11216-901004-13TM, 1990.