TUCS Publication Database: Parallel Distributed Scalable Runtime Address Generation Scheme for a Coarse Grain Reconfigurable Computation and Storage Fabric


Nasim Farahini, Ahmed Hemani, Hassan Sohofi, Syed M. A. H. Jafri, Muhammad Adeel Tajammul, Kolin Paul, Parallel Distributed Scalable Runtime Address Generation Scheme for a Coarse Grain Reconfigurable Computation and Storage Fabric. Microprocessors and Microsystems, 1–15, 2014.

http://dx.doi.org/10.1016/j.micpro.2014.05.009

Abstract:

This paper presents a hardware-based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel, distributed, coarse-grain reconfigurable computation and storage fabric. The scheme can also handle non-affine functions of multiple variables, which typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category is a low-cost address generation unit (AGU) that generates addresses within given address bounds for affine functions of up to two variables. Such low-cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex; it is also distributed, but shared among a few storage units, and it handles more complex address generation requirements, such as dynamically computing the address bounds that are then used to configure the AGUs, and transforming non-affine functions into affine functions by computing the affine factor outside the loop. Computing the address constraints at runtime incurs negligible overhead in latency, area and energy, while substantially reducing program storage and energy and improving reconfiguration agility compared to the prevalent approach of pre-computing the address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes on a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction, and compared to the centralized runtime address generation scheme, it achieved a 32.7% average performance improvement.
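The division of labour described in the abstract can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the authors' hardware: `AffineAGU` is a hypothetical model of a low-cost AGU that walks a two-level affine pattern with fixed bounds, and `lower_triangle_addresses` is a hypothetical stand-in for the more complex shared resource. For each outer iteration it computes, at runtime, an inner-loop bound that depends on the outer index and a row offset (affine for a row-major matrix, non-affine `k*(k+1)//2` for packed triangular storage), then configures the AGU with them so that the AGU itself only ever sees an affine inner function.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class AffineAGU:
    """Hypothetical model of a low-cost AGU (not the paper's design).

    Generates addr = start + j*outer_stride + i*inner_stride for
    j in [0, outer_count) and i in [0, inner_count): an affine
    function of up to two loop variables with fixed bounds.
    """
    start: int
    inner_stride: int
    inner_count: int
    outer_stride: int = 0
    outer_count: int = 1

    def addresses(self) -> Iterator[int]:
        for j in range(self.outer_count):
            base = self.start + j * self.outer_stride
            for i in range(self.inner_count):
                yield base + i * self.inner_stride


def lower_triangle_addresses(base: int, n: int, packed: bool = False) -> List[int]:
    """Hypothetical sequencer loop reading the lower triangle of an n x n matrix.

    The inner bound (k + 1) depends on the outer index k, so a single
    fixed-bound two-level AGU cannot express it. Instead, the bound and
    the row offset are recomputed here for every k and used to configure
    a one-level AGU for the inner loop.
    """
    out: List[int] = []
    for k in range(n):
        # Row offset: affine k*n for row-major storage, or the non-affine
        # k*(k+1)//2 for packed triangular storage. Computed outside the
        # inner loop, so the AGU's own function stays affine.
        row_start = k * (k + 1) // 2 if packed else k * n
        # Runtime-computed bound: the row length grows with k.
        agu = AffineAGU(start=base + row_start, inner_stride=1, inner_count=k + 1)
        out.extend(agu.addresses())
    return out
```

For example, `AffineAGU(start=0, inner_stride=1, inner_count=3, outer_stride=8, outer_count=2)` yields `[0, 1, 2, 8, 9, 10]`, a pattern a fixed-bound AGU handles on its own, while `lower_triangle_addresses(0, 3)` returns `[0, 3, 4, 6, 7, 8]`. In this model the sequencer stores only a few configuration parameters per row rather than a pre-computed list of every address, which is the kind of program-storage saving the abstract attributes to runtime computation of address constraints.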

BibTeX entry:

@ARTICLE{jFaHeSojaTaPa14a,
  title = {Parallel Distributed Scalable Runtime Address Generation Scheme for a Coarse Grain Reconfigurable Computation and Storage Fabric},
  author = {Farahini, Nasim and Hemani, Ahmed and Sohofi, Hassan and Jafri, Syed M. A. H. and Tajammul, Muhammad Adeel and Paul, Kolin},
  journal = {Microprocessors and Microsystems},
  publisher = {Elsevier},
  pages = {1--15},
  year = {2014},
  ISSN = {0141-9331},
}

Belongs to TUCS Research Unit(s): Embedded Computer and Electronic Systems (ECES)

Publication Forum rating of this publication: level 1
