Private Reliability Environments for Efficient Fault-Tolerance in CGRAs

Syed M. A. H. Jafri, Stanislaw Piestrak, Ahmed Hemani, Kolin Paul, Juha Plosila, Hannu Tenhunen, Private Reliability Environments for Efficient Fault-Tolerance in CGRAs. Design Automation for Embedded Systems 10, 1–33, 2014.

http://dx.doi.org/10.1007/s10617-014-9129-6

Abstract:

In the era of platforms hosting multiple applications with variable reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. As a solution to this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level it actually needs. However, in the CGRA domain, the existing schemes either only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of a device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, also provide low-cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. In addition, we present adaptive configuration scrubbing techniques which prevent fault accumulation in the configuration memory. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed approach is capable of providing flexible protection with energy overhead ranging from 3.125 % to 107 % for different reliability levels. Synthesis results have confirmed that the proposed architecture reduces the area overhead for self-checking (58 %) and fault-tolerant (7.1 %) versions, compared to the state-of-the-art adaptive reliability techniques.

BibTeX entry:

@ARTICLE{jJaPiHePaPlTe14a,
  title = {Private Reliability Environments for Efficient Fault-Tolerance in CGRAs},
  author = {Jafri, Syed M. A. H. and Piestrak, Stanislaw and Hemani, Ahmed and Paul, Kolin and Plosila, Juha and Tenhunen, Hannu},
  journal = {Design Automation for Embedded Systems},
  volume = {10},
  publisher = {Springer},
  pages = {1--33},
  year = {2014},
  keywords = {Fault-tolerance, Reliability, Adaptive systems, Energy-aware systems, Scrubbing, Reconfiguration, Coarse grained reconfigurable arrays},
}

Belongs to TUCS Research Unit(s): Embedded Computer and Electronic Systems (ECES)

Publication Forum rating of this publication: level 1
