Low Power Reliable Real-Time Embedded Systems

 
Due to the proliferation of battery-powered embedded computing devices (e.g., PDAs, cell phones and laptops), energy has become a first-class resource in computing systems, and power-aware computing has become an important research area. Despite recent progress in this area, a number of serious challenges remain as technology feature sizes shrink and transient faults become more prominent. For autonomous, critical real-time embedded applications such as satellite and surveillance systems, where system reliability is as important as energy efficiency, energy consumption must be managed while preserving system reliability. The problem becomes more complicated when the relationships among fault rate, temperature (i.e., thermal effects), cosmic ray radiation and energy management techniques are considered.
  • Effects of Energy Management Techniques on Transient Faults

    As a first step, we have investigated the effects of frequency and voltage scaling (a widely used energy management technique) on the fault rate. Specifically, based on previously published data on the relationship between transient fault rates, critical charge, supply voltage and the number of particles in cosmic rays, two fault rate models (one linear and one exponential) have been proposed and studied. See the following paper for more details.

The problem becomes more interesting when the thermal effects on energy savings are taken into account, and it deserves further investigation.
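To make the idea of a frequency-dependent fault rate concrete, the following is a minimal sketch of what a linear and an exponential model could look like. The functional forms, the function names and every parameter value (`lam0`, `d`, `f_min`) are illustrative assumptions, not taken from this page. Frequency is normalized so that 1.0 is the maximum, and supply voltage is assumed to scale together with frequency.

```python
import math

def fault_rate_exponential(f, lam0=1e-6, d=2.0, f_min=0.1):
    """Assumed exponential model: the fault rate is lam0 at full speed
    (f = 1.0) and grows to lam0 * 10**d at the lowest frequency f_min."""
    if not f_min <= f <= 1.0:
        raise ValueError("f must lie in [f_min, 1.0]")
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def fault_rate_linear(f, lam0=1e-6, d=2.0, f_min=0.1):
    """Assumed linear model: interpolates linearly between the same
    two end points as the exponential model."""
    if not f_min <= f <= 1.0:
        raise ValueError("f must lie in [f_min, 1.0]")
    lam_max = lam0 * 10 ** d
    return lam0 + (lam_max - lam0) * (1.0 - f) / (1.0 - f_min)

if __name__ == "__main__":
    # Compare the two hypothetical models at a few normalized frequencies.
    for f in (1.0, 0.75, 0.55, 0.25, 0.1):
        print(f, fault_rate_linear(f), fault_rate_exponential(f))
```

Under these assumptions both models agree at the two end points; in between, the linear model predicts a higher fault rate than the exponential one for the same frequency.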

  • Scheduling in Real-Time Systems for Both Low Power and Reliability

While the slack time in real-time systems can be used by energy management schemes to save energy, it can also be exploited as temporal redundancy for fault tolerance. Because energy management affects the transient fault rate, reliability-ignorant energy management schemes may reduce system reliability dramatically, to an unsatisfactory level. How to preserve system reliability while exploiting slack for energy management is therefore an important and interesting problem. Based on the idea of reserving slack for a possible recovery to preserve reliability, we have studied a reliability-aware energy management scheme for real-time embedded systems. See the following paper for more details.

For systems with multiple periodic tasks, and for tasks with different workload characteristics, we have extended the reliability-aware energy management schemes to consider more than one task at a time and thereby save more energy. See the following papers for more details.
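The slack-reservation idea for a single task can be sketched as follows. This is only an illustration under stated assumptions, not the scheme from the papers: frequency is normalized (1.0 is the maximum), faults are assumed to arrive as a Poisson process whose rate follows a hypothetical exponential model, the recovery is assumed to be a re-execution at full speed, and all function names and parameter values are made up for the example.

```python
import math

def task_reliability(c, f, lam0=1e-6, d=2.0, f_min=0.1):
    """Probability that a task with worst-case execution time c (at f = 1.0)
    finishes without a transient fault when run at frequency f.
    Assumes Poisson faults with an exponential frequency-dependent rate."""
    rate = lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))
    return math.exp(-rate * c / f)

def reliability_aware_frequency(c, deadline, f_min=0.1):
    """Reserve c time units of slack for a full-speed recovery and use only
    the remaining slack to slow the task down.
    Returns (frequency, recovery_reserved)."""
    slack = deadline - c
    if slack < 0:
        raise ValueError("task cannot meet its deadline even at full speed")
    if slack < c:
        # Not enough slack to hold a recovery: do not scale at all.
        return 1.0, False
    # The scaled execution must fit in the time left after the reservation.
    return max(f_min, c / (deadline - c)), True

def reliability_with_recovery(c, f, **model):
    """Task succeeds if the scaled run is fault-free, or if it fails and
    the full-speed recovery run is fault-free."""
    r_primary = task_reliability(c, f, **model)
    r_recovery = task_reliability(c, 1.0, **model)
    return r_primary + (1.0 - r_primary) * r_recovery

if __name__ == "__main__":
    c, deadline = 2.0, 10.0
    f, reserved = reliability_aware_frequency(c, deadline)
    print("reliability-aware:", f, reliability_with_recovery(c, f))
    print("reliability-ignorant:", c / deadline, task_reliability(c, c / deadline))
    print("no power management:", 1.0, task_reliability(c, 1.0))
```

With these assumptions, a reliability-ignorant scheme that uses all of the slack for slowdown ends up less reliable than running at full speed, while the scheme that reserves a recovery is at least as reliable as the unscaled task, at the cost of a somewhat higher frequency.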

  • CMP-EMPERY: Exploiting Chip-Multiprocessor (CMP) in Real-Time EMbedded Systems for Performance, Energy and Reliability

As an alternative power-efficient architecture, CMP has been proposed to improve system performance with multiple simpler processing cores on a single chip, where each core may have multiple thread contexts via simultaneous multithreading (SMT). The central idea behind CMP/SMT is to improve system performance and power efficiency by exploiting both instruction-level and thread-level parallelism in applications. With their inherently redundant structures, CMPs provide great opportunity and flexibility for designing systems with high performance, energy efficiency and high reliability.

As a first step, we have proposed process-level duplication (PLD) as an alternative to thread-level duplication (TLD, which is normally used for fault tolerance in CMP/SMT-based systems) to explore the tradeoff between energy consumption and system reliability. A more comprehensive study is under way.

Last updated: 07/05/2007 04:21:46 PM