Due to the proliferation of
battery-powered embedded computing devices (e.g., PDAs, cell phones and
laptops), energy has become a first-class resource in computing
systems, and power-aware computing has become an important research area.
Despite recent progress in this area,
a number of serious challenges remain to be addressed as technology
feature sizes shrink and transient faults become more prominent. For autonomous critical real-time embedded
applications, such as satellite and surveillance systems, where
system reliability is as important as energy efficiency, energy
consumption must be managed while preserving system reliability. The problem
becomes more complicated when considering the relationship between fault
rate, temperature (i.e., thermal effects), cosmic ray radiation and
energy management techniques.
- Effects of Energy Management Techniques on Transient Faults
As a first step, we have
investigated the effects of frequency and voltage scaling (a
widely exploited energy management technique) on the
fault rate. Specifically, based on previously published data
regarding the relationship between transient fault rates, critical
charge, supply voltages and the number of particles in cosmic
rays, two fault rate models (one linear and one exponential)
have been proposed and studied. See the following paper for more
details.
The problem becomes more interesting when the thermal
effects on energy savings are also considered, and it deserves further investigation.
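The two kinds of fault rate model mentioned above can be illustrated with a minimal sketch. The functional forms, the parameter names (`lam0`, `d`, `f_min`) and their default values are assumptions chosen for illustration and are not taken from the paper; the only property carried over from the text is that the fault rate rises as frequency and voltage are scaled down, either linearly or exponentially.

```python
def linear_fault_rate(f, lam0=1e-6, d=2.0, f_min=0.1):
    """Hypothetical linear model.

    f     : normalized frequency in [f_min, 1.0]
    lam0  : assumed fault rate at full speed (f = 1.0)
    d     : assumed sensitivity of the fault rate to scaling
    f_min : assumed lowest normalized frequency

    The rate grows linearly from lam0 at f = 1.0 to lam0 * (1 + d) at f_min.
    """
    return lam0 * (1.0 + d * (1.0 - f) / (1.0 - f_min))


def exponential_fault_rate(f, lam0=1e-6, d=2.0, f_min=0.1):
    """Hypothetical exponential model.

    The rate grows from lam0 at f = 1.0 to lam0 * 10**d at f_min,
    i.e. by d orders of magnitude over the full scaling range.
    """
    return lam0 * 10.0 ** (d * (1.0 - f) / (1.0 - f_min))


if __name__ == "__main__":
    # Compare the two models at a few normalized frequencies.
    for f in (1.0, 0.7, 0.4, 0.1):
        print(f, linear_fault_rate(f), exponential_fault_rate(f))
```

Both functions agree at full speed and diverge as the frequency drops, which is the qualitative difference between a linear and an exponential dependence.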
- Scheduling in Real-Time Systems for Both Low Power and
Reliability
While the slack time in real-time systems can be used by energy
management schemes for saving more energy, it can also be exploited
as temporal redundancy for fault
tolerance. Considering the effects of energy management on transient
faults, energy management schemes that ignore reliability may
dramatically reduce system reliability to unsatisfactory levels. Therefore, how to
preserve system reliability while exploiting slack for energy
management is an important and interesting problem. Based on the
idea of reserving slack for possible recovery to preserve
reliability, we have studied one reliability-aware energy management
scheme for real-time embedded systems. See the following paper for more
details.
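The idea of reserving slack for recovery can be sketched for a single task as follows. This is an illustrative simplification, not the scheme from the paper: it assumes one task with worst-case execution time `wcet` at full speed, a deadline, a normalized frequency in `[f_min, 1.0]`, and a recovery that re-executes the task at full speed. All function and parameter names are hypothetical.

```python
def ignorant_speed(wcet, deadline, f_min=0.1):
    """Reliability-ignorant choice: stretch the task over the whole deadline."""
    return max(f_min, wcet / deadline)


def reliability_aware_speed(wcet, deadline, f_min=0.1):
    """Reliability-aware choice (assumed simplification).

    Reserve `wcet` time units of slack for a full-speed recovery and
    use only the remaining time to slow the task down.
    """
    slack = deadline - wcet
    if slack < wcet:
        # Not enough slack to reserve a recovery, so do not scale down.
        return 1.0
    # The scaled task may take at most (deadline - wcet) time units.
    return max(f_min, wcet / (deadline - wcet))


if __name__ == "__main__":
    wcet, deadline = 2.0, 10.0
    f = reliability_aware_speed(wcet, deadline)
    # Scaled execution time plus the reserved recovery still meets the deadline.
    print(ignorant_speed(wcet, deadline), f, wcet / f + wcet <= deadline)
```

Under these assumptions the reliability-aware speed is never lower than the reliability-ignorant one, so it saves less energy but always leaves room for a recovery before the deadline.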
For systems with multiple periodic tasks and tasks with different
workload characteristics, we have extended the reliability-aware
energy management schemes by considering more than one task at a
time for more energy savings. See the following papers for more
details.
- CMP-EMPERY: Exploiting Chip-Multiprocessor (CMP) in
Real-Time EMbedded Systems for Performance, Energy
and Reliability
As an alternative power-efficient
architecture, CMP has been proposed to improve system performance
with multiple simpler processing cores on a single chip, where each
core may have multiple thread contexts with simultaneous
multithreading (SMT) techniques. The central idea for CMP/SMT to
improve system performance and power efficiency is to exploit both
instruction- and thread-level parallelism in the applications.
Given their inherently redundant structures, CMPs provide great
opportunity and flexibility for designing systems with high
performance, energy efficiency and high reliability.
As a first step, we have proposed the idea of
process-level duplication (PLD), in comparison to thread-level
duplication (TLD, which is normally used for fault tolerance in CMP/SMT-based
systems), for the tradeoff between energy consumption and system
reliability. A more comprehensive study is under way.