TY - GEN
T1 - Adaptive failover for real-time middleware with passive replication
AU - Balasubramanian, Jaiganesh
AU - Tambe, Sumant
AU - Lu, Chenyang
AU - Gokhale, Aniruddha
AU - Gill, Christopher
AU - Schmidt, Douglas C.
PY - 2009
Y1 - 2009
AB - Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and system workload changes are common. Fault-tolerant middleware for these applications must achieve high service availability and satisfactory response times for client applications. Although passive replication is a promising fault-tolerance strategy for resource-constrained systems, conventional client failover approaches are non-adaptive and load-agnostic, which can cause system overloads and significantly increase response times after failure recovery. This paper presents four contributions to the study of passive replication for distributed soft real-time applications. First, it describes how our Fault-tolerant Load-aware and Adaptive middlewaRe (FLARe) dynamically adjusts failover targets at runtime in response to system load fluctuations and resource availability. Second, it describes how FLARe's overload management strategy proactively enforces desired CPU utilization bounds by redirecting clients from overloaded processors. Third, it presents the design and implementation of FLARe's lightweight middleware architecture, which manages failures and overloads transparently to clients. Finally, it presents experimental results on a distributed Linux testbed that demonstrate how FLARe adaptively maintains soft real-time performance for clients operating in the presence of failures and overloads, with negligible runtime overhead.
UR - https://www.scopus.com/pages/publications/67650266691
U2 - 10.1109/RTAS.2009.36
DO - 10.1109/RTAS.2009.36
M3 - Conference contribution
AN - SCOPUS:67650266691
SN - 9780769536361
T3 - Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS
SP - 118
EP - 127
BT - Proceedings - 15th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2009
T2 - 15th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2009
Y2 - 14 April 2009 through 16 April 2009
ER -