Principles and Practices of Dependable Distributed Computing
NSF CAREER Award
This research will advance the theoretical foundations and explore practical implementations of dependable distributed system technology. A distributed system is dependable when it provides guarantees regarding its performance, fault-tolerance, correctness, and compositionality. The research objectives will be achieved through synergy among three strands: research in distributed systems, with its focus on fault-tolerance and correctness; research in parallel computing, with its focus on speed-up and efficiency; and the practical engineering considerations of specifying, developing, deploying, and measuring the performance of systems. This proposal encompasses three areas of investigation:
(1) Robust Algorithmics: Development of fault-tolerant and efficient distributed algorithms and exploration of limitations on achieving robustness in distributed computing.
(2) Building Blocks: Definition and analysis of dependable distributed building blocks needed by applications that require precise guarantees, and design of specification frameworks for capturing system designs and optimizing distributed system deployment.
(3) Distributed Implementation: Development of exploratory implementations of compositional building blocks and robust algorithms, and evaluation of their performance in realistic and simulated settings. These empirical evaluations will complement the analytically established characterizations of efficiency.
The educational component includes developing and delivering new courses in distributed computing in support of the undergraduate and graduate programs in computer science, and building a research group that attracts graduate students and postdoctoral researchers.