Principles of Robust Cooperative Computing

NSF-NATO Grant

Alex Shvartsman

Dariusz Kowalski

Cooperatively performing a set of tasks in a decentralized setting is a fundamental problem in distributed computing. This is often challenging because the set of processors available to the computation and their ability to communicate may dynamically change due to perturbations in the computation medium. An abstract statement of this problem, referred to as the Do-All problem --- P fault-prone processors perform N independent tasks --- is one of the standard problems in the research on the complexity of fault-tolerant distributed computation. This problem has been studied in a variety of settings, e.g., in shared-memory models (Write-All) in message-passing models, and in partitionable networks (Omni-Do). Solutions for Do-All must perform all tasks efficiently in the presence of specific failure patterns or asynchrony. The efficiency is assessed in terms of work, time and communication complexity depending on the specific model of computation.

In this project we aim to advance the state-of-the-art in principles of robust cooperative computing by addressing several specific problem areas. (a) We assess the trade-offs between work and communication inherent to the Do-All problem. (b) We attempt to improve the upper bounds on work and communication for Do-All in message-passing and shared-memory settings subject to explcit assumptions about the failures and asynchrony. (c) In general we aim to narrow the gap between the upper and lower bounds on work of Do-All algorithms in specific message-passing and shared-memory models. The existing gap is substantial in some cases. (d) We consider specifying algoritmic solutions for Do-All as distributed system services and we will reason formally about such specifications and about applications that use such services.