Generic Consensus in Dynamic Systems with Shared Memory

Cátia Khouri; Fabíola Greve

doi:10.5753/wtf.2015.22938

Cátia Khouri (UFBA / UESB)
Fabíola Greve (UFBA)

Abstract

Consensus is a fundamental service for developing reliable applications on dynamic distributed systems. Unlike static systems, in such systems the set of participants is unknown and varies during execution. In this paper, we present a generic consensus algorithm for the shared-memory model, subject to crash failures, with two novel features: it does not assume knowledge of the cardinality of the set of running processes, and it supports the use of either failure detectors or leader detectors.
