Number-On-Forehead Communication Complexity of Data Clustering with Sunflowers

  • Fabricio Mendoza-Granada UNA
  • Marcos Villagra UNA

Resumo


We study the problem of performing data clustering in a distributed setting, which is a problem that may arise in many practical areas such as machine learning and data analysis. The way in which the sites communicate and the way data is allocated define a model of communication. We develop a protocol to compute distributed clustering in the Number on Forehead model of communication complexity. In our model, we requiere that each site is aware of all clusters in its own data and all data allocated among sites define a sunflower. We show that there exists a two round communication protocol for data clustering where each site knows an to all clusters.

Palavras-chave: data clustering, protocol, Number-on-Forehead model

Referências

Batson, J. D., Spielman, D. A., and Srivastava, N. (2009). Twice-ramanujan sparsifiers. In Proceedings of the 41st annual ACM symposium on Theory of computing (STOC), pages 255–262.

Chen, J., Sun, H., Woodruff, D., and Zhang, Q. (2016). Communication-optimal dis- tributed clustering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), pages 3727–3735.

Deza, M. (1974). Solution d’un probl`eme de Erd¨os-Lov´asz. Journal of Combinatorial Theory, Series B, 16(2):166–167.

Erd¨os, P., Chao, and Rado, R. (1961). Intersection theorems for systems op finite sets. Quarterly Journal ofMathematics, 12(1):313–320.

H˚astad, J. and Goldmann, M. (1991). On the power of small-depth threshold circuits. Computational Complexity, 1(2):113–129.

Kostochka, A. (2000). Extremal problems on ∆-systems. Numbers, Information and Complexity, pages 143–150.
Kushilevitz, E. and Nisan, N. (2006). Communication complexity. Cambridge University Press.

Lee, Y. T. and Sun, H. (2018). Constructing linear-sized spectral sparsification in almost- linear time. SIAM Journal on Computing, 47(6):2315–2336.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17(4):395–416.
Publicado
02/07/2019
MENDOZA-GRANADA, Fabricio; VILLAGRA, Marcos . Number-On-Forehead Communication Complexity of Data Clustering with Sunflowers. In: ENCONTRO DE TEORIA DA COMPUTAÇÃO (ETC), 4. , 2019, Belém. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019 . ISSN 2595-6116. DOI: https://doi.org/10.5753/etc.2019.6394.