Number-On-Forehead Communication Complexity of Data Clustering with Sunflowers
Resumo
We study the problem of performing data clustering in a distributed setting, which is a problem that may arise in many practical areas such as machine learning and data analysis. The way in which the sites communicate and the way data is allocated define a model of communication. We develop a protocol to compute distributed clustering in the Number on Forehead model of communication complexity. In our model, we requiere that each site is aware of all clusters in its own data and all data allocated among sites define a sunflower. We show that there exists a two round communication protocol for data clustering where each site knows an to all clusters.
Referências
Chen, J., Sun, H., Woodruff, D., and Zhang, Q. (2016). Communication-optimal dis- tributed clustering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), pages 3727–3735.
Deza, M. (1974). Solution d’un probl`eme de Erd¨os-Lov´asz. Journal of Combinatorial Theory, Series B, 16(2):166–167.
Erd¨os, P., Chao, and Rado, R. (1961). Intersection theorems for systems op finite sets. Quarterly Journal ofMathematics, 12(1):313–320.
H˚astad, J. and Goldmann, M. (1991). On the power of small-depth threshold circuits. Computational Complexity, 1(2):113–129.
Kostochka, A. (2000). Extremal problems on ∆-systems. Numbers, Information and Complexity, pages 143–150.
Kushilevitz, E. and Nisan, N. (2006). Communication complexity. Cambridge University Press.
Lee, Y. T. and Sun, H. (2018). Constructing linear-sized spectral sparsification in almost- linear time. SIAM Journal on Computing, 47(6):2315–2336.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17(4):395–416.