Establishing the Parameters of a Decentralized Neural Machine Learning Model
Abstract
Decentralized machine learning models face a bottleneck of high communication cost. The trade-off between communication and accuracy in decentralized learning has so far been addressed mostly through theoretical approaches. Here we propose a practical model that performs several local training steps before each communication round, choosing among several candidate configurations. We show how to determine a configuration that dramatically reduces the communication burden between participating hosts while remaining robust and accurate under both IID and non-IID data distributions, as illustrated by the sketch below.
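The core idea, several local training steps per communication round, can be summarized by the following minimal sketch. It is not the authors' implementation: it assumes simple model averaging among hosts (in the spirit of Local SGD / FedAvg) on synthetic logistic-regression data, and names such as local_steps, rounds, and decentralized_training are illustrative.

```python
# Minimal sketch (assumption, not the paper's code): each host runs
# `local_steps` SGD updates on its own shard, then all hosts average
# their parameters, which counts as one communication round.
import numpy as np

rng = np.random.default_rng(0)


def local_sgd(w, X, y, steps, lr=0.1):
    """Run `steps` gradient updates for logistic regression on local data only."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # one local gradient step
    return w


def decentralized_training(hosts=4, rounds=20, local_steps=10, dim=5):
    """Alternate local training and parameter averaging.

    Increasing `local_steps` reduces how often hosts must communicate
    for the same total amount of local computation.
    """
    models = [np.zeros(dim) for _ in range(hosts)]

    # Synthetic local datasets; the per-host shift mimics a non-IID split.
    data = []
    for h in range(hosts):
        X = rng.normal(loc=h * 0.1, size=(200, dim))
        y = (X.sum(axis=1) > 0).astype(float)
        data.append((X, y))

    for _ in range(rounds):
        models = [local_sgd(w, X, y, local_steps) for w, (X, y) in zip(models, data)]
        avg = np.mean(models, axis=0)          # the communication round
        models = [avg.copy() for _ in models]
    return avg


if __name__ == "__main__":
    print("final averaged model:", decentralized_training())
```

Under these assumptions, the communication cost scales with the number of rounds, so a configuration with more local steps per round trades extra local computation for fewer parameter exchanges.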