Proposal for the comparison and measurement of parallel and distributed file systems for training ML models in healthcare
Abstract
Many scientific fields increasingly rely on high-performance computing (HPC) to process and analyze vast amounts of experimental data. At the same time, storage systems in modern HPC environments must adapt to new access patterns. These patterns involve frequent metadata operations, numerous small I/O requests, and randomized file access, whereas traditional parallel file systems have been optimized primarily for sequential and shared access to large files. In this research, we will evaluate the performance of GekkoFS, a temporary burst buffer file system for HPC applications, against Lustre, a widely used parallel file system that meets the demanding requirements of HPC simulation environments. Our comparison aims to highlight the strengths and limitations of each system for training ML models in healthcare.
References
Dos Reis, M. A., Kunas, C. A., da Silva Araújo, T., Schneiders, J., de Azevedo, P. B., Nakayama, L. F., Rados, D. R., Umpierre, R. N., Berwanger, O., Lavinsky, D., et al. (2024). Advancing healthcare with artificial intelligence: diagnostic accuracy of machine learning algorithm in diagnosis of diabetic retinopathy in the Brazilian population. Diabetology & Metabolic Syndrome, 16(1):209.
Gupta, A., Dhakshinamoorthy, D., and Paul, A. K. (2024). Studying the effects of asynchronous I/O on HPC I/O patterns. In 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), pages 109–112. IEEE.
Macedo, R., Miranda, M., Tanimura, Y., Haga, J., Ruhela, A., Harrell, S. L., Evans, R. T., Pereira, J., and Paulo, J. (2023). Taming metadata-intensive HPC jobs through dynamic, application-agnostic QoS control. In 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 47–61. IEEE.
Samsi, S., Zhao, D., McDonald, J., Li, B., Michaleas, A., Jones, M., Bergeron, W., Kepner, J., Tiwari, D., and Gadepally, V. (2023). From words to watts: Benchmarking the energy costs of large language model inference. In 2023 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–9. IEEE.
Vef, M.-A., Moti, N., Süß, T., Tacke, M., Tocci, T., Nou, R., Miranda, A., Cortes, T., and Brinkmann, A. (2020). GekkoFS—a temporary burst buffer file system for HPC applications. Journal of Computer Science and Technology, 35:72–91.
