Parallelizing Git Checkout: a Case Study of I/O Parallelism

  • Matheus Tavares Bernardino USP
  • Alfredo Goldman USP

Resumo


Version control systems (VCS) are tools used to track and manage the changes made to a set of files over time. Among the VCS tools available today, Git has become the most popular for software development. Being used in small personal projects of a few megabytes and massive corporate repositories with more than 300 GB and 3.5 million files, speed and scalability are among the top priorities for the tool. However, its performance sometimes falls short of what is desired on networked file systems (e.g. NFS), where input and output (I/O) operations tend to be more costly. In particular, that is the case for the checkout command, which is responsible for restoring files from specific versions of a project. Despite the optimizations implemented over the years, the sequential processing of files still carried a large time penalty for NFS, as well as being suboptimal for local file systems on SSDs. In this project, we worked to parallelize the Git checkout machinery, resulting in speedups of up to 4.5x on NFS and 3.6x on SSDs. We also studied how parallelism affects the I/O requests performed by checkout on different storage systems. The optimization was submitted upstream and made available to all Git users starting at version 2.32.0, from June 2021.
Palavras-chave: parallel programming, git, version control systems, Network File System, parallel I/O
Publicado
02/11/2022
BERNARDINO, Matheus Tavares; GOLDMAN, Alfredo. Parallelizing Git Checkout: a Case Study of I/O Parallelism. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 34. , 2022, Bordeaux/France. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022 . p. 293-304.