Scalable, Efficient, and Policy-Aware Deduplication for Primary Distributed Storage Systems
Abstract
Data deduplication has become a crucial technique for reducing the volume of data held in modern storage systems. We present SEP-D, a practical scale-out distributed storage system that incorporates data deduplication for primary storage. SEP-D introduces a novel metadata handling mechanism that combines content-based hashing with built-in distributed data placement strategies such as CRUSH. This allows SEP-D to eliminate the need for remote metadata lookups and thus to incorporate deduplication without affecting scalability. SEP-D integrates smoothly with the existing storage system, allowing storage policies to be reused across different storage pools. We implemented SEP-D in Ceph, a popular distributed storage system widely adopted in industry, and demonstrated that SEP-D has minimal impact on I/O performance while maintaining the storage policies implemented in the underlying distributed storage system.
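To make the idea of combining content-based hashing with computed placement more concrete, the following is a minimal sketch, not SEP-D's actual implementation. It assumes a toy in-memory cluster and swaps in rendezvous (highest-random-weight) hashing as a simple stand-in for CRUSH; the names `fingerprint`, `place`, and `ToyCluster` are hypothetical and exist only for illustration. The point it shows is that when a chunk is named by its content hash and its location is computed from that name, duplicates land on the same node and no separate metadata server has to be consulted.

```python
import hashlib
from typing import Dict, List, Sequence


def fingerprint(chunk: bytes) -> str:
    """Content-based name for a chunk: identical bytes give an identical name."""
    return hashlib.sha256(chunk).hexdigest()


def place(object_name: str, nodes: Sequence[str]) -> str:
    """Compute the node responsible for an object name.

    Assumption: rendezvous hashing is used here as a simple stand-in for
    CRUSH. Like CRUSH, it lets any client compute placement locally from
    the object name and the node list, without a lookup table.
    """
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{object_name}".encode()).digest()

    return max(nodes, key=score)


class ToyCluster:
    """Hypothetical in-memory cluster used only to illustrate the idea."""

    def __init__(self, nodes: Sequence[str]) -> None:
        self.nodes: List[str] = list(nodes)
        # Per node: fingerprint -> [chunk bytes, reference count]
        self.stores: Dict[str, Dict[str, list]] = {n: {} for n in self.nodes}

    def write_chunk(self, chunk: bytes) -> str:
        fp = fingerprint(chunk)
        node = place(fp, self.nodes)       # location derived from content
        entry = self.stores[node].get(fp)  # duplicate check is local to that node
        if entry is None:
            self.stores[node][fp] = [chunk, 1]
        else:
            entry[1] += 1                  # duplicate: only bump the reference count
        return fp

    def read_chunk(self, fp: str) -> bytes:
        node = place(fp, self.nodes)       # recomputed, no remote metadata lookup
        return self.stores[node][fp][0]

    def stored_chunks(self) -> int:
        return sum(len(store) for store in self.stores.values())
```

In this sketch, writing the same bytes twice through `write_chunk` stores a single copy and raises its reference count, and `read_chunk` finds the data by recomputing the placement from the fingerprint alone. A real system would additionally have to handle replication, node membership changes, and reference tracking from file-level objects, none of which are modeled here.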
Keywords:
Metadata, Semantics, Distributed databases, Servers, Scalability, Throughput, Media
Published
2019-10-15
How to Cite
FINGLER, Henrique; RA, Moo-Ryong; PANTA, Rajesh.
Scalable, Efficient, and Policy-Aware Deduplication for Primary Distributed Storage Systems. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 31., 2019, Campo Grande/MS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 180-187.
