Implementation and Performance Evaluation of Fuzzy File Block Matching
In 2007 USENIX Annual Technical Conference, June 2007.
Bo Han and Pete Keleher
Abstract:
The fuzzy file block matching technique (fuzzy matching for short) was
first proposed for opportunistic use of content-addressable storage. Fuzzy
matching aims to increase the hit ratio at content-addressable storage
providers, and thus can improve the performance of the underlying distributed
file storage systems. In particular, fuzzy matching can potentially save
significant network bandwidth and reduce file transmission costs. Fuzzy
matching employs shingling to compute fuzzy hashes of file
blocks for similarity detection, and error-correcting information to
reconstruct the canonical content of a file block from similar
blocks. In this paper, we present the implementation details of fuzzy
matching and a preliminary evaluation of its performance. Specifically, we
show that fuzzy matching can recover new versions of GNU Emacs source from
older ones.
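
The similarity-detection step mentioned in the abstract can be illustrated with a minimal sketch of shingling. This is not the paper's implementation: the function names, the 4-byte window size, and the use of Jaccard similarity over full shingle sets are all assumptions chosen for illustration, and the error-correcting reconstruction step is not shown.

```python
def shingles(block, k=4):
    """Return the set of all k-byte windows (shingles) in a block.

    The window size k=4 is an illustrative assumption, not a value
    taken from the paper.
    """
    if len(block) < k:
        return {block}
    return {block[i:i + k] for i in range(len(block) - k + 1)}


def resemblance(a, b, k=4):
    """Jaccard similarity of the two blocks' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    old_block = b"int main(void) { return 0; }"
    new_block = b"int main(void) { return 1; }"
    unrelated = b"completely different data"
    # A one-byte edit leaves most shingles shared, so the edited block
    # scores higher against the original than an unrelated block does.
    print(resemblance(old_block, new_block) > resemblance(old_block, unrelated))
```

A practical system would likely keep only a small sample of hashed shingles per block rather than the full set, so that similar blocks can be looked up cheaply; that refinement is omitted here.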
@inProceedings{usenix07,
title = "Implementation and Performance Evaluation of Fuzzy File Block Matching",
author = "Bo Han and Pete Keleher",
booktitle = {2007 USENIX Annual Technical Conference},
month = {June},
year = {2007},
}