A gentle introduction to erasure coding
Erasure coding is currently a very hot topic for distributed storage systems. It has been part of the Ceph roadmap for almost a year, and the Swift folks recently brought the discussion to the table. Both projects plan to implement erasure coding, so I thought it might be interesting to give a high-level overview of the erasure coding principles. Before we start, I’d like to point out that I don’t take any credit for this article. I simply read the wonderful white paper “Erasure Coding vs. Replication: A Quantitative Comparison” written by Hakim Weatherspoon and John D. Kubiatowicz from the Computer Science Division, University of California, Berkeley. Many thanks to them. While reading the paper, I found their introduction to erasure codes easy to understand for a novice like me :).
I. Introduction
An erasure code provides redundancy without the overhead of strict replication. Erasure codes divide an object into m fragments and recode them into n fragments, where n > m. We call r = m/n < 1 the rate of encoding. A rate r code increases the storage cost by a factor of 1/r. The key property of erasure codes is that the original object can be reconstructed from any m fragments. For example, using an r = 1/4 encoding on a block divides the block into m = 16 fragments and encodes the original m fragments into n = 64 fragments, increasing the storage cost by a factor of four. Erasure codes are a superset of replicated and RAID systems. For example, a system that creates four replicas for each block can be described by an (m = 1, n = 4) erasure code. RAID levels 1, 4, and 5 can be described by an (m = 1, n = 2), (m = 4, n = 5), and (m = 4, n = 5) erasure code, respectively.
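To make those numbers concrete, here is a minimal Python sketch (my own illustration, not how Ceph or Swift implement it) of the simplest erasure code of all: the (m = 4, n = 5) scheme mentioned above for RAID 4/5, where the extra fragment is a plain XOR parity. Any m = 4 of the n = 5 fragments are enough to rebuild the block, at a storage cost of 1/r = n/m = 1.25.

```python
from functools import reduce

def xor(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block, m=4):
    """Split a block into m data fragments plus one XOR parity
    fragment, giving an (m, n = m + 1) erasure code."""
    assert len(block) % m == 0, "pad the block so it splits evenly"
    size = len(block) // m
    fragments = [block[i * size:(i + 1) * size] for i in range(m)]
    return fragments + [reduce(xor, fragments)]

def decode(fragments, m=4):
    """Rebuild the original block from any m of the n fragments;
    a missing fragment is represented as None."""
    lost = [i for i, f in enumerate(fragments) if f is None]
    assert len(lost) <= 1, "one parity fragment tolerates one erasure"
    if lost:
        # The XOR of the m survivors is exactly the missing fragment,
        # because the XOR of all n fragments is zero by construction.
        fragments[lost[0]] = reduce(xor, (f for f in fragments if f is not None))
    return b"".join(fragments[:m])

block = b"ABCDEFGHIJKLMNOP"        # 16 bytes -> 4 data fragments of 4 bytes
fragments = encode(block)          # n = 5 fragments, storage cost 1.25x
fragments[1] = None                # lose any single fragment...
assert decode(fragments) == block  # ...and the block still comes back
```

Real erasure-coded stores use more general codes (typically Reed-Solomon variants) so that n - m can be larger than one, but the rate and reconstruction properties work exactly as above.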
II. Data Integrity
Erasure coding in a malicious environment requires the precise identification of failed or corrupted fragments. Without the ability to identify corrupted fragments, there is potentially a combinatorially large number of fragment subsets to try when reconstructing the block; that is, C(n, m) combinations. As a result, the system needs to detect when a fragment has been corrupted and discard it. A secure verification hashing scheme can serve the dual purpose of identifying and verifying each fragment. It is necessarily the case that any m correctly verified fragments can be used to reconstruct the block. Such a scheme is likely to increase the bandwidth and storage requirements, but can be shown to still be many times less than replication.
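As a rough sketch of that dual-purpose scheme (the choice of SHA-256 is my assumption; the paper does not prescribe a hash function), each fragment can be stored alongside a digest, and corrupted fragments are discarded before any reconstruction is attempted:

```python
import hashlib
from math import comb

def fingerprint(fragments):
    """Compute a digest for each fragment at encoding time."""
    return [hashlib.sha256(f).hexdigest() for f in fragments]

def verified(fragments, digests):
    """Discard any fragment whose content no longer matches its digest."""
    return [f for f, d in zip(fragments, digests)
            if hashlib.sha256(f).hexdigest() == d]

# Without verification, a decoder facing silent corruption might have to
# try up to C(n, m) fragment subsets; for the (m = 16, n = 64) example
# above that is comb(64, 16), roughly 4.9 * 10**14 combinations.
print(comb(64, 16))
```

As long as at least m fragments survive verification, they can be handed straight to the decoder, which is why a single hashing scheme both identifies bad fragments and verifies good ones.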
Simple, isn’t it?