Stacked DRG PoRep #
This section describes Stacked DRG PoRep (SDR), the specific Proof-of-Replication (PoRep) used in Filecoin. In this construction, the prover encodes the original data into a replica and commits to it. An offline PoRep proves that the commitment to the replica is a valid commitment of the encoded original data.
SDR was presented by Ben Fisch at EUROCRYPT 2019.
Background on Proof-of-Replication #
Proof-of-Replication enables a prover P to convince a verifier V that P is storing a replica R, a physically independent copy of some data D, unique to P. The scheme is defined by a tuple of polynomial-time algorithms (Setup, Replicate, Prove, Verify). The assumption is that generating a replica after the Replicate step must be difficult (if not impossible).
Setup: On setup, the public parameters of the proving systems are set.
Replicate: On replication, either one party or both (depending on the scheme; in our case, only the prover) generate a unique permutation of the original data D, which we call the replica R.
Prove: On receiving a challenge, the prover must generate a proof that it is in possession of the replica and that it was derived from data D. The prover must only be able to respond to the challenge successfully if it is in possession of the replica, since it would be difficult (if not impossible) to generate a replica that can be used to generate the proof at this stage.
Verify: On receiving the proof, the verifier checks the validity of the proof and accepts or rejects the proof.
Time-bounded Proof-of-Replication #
Timing assumption. Time-bounded Proofs-of-Replication are PoRep constructions with timing assumptions. The assumption is that generating the replica (time(Replicate)) takes some time t that is substantially larger than the time it takes to produce a proof (time(Prove)) and the round-trip time (RTT) for sending a challenge and receiving a proof.
Distinguishing malicious provers. A malicious prover that does not have R must obtain it (or generate it) before the Prove step. A verifier can distinguish an honest prover from a malicious one, since the malicious prover will take too long to answer the challenge. The verifier rejects if receiving the proof from the prover takes longer than a timeout (bounded between proving time and replication time).
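The timing argument can be sketched in a few lines. This is a minimal illustration only: the function names, parameter names and the choice of seconds as a unit are assumptions made for the example and are not part of the specification.

```python
def timeout_is_sound(timeout, prove_time, rtt, replicate_time):
    """True if an honest prover can answer in time but regenerating R cannot.

    All arguments are durations in the same (hypothetical) unit.
    """
    return prove_time + rtt <= timeout < replicate_time


def verifier_accepts(response_time, timeout, proof_is_valid):
    """Reject any proof that arrives after the timeout, even a valid one."""
    return proof_is_valid and response_time <= timeout
```

With illustrative numbers (proving takes 10, the round trip 1, replication 3600), any timeout between 11 and 3600 separates a prover that stores R from one that has to regenerate it first.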
Background on Stacked DRG PoRep #
Stacked DRG PoRep (SDR) is a specific Proof-of-Replication construction that we use in Filecoin. SDR was designed by Ben Fisch and presented at EUROCRYPT 2019. At a high level, SDR ensures that the Replicate step is a slow, non-parallelizable sequential process by using a special type of graph called a Depth Robust Graph (DRG).
Encoding using DRGs. A key is generated by sequentially labeling nodes in the graph such that each label depends on the labels of its parents. The depth robustness property of these graphs ensures that the sequential labeling steps are not parallelizable. The final labels are used as a key to encode the original data.
TODO: This probably needs a more thorough rewrite.
Stacked DRGs. SDR builds on the above by stacking DRG graphs into LAYERS layers. Each layer is connected to the previous one by a Bipartite Expander Graph. The combination of DRGs and expander graphs guarantees the security property of PoRep. As before, the key produced by the final layer is used to encode the original data, yielding the replica.
Generating SDR proofs. Given the following public parameters:
ReplicaId is a unique replica identifier (see the Filecoin Proofs spec for details).
CommD is the Merkle tree root hash of the input data to the first layer.
CommC is the Merkle tree root hash of the SDR column commitments.
CommRLast is the Merkle tree root hash of the replica.
CommR is the on-chain commitment to the replica, derived as the hash of the concatenation of CommC and CommRLast.
An SDR proof proves that some data whose commitment is CommD has been used to run a Replicate algorithm and generated some data. CommR is the on-chain commitment to both the replicated data and to intermediate stages required to prove Replicate was performed correctly.
An SDR proof consists of a set of challenged DRG nodes for each layer, a set of parent nodes for each challenged node and a Merkle tree inclusion proof for each node provided. The verifier can then verify the correct labeling of each node and that the nodes given were consistent with the prover’s commitments.
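The Merkle inclusion check that the verifier performs for each provided node can be sketched as follows. This is an illustration under stated assumptions: SHA-256 stands in for the tree's compression function, and `hash_pair` and `verify_inclusion` are hypothetical names, not identifiers from the Filecoin proofs code.

```python
import hashlib


def hash_pair(left, right):
    """Stand-in compression function: SHA-256 over the two children."""
    return hashlib.sha256(left + right).digest()


def verify_inclusion(leaf, index, path, root):
    """Recompute the root from a leaf and its sibling path, bottom-up.

    `path` lists the sibling at each level, starting next to the leaf.
    """
    current = leaf
    for sibling in path:
        if index % 2 == 0:  # current node is a left child
            current = hash_pair(current, sibling)
        else:               # current node is a right child
            current = hash_pair(sibling, current)
        index //= 2
    return current == root
```

The proof for one node is logarithmic in the number of leaves, but an SDR proof needs one such path for every challenged node and every parent on every layer, which is where most of the proof size comes from.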
Making proofs succinct with SNARKs: The proof size in SDR is too large for blockchain usage (~100MB TODO: check this), mostly due to the large number of Merkle tree inclusion proofs required to achieve security. We use SNARKs to generate a proof of knowledge of a correct SDR proof. In other words, we implement the SDR proof verification algorithm in an arithmetic circuit and use SNARKs to prove that it was evaluated correctly.
The SNARK circuit proves that given Merkle roots CommR, the prover correctly derived the labels at each layer and correctly performed the final encoding.
PoRep in Filecoin #
Proof-of-Replication proves that a Storage Miner is dedicating unique storage for each sector. Filecoin Storage Miners collect new clients’ data in a sector, run a slow encoding process (called Seal) and generate a proof (SealProof) that the encoding was generated correctly.
In Filecoin, PoRep provides two guarantees: (1) space-hardness: Storage Miners cannot lie about the amount of space they are dedicating to Filecoin in order to gain more power in the consensus; (2) replication: Storage Miners are dedicating unique storage for each copy of their clients’ data.
- sector: a fixed-size block of data of SECTOR_SIZE bytes which generally contains clients’ data.
- unsealed sector: a concrete representation (on disk or in memory) of a sector that follows the “Storage Format” described in Client Data Processing (currently paddedfr32v1 is the required default).
- sealed sector: a concrete representation (on disk or in memory) of the unique replica generated by Seal from an unsealed sector. A sector contains one or more pieces.
- piece: a block of data of at most SECTOR_SIZE bytes which is generally a client’s file or part of one.
Stacked DRG Construction #
Public Parameters #
The following public parameters are used in the Stacked DRG Replication and Proof Generation algorithms:
The Appendix should explain why we picked those values
Just interpolate a table of the Orient parameters and reconcile naming.
|name|description|
|SECTOR_SIZE|Number of nodes in the DRG in bytes|
|LAYERS|Number of Depth Robust Graph stacked layers.|
|BASE_DEGREE|In-Degree of each Depth Robust Graph.|
|EXPANSION_DEGREE|Degree of each Bipartite Expander Graph to extend dependencies between layers.|
|GRAPH_SEED|Seed used for random number generation in |
|NODE_SIZE|Size of each node in bytes.|
The following constants are computed from the public parameters:
|Total number of parent nodes|
|Number of nodes in the graph|
|Height of the Merkle Tree of a sector|
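As a hedged illustration of how such constants follow from the public parameters, the sketch below derives them from small example values. The lowercase names (`parents_count`, `graph_size`, `tree_height`) are hypothetical, the numeric values are chosen for readability and are not Filecoin's, and the height formula assumes one Merkle leaf per node.

```python
# Illustrative values only; not the parameters Filecoin uses.
SECTOR_SIZE = 1024       # bytes
NODE_SIZE = 32           # bytes
BASE_DEGREE = 6
EXPANSION_DEGREE = 8

# Total number of parents: DRG parents plus expander parents.
parents_count = BASE_DEGREE + EXPANSION_DEGREE

# Number of nodes in the graph: one node per NODE_SIZE bytes of the sector.
graph_size = SECTOR_SIZE // NODE_SIZE

# Height of a balanced binary tree with graph_size leaves (assumption:
# one leaf per node, and graph_size is a power of two).
tree_height = graph_size.bit_length() - 1
```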
The following additional public parameters are required:
Float: Fraction of each layer’s challenges by which to reduce next-lowest layer’s challenge count.
uint: Number of layers which should be tapered. FIXME: update for current tapering.
Data is a byte array initialized to the content of the unsealed sector and will be mutated in-place by the replication process.
Hash Functions #
We describe three hash functions:
|name||description||size of input||size of output||construction|
|Hash function used as a KDF to derive the key used to label a single node.||TODO|
|Hash function used to hash the labeled leaves of each layer (see SDR Column Commitments).||TODO|
|Collision Resistant Hash function used for the Merkle tree.||2 x |
|Balanced binary Merkle tree used to generate commitments to sealed sectors, unsealed sectors, piece commitments, and intermediate parts of the Proof-of-Replication.||TODO||Uses |
RepHash is a vector commitment used to generate commitments to sealed sectors, unsealed sectors, piece commitments and intermediate steps of the Proof-of-Replication. Filecoin uses a balanced binary Merkle tree for RepHash. The leaves of the Merkle tree are pairs of adjacent nodes.
RepHash inputs MUST respect a valid Storage Format. [TODO: What does this mean?]
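A balanced binary Merkle commitment of this shape can be sketched as follows. This is a sketch under assumptions: SHA-256 stands in for the tree's compression function, nodes are taken to be 32 bytes, and the function name `rep_hash` and its return shape are illustrative rather than taken from the Filecoin proofs code.

```python
import hashlib

NODE_SIZE = 32  # bytes per node (assumed for this sketch)


def rep_hash(data):
    """Return (root, levels) of a balanced binary Merkle tree over the nodes.

    levels[0] holds the raw nodes; each later level hashes adjacent pairs.
    """
    if len(data) % NODE_SIZE != 0:
        raise ValueError("data must be a whole number of nodes")
    level = [data[i:i + NODE_SIZE] for i in range(0, len(data), NODE_SIZE)]
    count = len(level)
    if count < 2 or count & (count - 1):
        raise ValueError("node count must be a power of two, at least 2")
    levels = [level]
    while len(level) > 1:
        # Hash each pair of adjacent entries to form the next level up.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return level[0], levels
```

Keeping the intermediate levels around makes it cheap to produce inclusion proofs later, at the cost of storing roughly twice the data.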
Stacked DRG Graph #
The slow sequential encoding required is enforced by the depth robustness property of the SDR graph.
Encoding with SDR: The data from a sector (of size SECTOR_SIZE) is divided into nodes of NODE_SIZE bytes (for a total of GRAPH_SIZE nodes) and arranged in a directed acyclic graph. The structure of the graph is used to label the nodes sequentially to generate a key with which to encode the original data: in order to label a node, its parents must be labeled (see the “Layer Labeling” section below). We repeat this process for LAYERS layers, where the input to each layer is the output of the previous one.
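The sequential labeling and the final encoding step can be sketched for a single layer as follows. Everything here is an assumption made for illustration: SHA-256 stands in for the key derivation function, XOR stands in for the encoding operation, and `label_nodes`, `encode` and `parents_of` are hypothetical names. The sketch does not model the stacking of layers.

```python
import hashlib

NODE_SIZE = 32  # bytes per node (assumed for this sketch)


def label_nodes(replica_id, graph_size, parents_of):
    """Label nodes in index order; each label hashes its parents' labels."""
    labels = []
    for node in range(graph_size):
        hasher = hashlib.sha256(replica_id + node.to_bytes(8, "little"))
        for parent in parents_of(node):
            # Parents have smaller indexes, so they are already labeled.
            hasher.update(labels[parent])
        labels.append(hasher.digest())
    return labels


def encode(data, labels):
    """XOR each byte of data with the key formed by the labels.

    XOR is its own inverse, so the same call also decodes.
    """
    key = b"".join(labels)
    if len(data) != len(key):
        raise ValueError("data must be graph_size * NODE_SIZE bytes")
    return bytes(d ^ k for d, k in zip(data, key))
```

Because a label cannot be computed before the labels of its parents, the loop in `label_nodes` cannot be reordered freely; the depth robustness of the real graph is what prevents shortcuts around this ordering.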
Generating the SDR graph: The SDR graph is divided into LAYERS layers. Each layer is a directed acyclic graph that combines a Depth Robust Graph (DRG) and a Bipartite Expander graph.
this isn’t quite right.
We provide an algorithm (SDR) which computes the parents of a node. At a high level, the parents of a node are computed by combining two algorithms: some parents (BASE_DEGREE of them) are computed via the BucketSample algorithm extended with a direct ordering of nodes, others (EXPANSION_DEGREE of them) are computed via the ChungExpander algorithm.
SDRGraph: SDR Graph algorithm
Overview: Compute the DRG and Bipartite Expander parents using respectively BucketSample and ChungExpander.
|node|The node for which the parents are being computed|
|layer|The layer of the SDR graph|
|parents|The ordered parents of the node|
- If layer = 1:
  - Compute drgParents = BucketSample(node)
  - Set parents to be drgParents
- If layer > 1:
  - Compute drgParents = BucketSample(node)
  - Compute expanderParents = ChungExpander(node)
  - Set parents to be the concatenation of drgParents and expanderParents
We provide below a more succinct representation of the algorithm:
TODO: Reference to code in filproofs/algorithms.go — or restructure all this.
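A minimal sketch of the parent computation follows. It assumes the two sampling algorithms are supplied as callables, since their details are specified elsewhere; `sdr_parents`, `bucket_sample` and `chung_expander` are illustrative Python names and not the identifiers used in the Filecoin proofs code.

```python
def sdr_parents(node, layer, bucket_sample, chung_expander):
    """Return the ordered parents of `node` on `layer` (layers start at 1).

    bucket_sample(node) yields the DRG parents within the same layer;
    chung_expander(node) yields expander parents in the previous layer.
    """
    if layer < 1:
        raise ValueError("layers are numbered from 1")
    drg_parents = list(bucket_sample(node))
    if layer == 1:
        # The first layer has no previous layer to draw expander parents from.
        return drg_parents
    return drg_parents + list(chung_expander(node))
```

Passing the samplers in as arguments keeps the combination logic testable on its own with stub functions.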
Time-space tradeoff #
Computing the parents using both BucketSample and ChungExpander for every layer can be an expensive operation; however, this can be avoided by caching the parents.
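One way to realize such a cache is plain memoization, sketched below. The wrapper name `make_cached` and the `(node, layer)` signature of the wrapped function are assumptions for illustration.

```python
from functools import lru_cache


def make_cached(parents_fn):
    """Wrap a parents function so each (node, layer) is computed only once."""
    @lru_cache(maxsize=None)
    def cached(node, layer):
        # Store an immutable tuple so callers cannot corrupt cached entries.
        return tuple(parents_fn(node, layer))
    return cached
```

The cache trades memory for computation: an unbounded cache holds one entry per distinct (node, layer) pair requested, so a real implementation would need to weigh that footprint against the cost of recomputing.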
BucketSample: Depth Robust Graphs algorithm
This section describes how to compute the “base parents” of the SDR graph, which is the equivalent of computing the parents of a Depth Robust Graph.
The properties of DRG graphs guarantee that a sector has been encoded with a slow, non-parallelizable process. We use the BucketSample algorithm, which is based on DRSample (ABH17), is described in FBGB18, and generates a directed acyclic graph of in-degree BASE_DEGREE.
BucketSample DRG graphs are random graphs that can be deterministically generated from a seed; different seeds lead with high probability to different graphs. In SDR, we use the same seed GRAPH_SEED for each layer of the SDR graph such that they are all based on the same underlying DRG graph.
The parents of any node can be locally computed without computing the entire graph. We call the parents of a node calculated in this way base parents.
BucketSample extends BucketSampleInner to include the node’s ‘immediate predecessor’. Each node except the first in a DRG generated by BucketSample has the node whose index is one less than its own as a parent. This ensures that visiting nodes whose indexes are sequential will result in a graph traversal in topological order.
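The properties described above (parents computed locally from a seed, all with smaller indexes, always including the immediate predecessor) can be illustrated with the sketch below. It is loosely inspired by bucket-style sampling and is not the BucketSample algorithm specified for Filecoin: the bucket rule, the use of SHA-256 as a source of randomness, and the names `base_parents` and `_rand` are all assumptions.

```python
import hashlib


def _rand(seed, node, k, tag):
    """Deterministic pseudo-random integer derived from the seed."""
    data = seed + node.to_bytes(8, "little") + bytes([k, tag])
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def base_parents(node, base_degree, seed):
    """Return base_degree parents of `node`, all with smaller indexes."""
    if node == 0:
        return []  # the first node has no parents
    parents = []
    for k in range(base_degree - 1):
        # Pick a bucket b, then a distance in [2^(b-1), min(2^b, node)].
        bucket = 1 + _rand(seed, node, k, 0) % node.bit_length()
        low = 1 << (bucket - 1)
        high = min(1 << bucket, node)
        distance = low + _rand(seed, node, k, 1) % (high - low + 1)
        parents.append(node - distance)
    parents.append(node - 1)  # immediate predecessor
    return parents
```

Since every parent index is strictly smaller than the node's own index, visiting nodes in increasing index order is a valid topological order, and no global graph structure has to be materialized to answer a single query.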
ChungExpander: Bipartite Expander Graphs
TODO: explain why we link nodes in the current layer
Each node in layers other than the first has EXPANSION_DEGREE parents generated via the ChungExpander algorithm. Note that the indexes returned refer to labels from the previous layer. TODO: Make this all clearer with explicit notation.
TODO: link to relevant filproofs/algorithms.go
Time-Space tradeoff #
Computing these parents can be expensive (especially due to the hashing required by the Feistel algorithm). A miner can trade storage for this computation by storing the expansion parents.
Feistel construction #
We use three rounds of Feistel to generate a permutation to compute the parents of the Bipartite Expander graph.
TODO: link to filproofs/feistel
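To show why a Feistel network yields a permutation regardless of the round function, here is a minimal balanced Feistel sketch over indexes of `2 * half_bits` bits. The round function (truncated SHA-256), the key format and the function names are assumptions for illustration and do not reflect the construction in the Filecoin proofs code.

```python
import hashlib


def _round(value, key, half_bits):
    """Keyed round function; it does not need to be invertible."""
    digest = hashlib.sha256(key + value.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << half_bits) - 1)


def feistel_permute(index, half_bits, keys):
    """Map index to a unique value in [0, 2^(2*half_bits))."""
    mask = (1 << half_bits) - 1
    left, right = index >> half_bits, index & mask
    for key in keys:
        left, right = right, left ^ _round(right, key, half_bits)
    return (left << half_bits) | right


def feistel_invert(value, half_bits, keys):
    """Undo feistel_permute by running the rounds in reverse order."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for key in reversed(keys):
        left, right = right ^ _round(left, key, half_bits), left
    return (left << half_bits) | right
```

With three keys this performs three rounds. Each round only swaps the halves and XORs one half with a function of the other, so every round is invertible and the composition is a bijection on the index domain.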
The Replication phase turns an unsealed sector into a sealed sector by first generating a key, then using the key to encode the original data.
Before running the Replicate algorithm, the prover must ensure that the sector is correctly formatted with a valid “Storage Format” described in Filecoin Client Data Processing (currently paddedfr32v1 is the required default).
TODO: inputs are missing
The Replication Algorithm proceeds as follows:
ReplicaID is a 32-byte array constructed by hashing the concatenation of the following values:
ProverId is a 32-byte array uniquely identifying a prover.
SectorNumber is an unsigned 64-bit integer in little-endian encoding represented as an 8-byte array.
RandomSeed is a 32-byte array of randomness extracted from the chain.
CommD is the Merkle root obtained by performing RepHash on the original data represented in
ReplicaID := Hash(ProverID || SectorNumber || RandomSeed || CommD)
CommD, TreeD = RepHash(data)
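The derivation of the replica identifier can be sketched as follows. SHA-256 is used here as a stand-in for Hash, which this section does not pin down, and the function and parameter names are illustrative.

```python
import hashlib


def derive_replica_id(prover_id, sector_number, random_seed, comm_d):
    """Hash the concatenation ProverID || SectorNumber || RandomSeed || CommD."""
    if len(prover_id) != 32 or len(random_seed) != 32:
        raise ValueError("prover_id and random_seed must be 32 bytes each")
    # SectorNumber is an unsigned 64-bit integer, little-endian, 8 bytes.
    sector_bytes = sector_number.to_bytes(8, "little")
    preimage = prover_id + sector_bytes + random_seed + comm_d
    return hashlib.sha256(preimage).digest()
```

Because the prover identity and the sector number are both part of the preimage, two provers sealing the same data, or one prover sealing it into two sectors, obtain different identifiers and therefore different replicas.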
Layer Labeling #
Graph. We need to decide if this is an object we’ll explicitly define or if its properties (e.g., GRAPH_SIZE) are just part of the replication parameters and all the functions just refer to the same graphs being manipulated across the entire replication process. (At the moment I’ve avoided defining a Graph structure as in other specs I didn’t see any object methods, just standalone functions.)
TODO: link to filproofs/algorithms
Proof Generation #
- Challenge Derivation
- Proof Generation
- Circuit Proof Generation
TODO: write a single algorithm which includes the spec below
Challenge Generation #
TODO: Link to filproofs/algorithms
LAYER_CHALLENGES : [LAYERS]uint: Number of challenges per layer. (This will be passed to the SDR circuit proof.)
Derive challenges for each layer (call
Witness Generation #
TODO: Link to filproofs/algorithms
Layer Challenge Counts #
TODO: we should just list current parameters and show this as a calculation for correctness; this should not be mandatory to implement.