FIP-0019 Source

TitleSnap Deals
Author@Kubuxu, @lucaniz, @nicola, @rosariogennaro, @irenegia
Discussions-Tohttps://github.com/filecoin-project/FIPs/issues/145
StatusFinal
TypeCore
Created2021-08-24

Spec Sections

FIP-0019: Snap Deals

Simple Summary

A one-message protocol for updating any sector with new data without re-sealing.

Abstract

Storage provider embeds deal data into existing sector replica with some unpredictable randomness by “xor”-ing.

The miner generates one message which proves that sector was updated with new data.

Change Motivation

Since 90+% of sectors in the Filecoin Network are CC sectors, having a protocol that allows for updating CC sectors to store real data without incurring in a full re-sealing would massively improve our network in terms of the amount of real data stored through it.

  1. It would allow decoupling sealing latency from deal-making speed - offering storage clients an improved experience for how quickly their data can land in on-chain deals
  2. It would unlock the 8EiB of storage already committed to the Filecoin network to be quickly used for deals - enabling a 100PiB+ client to make deals for their entire dataset with a single miner like f0127595 which already has 120PiB of committed capacity.
  3. It makes utilizing existing committed capacity much cheaper for miners (since they’ve already invested the sealing cost), increasing the chances they pay clients to add FIL+ data to these sectors.

What this FIP does not support (see Future Work section):

  • Deal updates or updating sectors with existing deals
  • Moving deals across sectors

Specification

Nomenclature

SectorKeyCID - on-chain commitment of CC sector replica

UnsealedSectorCID - commitment of deal data generated from PieceCIDs

SealedSectorCID - commitment of sector replica with deals embedded in it

EncodingRands - collection of HashShards randomness values used for encoding process

Poseidon-128 - SNARK friendly hash function

Parameters

ChallengesNumber = 1376 - number of challenges to be proven

HashShards = 512 - number of hashes used for randomness in encoding process

One-messages Update Protocol

  1. Storage Provider accepts and publishes storage deals (using PublishStorageDeals)
  2. If necessary storage provider extends sector expiration so that sector is active long enough to contain deals with miner.ExtendSectorExpiration
  3. Storage Provider updates an existing replica with deals:
    1. Pre-generate HashShards possible number of EncodingRand(ComputeUnsealedCID(P1, P2, P3, ...), i) where P1, P2, P3, ... are pieces corresponding to deals that will included in this sector.
    2. Encode the deal data into existing replica using newReplica[i] = Enc(sectorKey[i], data[i], EncodingRand(o)) function:
      • newReplica is the vector of Fr elements of the new replica
      • sectorKey is the vector of Fr elements of the CC sector replica data
      • data is the vector of Fr elements of the deals data ordered and padded as done in the sealing subroutine
      • NodeSectorSize is size of the sector in nodes (32 byte chunks)
    3. Compute SealedSectorCID of the newReplica.
  4. Storage Provider produces a proof of sector update. Generate a SNARK that proves:
    1. Generation of ChallengesNumber challenges in batches of 8:
      1. For j=0..ChallengesNumber//8, [_, c[8*j+7], c[8*j+6], ..., , c[8*j] = Split-31bit(Poseidon-128(Poseidon-128(UnsealedSectorCID, SealedSectorCID), j). The Split-31bit splits the 255bit output of Poseidon-128 hash into 8 31 bit chunks and discards 7 high bits.
      2. c[i] = c[i] mod NodeSectorSize
    2. For each challenge i=0..ChallengesNumber, c = c[i]
      1. Encoding: the following we correcty computed: newReplica[c] = Enc(sectorKey[c], data[c], Poseidon-128(UnsealedSectorCID, c*HashNumber//NodeSectorSize)
      2. Inclusion proofs:
        1. newReplica[c] is the opening of UpdatingSectorCID at position c
        2. sectorKey[c] is the opening of CommRLast from SealedSectorCID at position c
        3. data[c] is the opening of UnsealedSectorCID at position c
  5. Storage Provider publishes Miner.ReplicaUpdate(SectorID, NewSealedSectorCID, deals []DealID, Proof): 1. Validate input parameters
    1. Activate Deals
    2. Generate NewUnsealedSectorCID (CommD) based on DealIDs and market actor state
    3. Verify the proof of correct data update:
      • Verify that proof is valid for using inputs: NewSealedSectorCID, NewUnsealedSectorCID, SectorKey.
    4. Update SectorInfo

Algorithms

Encoding Randomness

EncodingRand(UnsealedSectorCID, i) = Poseidon-128(UnsealedSectorCID, i*HashShards//NodeSectorSize)

Intuitively the sector nodes (32byte chunks) are split into HashShards number of continuous segments with the same encoding randomness.

Encoding

The current encoding algorithm is the following:

Enc(sectorKey[i], data[i]) = sectorKey[i] + data[i] * EncodingRand(i)

The Encoding function is cheap and allows for parallel encoding.

Decoding

Dec(sectorKey[i], replica[i]) = (replica[i] - sectorKey[i]) * (EncodingRand(i)<sup>-1</sup>)

The modular inverse of EncodingRand can be computed only HashShards times per whole operation.

Retrieving updated data from encoded replica

  1. From the replica:
    1. Regenerate sectorKey by re-sealing the sector
    2. For each i: Dec(sectorKey[i], replica[i], EncodingRand)

Deprecation of current CC Upgrade

PreCommitSector will no longer be allowed to schedule current cc upgrades during prove commit. The data on chain tracking cc upgrades and the miner actor logic doing current cc upgrades will be removed. PreCommit message parameters will no longer include fields for specifying cc upgrades.

ProveReplicaUpdates Actor Method

Params

type SnapDealsUpdate struct {
     SectorID SectorID
     Deadline int
     Partition int
     Deals []DealID
     Proof ReplicaUpdateProofs
}

Miner.ProveReplicaUpdates(Updates []SnapDealsUpdate)

Spec

Unless stated assume that each step operates on all inputs updates in a loop

  1. Param validation
    1. Assert that SectorID has no deals
    2. Assert proof size is not above limits
    3. Assert batch size bounds are not above limits
    4. Assert that sector’s power is active. Fail if sector is in unproven, faulty or terminated states
  2. Market Activation and CommD generation
    1. Call market.ActivateDeals on all deals in the update
    2. Call market.ComputeDataCommittment with batched inputs to generate all CommDs for all updates
  3. Proof verification
    1. Extend runtime with VerifyReplicaUpdate method
  4. Update Miner State
    1. SectorNumber i.e. the sector ID, Expiration and SealProof fields remain unchanged
    2. SealedCID, DealIDs, DealWeight, VerifiedDealWeight are all modified to match the upgraded sector with deals
    3. InitialPledge is the max of CC sector’s InitialPledge and the value computed with the upgrade power at the upgrade epoch
    4. ReplacedDayReward is assigned the value of ExpectedDayReward and ExpectedDayReward is recalculated, as in current CC upgrade
    5. ReplacedSectorAge is set to upgradeEpoch - Activation as in current CC upgrade
    6. ExpectedStoragePledge is recalculated
    7. If SectorKeyCID is null, set it to SealedCID of the CC sector being replaced
    8. Activation is set to current epoch
    9. Call ReplaceSectors on all partitions with updates
  5. Update Power
    1. partition.ReplaceSectors will update miner deadline state to correct power
    2. Call power.UpdateClaimedPower to update power actor state.

Batching and Error Handling

This method accepts a batch of updates to be processed in one message call. As much as possible error handling should ignore updates that do not pass validation and only fail if all updates fail validation or if the entire operation fails.

Upgrade and State Migration

  1. Migration removing precommit info about cc upgrades in the PreCommit map
  2. PreCommit params will lose the fours cc upgrade fields
  3. Add SectorKey a new field to the sector info. It should be a pointer to a cid serializing to cbor null when no snap deals upgrade has occurred and becoming the original SealedCID after the first snap deals update

Design Rationale

Encoding Function

The aim of the encoding function is to allow encoding data into an existing sector key without breaking the compressibility assumption of PoRep. …

Parameters choice

  • ChallengeNumber: 1376 - to facilitate required security factor with Fiat-Shamir (TODO expand in Security Considerations).
  • HashShards: 512 - to reach required spacegap fraction without neadlessly increasing the cost of conducting protocol (TODO explanation).

Limitations

The update protocol is limited to CC sectors as the SectorKey commitment is only avaliable for sectors with no data. The SectorKey commitment is currently included as CommRLast commitment inside the on-chain SealedSectorCID when the sector is CC. When the sector contains deals natively the SealedSectorCID includes only CommRLast+Data commitment.

Backwards Compatibility

This change is not backwards compatible and will require a new actors version and a network upgrade.

Test Cases

TODO

Security Considerations

Loosing epsilon-Replication

This section is a summary of more detailed report.

Assumption 1: We work under the assumption that for each bit grinded on the {\rhoi}i=1,...,N an adversary can flip one bit of her choice on the original R.

We consider \lambda = 80 as the security parameter. This means that we give the adversary a bit flip budget of 80 bits on R (by crafting R* such that these bits are flipped, keeping all the rest untouched) in order to make R* more compressible than R.

Observation: As we will see in what follows, in the case of our actual encoding (where a single \rhoi is shared among K distinct nodes Dj), we’ll be either more conservative than we should, according to Kolmogorov bounds.

In order to stay on the safe side, we have two independent security analysis, following two different paths that give us similar results.

The first analysis is a probabilistic one, and we refer to it as the “security against the flipping adversary”. The second one is using an information theoretical argument based on Kolmogorov complexity.

Analysis 1: Security against “2 bytes for 1 bit” Adversary

Preliminaries: Given a predetermined string of L bits, we take for granted that a random string has probability 2-L to contain it. We will consider bit strings of K_consecutive 0s (or 1s). Note that it is the same when considering patterns like 01 or 10, the only difference is that those sequences would allow for less compression. Assumption 2: An adversary can compress ~2 consecutive bytes (we would consider 2 bytes per node, but it can happen anywhere) by flipping a single bit. This means having substrings s that are of the form s = 0000000100000000 (or its negation). Note that the probability of s (or negation of s) to happen is 2-15. Since the adversary can adaptively choose the bits to flip, we are just assuming he finds 80 of these strings in a 32GiB string. 2 bytes out of 32 represent the 6.25% of a single node

Note 1: Remember that each randomness \rhoi is shared among K nodes, have 230/K different hashes. As a consequence, we consider each compression performed on a node D to propagate on all the K nodes that share the same randomness with D.

Assumption 3: An adversary A is able to find a string of the form of s = 000000010000000 (or equivalent) in 80 different nodes, each of those uses a different \rhoi

Note 2: Assumption 3 maximizes the propagation of the compression made by the adversary since all the local compression propagates to K nodes each. This results in a total compression of 800.0625K nodes out of 230

With H = 29 and K = 221 we obtain a total compression under 1%.

Assume we have 29 different hashes. This means that we have a total compression, according to Assumption 2 and Assumption 3 of ~1%.

Analysis 2: Based on Kolmogorov Complexity (Information Theoretic Argument)

Information Theoretic Argument: Given a string S of size N with incompressibility V, for any adversary that modifies S into S' by flipping T bits of his choice, we have that the incompressibility of S' is at least V - T*(log(N) + log(log(N))+1). Proof: Assume by contradiction that there exists a string S'' which is more compressible than S'. Then it means that it is possible to compress S into a string S* of length | S’’ | + T*(log(N) + 2*log(log(N))+1) < V .

Note: T*(log(N) + 2*log(log(N))+1) are the bits needed in order to be able to store the T bit positions where the adversary flipped the bits, using self delimiting codes (References: this paper, page 9; this book, Section 2.2).

Given the following: We assume R to be incompressible, and we then set the incompressibility of R to N. We assume an adversary can flip 80 bits of his choice in R. To be conservative, given that we are considering 230/K different hashes, we apply the compressibility lower bound on a vector of size N’ = 230/K nodes

According to the lower bound, we get that an adversary that can flip 80 bits of his choice on R can not compress R to a string that has compressibility smaller than N’ - 80(log(N’) + 2*log(log(N’)) + 1).

In order to be even more conservative, we are using N instead of N' in the compressibility reduction factor. We are then considering that R can not be compressed to a string which is less compressible than 230/K - 80(log(230) + log(log(230)) + 1) = 230/ K - 80(30+11) .

As a consequence, we have that an adversary can compress a total of 80(30+11)*K bits out of 230.

This means that, for K = 221 and 29 = 512 different hashes, we have that the additional spacegap is < 2.51%. Given in our case the distribution of the last bit of each node is not uniform, we also considered the case of nodes of 255 bits rather than 256. Spacegap in this case is < 2.5123%. If we allow for 1024 hashes, additional spacegap is bounded by < 1.26%. If as above we consider nodes of bit length 255 instead of 256 we still obtain that an additional spacegap bounded by < 1.26%

Poseidon-128

Poseidon-128 is zk-SNARK friendly hash function which is already widely used in Filecoin protocol. Both uses of Poseidon-128 include Domain Separation Tags such that the usage here does not clash with possible future use cases.

Domain Separation Tags used:

  • Encoding Randomness Generation: 2^40
  • Challenge Generation: 2^40

Domain Separate Tags for both are the same as their inputs are different.

Incentive Considerations

SnapDeals does not allow storage providers to release currently held collateral for the sector but it allows to reuse that already held collateral if collateral required is greater, for example due to verified deal.

Product Considerations

Snap Deals protocol significantly reduces the time needed for data to be included in a sector and confirmed on-chian. The process was designed with cost for storage providers in mind.

Implementation

Future work

  • DeclareDeals to support deal transfer: allow moving deals from sector to sector
  • CapacityDeals: allow purchasing a capacity of the storage of a storage provider instead of the storage of a specific deal
  • Update protocol that does not require to perform an operation on a full sector.
  • DealUpdates: a license to terminate a deal in place for a new one (e.g. a user wants to update a portion of their files)

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

@Kubuxu, @lucaniz, @nicola, @rosariogennaro, @irenegia, "FIP-0019: Snap Deals," Filecoin Improvement Proposals, no. 0019, August 2021. [Online serial]. Available: https://fips.filecoin.io/fips/fip-0019.