Data Transfer in Filecoin #


Data Transfer is a system for transferring all or part of a Piece across the network when a deal is made.

Modules #

This diagram shows how Data Transfer and its modules fit into the picture with the Storage and Retrieval Markets. In particular, note how the Data Transfer Request Validators from the markets are plugged into the Data Transfer module, but their code belongs in the Markets system.

Data Transfer - Push Flow

Terminology #

  • Push Request: A request to send data to the other party
  • Pull Request: A request to have the other party send data
  • Requestor: The party that initiates the data transfer request (whether Push or Pull)
  • Responder: The party that receives the data transfer request
  • Data Transfer Voucher: A wrapper around storage or retrieval deal data that identifies the transfer request and allows the other party to validate it
  • Request Validator: The data transfer module only initiates a transfer when the responder can validate that the request is tied directly to either an existing storage deal or retrieval deal. Validation is not performed by the data transfer module itself. Instead, a request validator inspects the data transfer voucher to determine whether to respond to the request.
  • Scheduler: Once a request is negotiated and validated, actual transfer is managed by a scheduler on both sides. The scheduler is part of the data transfer module but is isolated from the negotiation process. It has access to an underlying verifiable transport protocol and uses it to send data and track progress.
  • Subscriber: An external component that monitors progress of a data transfer by subscribing to data transfer events, such as progress or completion.
  • GraphSync: The default underlying transfer protocol used by the Scheduler. See the GraphSync specification for full details.
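
The distinction between who initiates a request and who sends the data can be made concrete with a minimal sketch. The `Roles` type, the `ResolveRoles` function, and the string peer names are assumptions made for illustration; they are not part of the specification.

```go
package main

import "fmt"

// RequestKind distinguishes the two request types from the terminology list.
type RequestKind int

const (
	Push RequestKind = iota // the requestor sends data to the responder
	Pull                    // the requestor asks the responder to send data
)

// Roles records which party plays which part in one transfer.
// Plain strings stand in for peer identifiers (an assumption).
type Roles struct {
	Requestor string
	Responder string
	Sender    string
	Recipient string
}

// ResolveRoles derives the sender and recipient from the request kind.
func ResolveRoles(kind RequestKind, requestor, responder string) Roles {
	r := Roles{Requestor: requestor, Responder: responder}
	if kind == Push {
		r.Sender, r.Recipient = requestor, responder
	} else {
		r.Sender, r.Recipient = responder, requestor
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", ResolveRoles(Push, "client", "miner"))
	fmt.Printf("%+v\n", ResolveRoles(Pull, "client", "miner"))
}
```

In both cases the requestor is the party that opened the request; only the direction of the data changes.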

Request Phases #

There are two basic phases to any data transfer:

  1. Negotiation - The requestor and responder agree to the transfer by validating the data transfer voucher.
  2. Transfer - Once the negotiation phase is complete, the data is actually transferred. The default protocol used to do the transfer is GraphSync.

Note that the Negotiation and Transfer stages can occur in separate round trips, or potentially the same round trip, where the requesting party implicitly agrees by sending the request, and the responding party can agree and immediately send or receive data.
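
As a rough illustration of the two phases, the sketch below only lets a request enter the transfer phase after its voucher has been validated. The `Phase`, `Voucher`, and `Request` types and the `DealValidator` helper are hypothetical names introduced here, not definitions from the specification.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical label for where a request stands.
type Phase int

const (
	Negotiation Phase = iota // voucher not yet validated
	Transfer                 // voucher accepted, data may flow
)

// Voucher is a stand-in for a data transfer voucher.
type Voucher struct{ DealID string }

// Request pairs a voucher with its current phase.
type Request struct {
	Voucher Voucher
	Phase   Phase
}

// DealValidator returns a validator that accepts vouchers for known deals.
// It stands in for a validator supplied by the markets.
func DealValidator(known map[string]bool) func(Voucher) error {
	return func(v Voucher) error {
		if !known[v.DealID] {
			return errors.New("no matching deal")
		}
		return nil
	}
}

// Negotiate moves a request into the Transfer phase only if the
// validator accepts its voucher.
func (r *Request) Negotiate(validate func(Voucher) error) error {
	if r.Phase != Negotiation {
		return errors.New("request already negotiated")
	}
	if err := validate(r.Voucher); err != nil {
		return fmt.Errorf("voucher rejected: %w", err)
	}
	r.Phase = Transfer
	return nil
}

func main() {
	validate := DealValidator(map[string]bool{"deal-1": true})

	ok := &Request{Voucher: Voucher{DealID: "deal-1"}}
	fmt.Println(ok.Negotiate(validate), ok.Phase == Transfer)

	bad := &Request{Voucher: Voucher{DealID: "deal-9"}}
	fmt.Println(bad.Negotiate(validate), bad.Phase == Transfer)
}
```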

Example Flows #

Push Flow #

[Sequence diagram. Participants: Requestor, Data Transfer Module (Requestor), Scheduler (Requestor), Graphsync (Requestor), Graphsync (Responder), Scheduler (Responder), Data Transfer Module (Responder), Responder. The requestor-side participants are one system, likely a client; the responder-side participants are one system, likely a miner. Messages in order: Initiate Push; Schedule Transfer; Send Data Transfer Request; Validate Push Request; Push Request validated; Schedule Transfer; Make Graphsync Request; Send Graphsync Request; Verify Transfer Scheduled; Request is scheduled; Send Response; Response Progress (to end); Request Complete; Request Completed (if listening).]
Data Transfer - Push Flow #
  1. A requestor initiates a Push transfer when it wants to send data to another party.
  2. The requestor’s data transfer module sends a push request to the responder along with the data transfer voucher. It also puts the data transfer in the scheduler queue, meaning it expects the responder to initiate a transfer once the request is verified
  3. The responder’s data transfer module validates the data transfer request via the Validator provided as a dependency by the responder
  4. The responder’s data transfer module schedules the transfer
  5. The responder makes a GraphSync request for the data
  6. The requestor receives the graphsync request, verifies it’s in the scheduler and begins sending data
  7. The responder receives data and can produce an indication of progress
  8. The responder completes receiving data, and notifies any listeners

The push flow is ideal for storage deals, where the client initiates the push once it verifies the deal is signed and on chain.
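
The eight push-flow steps can be sketched with two in-memory nodes. This is a simplified illustration under stated assumptions: the `Node` type, its fields, and the `Push` and `Serve` functions are hypothetical, vouchers are plain strings, and the GraphSync exchange is reduced to a direct function call.

```go
package main

import (
	"errors"
	"fmt"
)

// TransferID identifies one transfer on both sides.
type TransferID uint64

// Node is a hypothetical in-memory stand-in for one party: its validator,
// its scheduler queue, and the bytes it offers or has received.
type Node struct {
	validate  func(voucher string) bool
	scheduled map[TransferID]bool
	outgoing  map[TransferID][]byte
	received  map[TransferID][]byte
}

// NewNode builds a node around the validator supplied by its owner.
func NewNode(validate func(string) bool) *Node {
	return &Node{
		validate:  validate,
		scheduled: map[TransferID]bool{},
		outgoing:  map[TransferID][]byte{},
		received:  map[TransferID][]byte{},
	}
}

// Serve answers a transport request, refusing transfers that were never scheduled.
func (n *Node) Serve(id TransferID) ([]byte, error) {
	if !n.scheduled[id] {
		return nil, errors.New("transfer not scheduled")
	}
	return n.outgoing[id], nil
}

// Push walks the eight push-flow steps between two in-memory nodes.
func Push(requestor, responder *Node, id TransferID, voucher string, data []byte) error {
	// Steps 1-2: queue the transfer locally, then send the push request.
	requestor.scheduled[id] = true
	requestor.outgoing[id] = data

	// Step 3: the responder checks the voucher with its own validator.
	if !responder.validate(voucher) {
		return errors.New("push request rejected")
	}

	// Steps 4-5: the responder schedules the transfer and requests the data.
	responder.scheduled[id] = true

	// Step 6: the requestor verifies the request against its scheduler.
	payload, err := requestor.Serve(id)
	if err != nil {
		return err
	}

	// Steps 7-8: the responder receives the data; listeners would be notified here.
	responder.received[id] = payload
	return nil
}

func main() {
	accept := func(voucher string) bool { return voucher == "storage-deal-42" }
	client, miner := NewNode(accept), NewNode(accept)

	err := Push(client, miner, 1, "storage-deal-42", []byte("piece bytes"))
	fmt.Println(err, string(miner.received[1]))
}
```

Note that the requestor is the sender here, yet it is the responder that issues the transport request once the voucher has been validated.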

Pull Flow #

[Sequence diagram. Participants: Requestor, Data Transfer Module (Requestor), Scheduler (Requestor), Graphsync (Requestor), Graphsync (Responder), Scheduler (Responder), Data Transfer Module (Responder), Responder. The requestor-side participants are one system, likely a client; the responder-side participants are one system, likely a miner. Messages in order: Initiate Pull; Send Data Transfer Request; Validate Pull Request; Pull Request validated; Schedule Transfer; Send Data Transfer Request Accepted; Schedule Transfer; Make Graphsync Request; Send Graphsync Request; Verify Transfer Scheduled; Request is scheduled; Send Response; Response Progress (to end); Request Complete; Request Completed (if listening).]
Data Transfer - Pull Flow #
  1. A requestor initiates a Pull transfer when it wants to receive data from another party.
  2. The requestor’s data transfer module sends a pull request to the responder along with the data transfer voucher.
  3. The responder’s data transfer module validates the data transfer request via a PullValidator provided as a dependency by the responder
  4. The responder’s data transfer module schedules the transfer (meaning it is expecting the requestor to initiate the actual transfer)
  5. The responder’s data transfer module sends a response to the requestor saying it has accepted the transfer and is waiting for the requestor to initiate the transfer
  6. The requestor schedules the data transfer
  7. The requestor makes a GraphSync request for the data
  8. The responder receives the graphsync request, verifies it’s in the scheduler and begins sending data
  9. The requestor receives data and can produce an indication of progress
  10. The requestor completes receiving data, and notifies any listeners

The pull flow is ideal for retrieval deals, where the client initiates the pull when the deal is agreed upon.
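
The pull flow differs from the push flow mainly in the explicit acceptance response that the requestor waits for before starting the transfer. The sketch below models that with hypothetical message types (`PullRequest`, `Response`) and a hypothetical `Responder` type; vouchers are plain strings and the transport is a direct function call, all of which are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// PullRequest and Response are hypothetical negotiation messages.
type PullRequest struct {
	ID      uint64
	Voucher string
}

type Response struct {
	ID       uint64
	Accepted bool
}

// Responder holds the data, a validator, and the set of scheduled transfers.
type Responder struct {
	Validate  func(voucher string) bool
	Data      []byte
	scheduled map[uint64]bool
}

// HandlePull covers steps 3-5: validate, schedule, and answer the requestor.
func (r *Responder) HandlePull(req PullRequest) Response {
	if !r.Validate(req.Voucher) {
		return Response{ID: req.ID, Accepted: false}
	}
	if r.scheduled == nil {
		r.scheduled = map[uint64]bool{}
	}
	r.scheduled[req.ID] = true
	return Response{ID: req.ID, Accepted: true}
}

// Serve covers step 8: data is sent only for a transfer that was scheduled.
func (r *Responder) Serve(id uint64) ([]byte, error) {
	if !r.scheduled[id] {
		return nil, errors.New("transfer not scheduled")
	}
	return r.Data, nil
}

// Pull covers the requestor side: steps 1-2, then steps 6-10 once accepted.
func Pull(r *Responder, id uint64, voucher string) ([]byte, error) {
	resp := r.HandlePull(PullRequest{ID: id, Voucher: voucher})
	if !resp.Accepted {
		return nil, errors.New("pull request rejected")
	}
	// Only now does the requestor initiate the actual transfer.
	return r.Serve(resp.ID)
}

func main() {
	miner := &Responder{
		Validate: func(v string) bool { return v == "retrieval-deal-7" },
		Data:     []byte("piece bytes"),
	}
	data, err := Pull(miner, 1, "retrieval-deal-7")
	fmt.Println(string(data), err)

	_, err = Pull(miner, 2, "unknown")
	fmt.Println(err)
}
```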

Alternate Pull Flow - Single Round Trip #

[Sequence diagram. Participants: Requestor, Data Transfer Module (Requestor), Scheduler (Requestor), Graphsync (Requestor), Graphsync (Responder), Scheduler (Responder), Data Transfer Module (Responder), Responder. The requestor-side participants are one system, likely a client; the responder-side participants are one system, likely a miner. Messages in order: Initiate Pull; Schedule Transfer; Make Graphsync Request w/ Data Transfer Request Piggy Backed; Send Graphsync Request (w/ Data Transfer Request); Verify Request (validate & schedule); Validate Pull Request; Pull Request validated; Schedule Transfer; Send response w/ DTR Accepted Piggy Backed; Send response w/ DTR Accepted Piggy Backed; Response Progress (to end) (include DT Accepted); Request Complete; Request Completed (if listening).]
Data Transfer - Single Round Trip Pull Flow #
  1. A requestor initiates a Pull transfer when it wants to receive data from another party.
  2. The requestor’s data transfer module schedules the data transfer
  3. The requestor makes a GraphSync request to the responder with the data transfer request piggybacked on it
  4. The responder receives the GraphSync request and forwards the data transfer request to its data transfer module
  5. The pull request and its data transfer voucher thereby reach the responder’s data transfer module inside the GraphSync request, not as a separate message
  6. The responder’s data transfer module validates the data transfer request via a PullValidator provided as a dependency by the responder
  7. The responder’s data transfer module schedules the transfer
  8. The responder sends a graphsync response along with a data transfer accepted response piggybacked
  9. The requestor receives data and can produce an indication of progress
  10. The requestor completes receiving data, and notifies any listeners
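
A minimal sketch of the piggybacking idea follows. The `TransportRequest` and `TransportResponse` types are hypothetical stand-ins for GraphSync messages and do not reflect the go-graphsync API; the `Handle` function, the string vouchers, and the map-backed store are likewise assumptions made for illustration.

```go
package main

import "fmt"

// DataTransferRequest is the negotiation message that rides along with the
// transport request.
type DataTransferRequest struct {
	ID      uint64
	Voucher string
}

// TransportRequest stands in for a GraphSync request that may carry a
// piggybacked data transfer request.
type TransportRequest struct {
	Root      string
	Piggyback *DataTransferRequest
}

// TransportResponse carries the data together with the piggybacked
// acceptance of the data transfer request.
type TransportResponse struct {
	Accepted bool
	Data     []byte
}

// Handle is the responder side: it extracts the piggybacked request,
// validates the voucher, and answers with data in the same round trip.
func Handle(req TransportRequest, validate func(string) bool, store map[string][]byte) TransportResponse {
	if req.Piggyback == nil || !validate(req.Piggyback.Voucher) {
		return TransportResponse{Accepted: false}
	}
	return TransportResponse{Accepted: true, Data: store[req.Root]}
}

func main() {
	store := map[string][]byte{"piece-root": []byte("piece bytes")}
	validate := func(v string) bool { return v == "retrieval-deal-7" }

	req := TransportRequest{
		Root:      "piece-root",
		Piggyback: &DataTransferRequest{ID: 1, Voucher: "retrieval-deal-7"},
	}
	resp := Handle(req, validate, store)
	fmt.Println(resp.Accepted, string(resp.Data))
}
```

A transport request without a piggybacked data transfer request is refused in this sketch, since there is then no voucher to validate.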

Protocol #

A data transfer can be negotiated over the network via the Data Transfer Protocol, a libp2p protocol type.

A Pull request expects a response. The requestor does not initiate the transfer until they know the request is accepted.

The responder should also send a response to a push request, so that the requestor can release its resources if the request is not accepted. If the responder accepts the request, however, it can initiate the transfer immediately.

Using the Data Transfer Protocol as an independent libp2p communication mechanism is not a hard requirement – as long as both parties have an implementation of the Data Transfer Subsystem that can talk to the other, any transport mechanism (including offline mechanisms) is acceptable.
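
The response rules above can be summarized as a small decision function. This is one reading of the text, not a normative definition; the `Kind` and `Action` types and the `AfterResponse` function are hypothetical names.

```go
package main

import "fmt"

// Kind is the type of the original request.
type Kind int

const (
	Push Kind = iota
	Pull
)

// Action is what a party does after the responder has answered.
type Action string

const (
	InitiateTransfer Action = "initiate transfer"
	AwaitTransfer    Action = "await transfer"
	ReleaseResources Action = "release resources"
	NoAction         Action = "nothing to do"
)

// AfterResponse returns the next action for the requestor and the responder.
func AfterResponse(kind Kind, accepted bool) (requestor, responder Action) {
	if !accepted {
		// The requestor learns of the rejection and frees what it reserved.
		return ReleaseResources, NoAction
	}
	if kind == Pull {
		// The requestor starts the transfer only after acceptance.
		return InitiateTransfer, AwaitTransfer
	}
	// For an accepted push, the responder may start immediately.
	return AwaitTransfer, InitiateTransfer
}

func main() {
	for _, kind := range []Kind{Push, Pull} {
		for _, accepted := range []bool{true, false} {
			req, resp := AfterResponse(kind, accepted)
			fmt.Printf("kind=%d accepted=%v requestor=%q responder=%q\n", kind, accepted, req, resp)
		}
	}
}
```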

Data Structures #

import ipld "github.com/filecoin-project/specs/libraries/ipld"
import libp2p "github.com/filecoin-project/specs/libraries/libp2p"
import cid "github.com/ipfs/go-cid"
import piece "github.com/filecoin-project/specs/systems/filecoin_files/piece"
import peer "github.com/libp2p/go-libp2p-core/peer"

type StorageDeal struct {}
type RetrievalDeal struct {}

// A DataTransferVoucher is used to validate
// a data transfer request against the underlying storage or retrieval deal
// that precipitated it
type DataTransferVoucher union {
    StorageDealVoucher
    RetrievalDealVoucher
}

type StorageDealVoucher struct {
    deal StorageDeal
}

type RetrievalDealVoucher struct {
    deal RetrievalDeal
}

type Ongoing struct {}
type Paused struct {}
type Completed struct {}
type Failed struct {}
type ChannelNotFoundError struct {}

type DataTransferStatus union {
    Ongoing
    Paused
    Completed
    Failed
    ChannelNotFoundError
}

type TransferID UInt

type ChannelID struct {
    to peer.ID
    id TransferID
}

// All immutable data for a channel
type DataTransferChannel struct {
    // an identifier for this channel shared by requestor and responder, set by requestor through protocol
    transferID  TransferID
    // base CID for the piece being transferred
    PieceRef    cid.Cid
    // portion of Piece to return, specified by an IPLD selector
    Selector    ipld.Selector
    // used to verify this channel
    voucher     DataTransferVoucher
    // the party that is sending the data (not who initiated the request)
    sender      peer.ID
    // the party that is receiving the data (not who initiated the request)
    recipient   peer.ID
    // expected amount of data to be transferred
    totalSize   UVarint
}

// DataTransferState is immutable channel data plus mutable state
type DataTransferState struct @(mutable) {
    DataTransferChannel
    // total bytes sent from this node (0 if receiver)
    sent                 UVarint
    // total bytes received by this node (0 if sender)
    received             UVarint
}

type Open struct {
    Initiator peer.ID
}

type SendData struct {
    BytesToSend UInt
}

type Progress struct {
    BytesSent UInt
}

type Pause struct {
    Initiator peer.ID
}

type Error struct {
    ErrorMsg string
}

type Complete struct {}

type DataTransferEvent union {
    Open
    SendData
    Progress
    Pause
    Error
    Complete
}

type DataTransferSubscriber struct {
    OnEvent(event DataTransferEvent, channelState DataTransferState)
}

// RequestValidator is an interface implemented by the client of the data transfer module to validate requests
type RequestValidator struct {
    ValidatePush(
        sender    peer.ID
        voucher   DataTransferVoucher
        PieceRef  cid.Cid
        Selector  ipld.Selector
    )
    ValidatePull(
        receiver  peer.ID
        voucher   DataTransferVoucher
        PieceRef  cid.Cid
        Selector  ipld.Selector
    )
    ValidateIntermediate(
        otherPeer  peer.ID
        voucher    DataTransferVoucher
        PieceRef   cid.Cid
        Selector   ipld.Selector
    )
}

type DataTransferSubsystem struct @(mutable) {
    host              libp2p.Node
    dataTransfers     {ChannelID: DataTransferState}
    requestValidator  RequestValidator
    pieceStore        piece.PieceStore

    // open a data transfer that will send data to the recipient peer and
    // transfer parts of the piece that match the selector
    OpenPushDataChannel(
        to        peer.ID
        voucher   DataTransferVoucher
        PieceRef  cid.Cid
        Selector  ipld.Selector
    ) ChannelID

    // open a data transfer that will request data from the sending peer and
    // transfer parts of the piece that match the selector
    OpenPullDataChannel(
        to        peer.ID
        voucher   DataTransferVoucher
        PieceRef  cid.Cid
        Selector  ipld.Selector
    ) ChannelID

    // close an open channel (effectively a cancel)
    CloseDataTransferChannel(x ChannelID)

    // get status of a transfer
    TransferChannelStatus(x ChannelID) DataTransferStatus

    // pause an ongoing channel
    PauseChannel(x ChannelID)

    // resume a paused channel
    ResumeChannel(x ChannelID)

    // send an additional voucher for an in progress request
    SendIntermediateVoucher(x ChannelID, voucher DataTransferVoucher)

    // get notified when certain types of events happen
    SubscribeToEvents(subscriber DataTransferSubscriber)

    // get all in progress transfers
    InProgressChannels() {ChannelID: DataTransferState}
}
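
The listing above is written in the specification's own notation rather than compilable Go. As a hedged illustration of how the channel map and status operations might behave, the following sketch uses a hypothetical in-memory `Registry`. A plain string stands in for `peer.ID`, the status union is flattened into an enumeration, and the rule that only ongoing channels can be paused (and only paused ones resumed) is an assumption.

```go
package main

import "fmt"

// ChannelID mirrors the ChannelID structure; a string stands in for peer.ID.
type ChannelID struct {
	To string
	ID uint64
}

// Status flattens the DataTransferStatus union into an enumeration.
type Status int

const (
	Ongoing Status = iota
	Paused
	Completed
	Failed
	ChannelNotFound
)

// Registry is a hypothetical in-memory version of the dataTransfers map.
type Registry struct {
	channels map[ChannelID]Status
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{channels: map[ChannelID]Status{}}
}

// Open registers a new ongoing channel.
func (r *Registry) Open(id ChannelID) { r.channels[id] = Ongoing }

// TransferChannelStatus reports ChannelNotFound for unknown channels.
func (r *Registry) TransferChannelStatus(id ChannelID) Status {
	status, ok := r.channels[id]
	if !ok {
		return ChannelNotFound
	}
	return status
}

// PauseChannel pauses a channel only if it is currently ongoing.
func (r *Registry) PauseChannel(id ChannelID) {
	if r.TransferChannelStatus(id) == Ongoing {
		r.channels[id] = Paused
	}
}

// ResumeChannel resumes a channel only if it is currently paused.
func (r *Registry) ResumeChannel(id ChannelID) {
	if r.TransferChannelStatus(id) == Paused {
		r.channels[id] = Ongoing
	}
}

func main() {
	reg := NewRegistry()
	id := ChannelID{To: "peer-a", ID: 1}

	fmt.Println(reg.TransferChannelStatus(id) == ChannelNotFound)
	reg.Open(id)
	reg.PauseChannel(id)
	fmt.Println(reg.TransferChannelStatus(id) == Paused)
	reg.ResumeChannel(id)
	fmt.Println(reg.TransferChannelStatus(id) == Ongoing)
}
```

Because `ChannelID` contains only comparable fields, it can be used directly as a map key, matching the `{ChannelID: DataTransferState}` shape in the listing.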