Decentralized Storage FAQ
Some answers to common questions about DSA and Decentralized Storage
Please feel free to reproduce, copy, and re-use the content of this FAQ
What Is Decentralized Storage...
-
...In fewer than 20 words?
Decentralized storage enables data preservation across independent providers without a central controlling authority, cryptographically preserving data ownership and provenance.
-
...In other people’s words?
Some other definitions:
-
What does DStorage do better?
There are two major benefits Decentralized Storage offers over most other storage technologies:
Content Addressability, and Verifiable Data Integrity.
Additional benefits include the elimination of single points of failure, censorship resistance, pricing structures aligned to usage, and the chance to play with shiny new technology.
-
When isn't DStorage better?
Currently, for certain use cases where storage or retrieval events are frequent, decentralized storage can be more expensive than traditional cloud storage or self-hosted storage.
Decentralized Storage systems do not necessarily provide provable data confidentiality. Individual Storage Providers can make contractual agreements to ensure Data Confidentiality, but just as with more traditional storage systems, there isn’t necessarily a verification mechanism that automatically proves whether data has leaked beyond the agreed confidentiality bounds.
-
When is DStorage better?
Decentralized Storage today is particularly powerful in meeting a certain set of storage requirements:
- Publicly available data
- Long-term storage
- Infrequent access
- The main data users are not the “Data Owner”
- High-speed Retrieval is not critical
The more of these criteria apply to a use case, the more likely Decentralized Storage is to be the right answer.
-
How much does it cost?
The cost of Decentralized Storage depends on factors including usage and the degree of decentralization desired.
Storage and retrieval prices range from free (with limited service and service guarantees) to several USD per TB, whether per month of storage or per TB retrieved on demand.
Where it is used for archival storage, or for content that is infrequently accessed, it is likely to be cheaper than other storage options, with current costs ranging from free to
In principle, content retrieval is separate from content storage, so the two can be paid for separately: a data owner can pay only to store content, while members of the public retrieve it at their own cost.
It is important to note that, for reliability, Decentralized Storage expects data owners to store multiple copies of their data. Whether the copies are managed directly by the data owner (for true decentralization) or by a contracted third party or “Layer 2”, the total cost can still be significantly lower than with traditional solutions, even after multiplying the cost of storage by an order of magnitude.
-
What are some DStorage Networks?
A selection of Decentralized Storage systems:
- Aleph.im
- Arweave
- BitTorrent File System
- Crust
- Filecoin
- InterNXT
- Kyve Network
- MaidSafe
- Opacity
- Swarm
- Züs (formerly known as 0Chain)
-
...In fewer than 200 words?
Decentralized storage enables data preservation across independent providers without a central controlling authority, cryptographically preserving data ownership and provenance. Networks are typically arranged in a peer-to-peer structure operating via autonomous contracts, with programmatic incentive structures to create an independent market for data storage that resists censorship and centralized control.
Key features of decentralized storage include:
- Content addressability: Data retrieval based on its verified content, not location.
- Data integrity: Ability to cryptographically verify data integrity.
- Improved reliability: Reduced vulnerability from eliminating single points of failure.
- Enhanced security: More difficult to compromise distributed copies.
- Censorship resistance: No single entity controls all information.
- Better scalability: Systems can horizontally scale by adding additional nodes.
- Cost-efficiency: Better utilization of unused storage capacity.
These advantages over traditional storage systems are especially promising for data networks, cooperatives, and consortiums, where sharing data across multiple entities requires secure, dynamic, and cost-effective ways to manage storage and access.
What is the difference between...
-
Decentralized, and Distributed Storage
Decentralized Storage is when control of data storage is at the discretion of the Data Owner, rather than a third party; Distributed Storage is when data is stored in multiple physical locations.
It is sensible for data held in Decentralized Storage to use Distributed Storage as part of ensuring it is reliably available.
-
Decentralized Storage, IPFS and Filecoin
IPFS is a basic open protocol designed to implement Decentralized Storage.
Filecoin builds on top of IPFS, adding a blockchain and a token (FIL) that provides incentives for the various services, such as providing storage or performing retrieval. Filecoin also sets Storage Providers specific tasks to demonstrate that they are fulfilling the obligations they have taken on in Deals.
What is...
-
…Content Addressability?
Content Addressability means using the content itself (more practically, a hash of it) to identify the content you want to retrieve, instead of information about where it is physically stored, as traditional http: and ftp: URLs do.
Where multiple copies of the same content are available, which one is retrieved can depend on factors such as the cost or speed of retrieval.
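A minimal Python sketch of content addressing and of choosing among several copies. It assumes SHA-256 as the hash function; the provider names, prices, and the `store`/`retrieve` helpers are hypothetical stand-ins for a real network's discovery and payment machinery.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Identify content by the SHA-256 hash of its bytes, not by its location."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical providers, each with a per-retrieval price and a local store
# keyed by content address. Names and prices are illustrative only.
providers = {
    "provider-a": {"price": 3, "store": {}},
    "provider-b": {"price": 1, "store": {}},
    "provider-c": {"price": 2, "store": {}},
}

def store(provider: str, data: bytes) -> str:
    """Hypothetical helper: place a copy with one provider, return its address."""
    key = content_address(data)
    providers[provider]["store"][key] = data
    return key

def retrieve(key: str) -> tuple[str, bytes]:
    """Return (provider, data) from the cheapest provider holding this content."""
    holders = [name for name, p in providers.items() if key in p["store"]]
    if not holders:
        raise KeyError(f"no provider holds {key}")
    cheapest = min(holders, key=lambda name: providers[name]["price"])
    return cheapest, providers[cheapest]["store"][key]

doc = b"Hello, decentralized storage!"
key = store("provider-a", doc)
store("provider-b", doc)  # the same content always yields the same address
chosen, data = retrieve(key)
print(chosen)  # provider-b (the cheaper of the two holders)
```

The caller only ever names *what* it wants (`key`); *where* it comes from is a separate decision, here made on price alone.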
-
...Verifiable Data Integrity?
The ability to prove that the data you receive is exactly the data you asked for.
To enable Content Addressability, a hash of the stored content is generated. Because changing even a single bit of the data produces a different hash value, the hash can be used to demonstrate that retrieved data is exactly the data that was stored: any tampering with or loss of data is immediately evident.
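The integrity check can be sketched in a few lines: hash the bytes you received and compare the result with the address you asked for. SHA-256 is an assumption here, and `verify` is a hypothetical helper name.

```python
import hashlib

def verify(expected_hash: str, received: bytes) -> bool:
    """True only if the received bytes hash to the address that was requested."""
    return hashlib.sha256(received).hexdigest() == expected_hash

original = b"Long-term archival record"
address = hashlib.sha256(original).hexdigest()

# Flip the lowest bit of the first byte to simulate tampering or corruption.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(verify(address, original))  # True
print(verify(address, tampered))  # False
```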
In IPFS what is...
-
Retrieval
Fetching stored content from the network, typically by its CID.
-
A CID, a Sector and a CAR?
A CID, or Content IDentifier, is a sequence of characters that is generated for some Content. It includes a hash of the content for data integrity verification, and some metadata such as the content type. A CID is used to identify content for retrieval, and to prove that the content retrieved is what was originally stored.
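As a sketch of what goes into a CID, the code below builds a CIDv1 for a single block of raw bytes: a version number, a multicodec code describing the content type (`0x55`, "raw"), and a multihash (hash-function code `0x12` for SHA-256, then the digest length, then the digest), all encoded in lowercase base32 with the multibase prefix `b`. It assumes the whole input is one block; tools such as `ipfs add` may chunk a file into a DAG, so the CID they report for the same file can differ depending on the options used.

```python
import base64
import hashlib

RAW_CODEC = 0x55    # multicodec code for "raw" bytes
SHA2_256 = 0x12     # multihash code for SHA-256
CID_VERSION = 1

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, the integer encoding used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def cid_v1_raw(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    cid_bytes = varint(CID_VERSION) + varint(RAW_CODEC) + multihash
    # Multibase prefix "b" = lowercase base32 (RFC 4648) without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")

print(cid_v1_raw(b"hello world"))  # a 59-character string starting with "bafkrei"
```

The hash inside the CID is what makes retrieval verifiable: recomputing it over the returned bytes must reproduce the same CID.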
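A Sector is a unit of stored data: in Filecoin, a fixed-size unit (32 GiB or 64 GiB) that a Storage Provider fills with client data, seals, and then proves it continues to store.
A CAR, or Content Addressable aRchive, is a file format that bundles content-addressed data blocks, together with their CIDs, for storage and transfer. It is similar to the “tar files” developed decades ago for magnetic tape storage: a collection of files can be packed into a CAR or extracted from one.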
In Filecoin, what is...
-
A Deal?
An agreement between a data owner and a Storage Provider that the Storage Provider will hold a copy of a specified amount of data for an agreed period, at an agreed price.
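A Deal can be pictured as a small record: which data, which parties, for how long, and at what price. The sketch below is loosely modeled on Filecoin's storage-market deal proposals, but the field names are simplified and every value (piece identifier, addresses, epochs, price) is hypothetical. It assumes Filecoin's 30-second epoch, i.e. 2,880 epochs per day, with prices expressed in attoFIL.

```python
from dataclasses import dataclass

EPOCHS_PER_DAY = 2880  # assumes 30-second epochs

@dataclass(frozen=True)
class Deal:
    # Simplified, illustrative fields; not the exact on-chain structure.
    piece_cid: str        # CID of the data the provider agrees to hold
    client: str           # data owner's address
    provider: str         # storage provider's address
    start_epoch: int
    end_epoch: int
    price_per_epoch: int  # in attoFIL (1 FIL = 10**18 attoFIL)

    def duration_days(self) -> float:
        return (self.end_epoch - self.start_epoch) / EPOCHS_PER_DAY

    def total_price(self) -> int:
        return (self.end_epoch - self.start_epoch) * self.price_per_epoch

deal = Deal(
    piece_cid="example-piece-cid",     # hypothetical placeholder
    client="example-client-address",   # hypothetical placeholder
    provider="example-provider-address",
    start_epoch=1_000_000,
    end_epoch=1_000_000 + 180 * EPOCHS_PER_DAY,
    price_per_epoch=1_000,
)
print(deal.duration_days())  # 180.0
print(deal.total_price())    # 518400000
```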
-
Sealing?
Sealing is the process of taking a sector of data and performing an intensive computation over it that produces a particular, unique copy (replica) of that data.
It can be reversed through the process of Unsealing.
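To illustrate why a sealed copy is unique to its provider yet still reversible, here is a toy model: each replica is the sector XORed with a keystream derived from a replica ID (the hypothetical strings `provider-1/sector-7` and `provider-2/sector-7`). This is not Filecoin's algorithm: real sealing derives its key through a deliberately slow, layered computation and uses a different encoding step, while the SHA-256 counter-mode keystream used here is fast and purely illustrative.

```python
import hashlib

def keystream(replica_id: bytes, length: int) -> bytes:
    """Derive a replica-specific key of the given length.

    Toy stand-in: SHA-256 in counter mode. Real sealing makes this step
    intentionally expensive so a replica cannot be regenerated on demand.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(replica_id + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def seal(data: bytes, replica_id: bytes) -> bytes:
    key = keystream(replica_id, len(data))
    return bytes(d ^ k for d, k in zip(data, key))

def unseal(replica: bytes, replica_id: bytes) -> bytes:
    # XOR with the same key reverses the encoding.
    return seal(replica, replica_id)

sector = b"example sector contents" * 4
replica_1 = seal(sector, b"provider-1/sector-7")
replica_2 = seal(sector, b"provider-2/sector-7")

print(replica_1 != replica_2)                               # True
print(unseal(replica_1, b"provider-1/sector-7") == sector)  # True
```

Because each replica depends on its own ID, two providers cannot share one stored copy and claim credit for both.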
-
PoSt and PoRep?
In the Filecoin Decentralized Storage system, Storage Providers are challenged to provide evidence that they are meeting their agreed obligations, in exchange for rewards.
PoRep, or Proof of Replication, is a mathematical proof that a particular copy of data has been stored by the Storage Provider. The proof is based on Sealing the data: a computation-intensive process that produces a unique copy.
PoSt, or Proof-of-Spacetime, is a demonstration that the Storage Provider can answer automatically generated challenges about random parts of the sealed copy of the data it claims to hold, fast enough that it would be infeasible to retrieve another copy instead of keeping the data in storage.
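The random-challenge idea can be sketched with a Merkle tree: the provider commits to a single root hash of its sealed data, and for each randomly chosen chunk it must return that chunk plus the sibling hashes leading up to the root. This is only the underlying principle; Filecoin's real proofs use different hash functions and compress many such checks into succinct proofs. The chunk contents and tree size here are hypothetical, and the tree assumes a power-of-two number of chunks.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """All levels of a Merkle tree, leaf hashes first, root last.
    Assumes the number of leaves is a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Prover: sibling hashes from the challenged leaf up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    """Verifier: recompute the root from the returned chunk and path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# The provider commits to the root of its sealed replica once...
chunks = [f"sealed chunk {i}".encode() for i in range(8)]
levels = build_tree(chunks)
root = levels[-1][0]

# ...then answers random challenges, which requires still holding the chunks.
for _ in range(3):
    i = secrets.randbelow(len(chunks))
    assert verify(root, chunks[i], i, prove(levels, i))

# A chunk the provider no longer has cannot be faked.
print(verify(root, b"forged chunk", 0, prove(levels, 0)))  # False
```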
Taken together, these two proofs show, with a probability that keeps increasing over time (reaching "multiple 9s" after a few days), that the Storage Provider has stored, and is still maintaining, the data it claims to hold as part of a Deal.
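Why confidence grows with each round of proofs can be seen in a simple model. It assumes each challenge picks a position uniformly and independently at random, and that a provider which has discarded a fraction $\delta$ of the sealed data fails any challenge that lands on the missing part. The values of $\delta$ and $n$ in the example are hypothetical, not Filecoin's actual parameters.

```latex
P(\text{all } n \text{ challenges pass}) = (1-\delta)^{n},
\qquad
P(\text{detection}) = 1 - (1-\delta)^{n}

\text{e.g. } \delta = 0.1,\; n = 100:\quad
(0.9)^{100} \approx 2.7 \times 10^{-5}
\;\Rightarrow\; P(\text{detection}) \approx 0.99997
```

Since challenges keep arriving for as long as the Deal runs, $n$ grows steadily and the chance of a cheating provider escaping detection shrinks geometrically.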