What is data availability, and why do we need it?

Upnode

Oct 14, 2023 — 5 min read

What is data availability?

Data availability in the blockchain world refers to the ability to fetch the raw data related to a particular encoded header at any time. For example, if a rollup only stores hashes of transactions, we can’t query the underlying transactions calculated into hashes. Inability to query means we don’t have data availability in this case. But if we store any transactions associated with each hash, we can query those transactions at any time. We have data availability in this case since we can query them.

The similarity between data availability and the QC process

We can explain data availability in the blockchain world by comparing it with the QC process.

Verified products are put into a box with verified sticker

Begin with the manufacturer. Once the manufacturer produces products, each product will be QC. Then pack multiple products into a package and put a sticker as verified proof. Packing is the rollup process of an optimistic rollup.

Customers may not unbox to check each product

The manufacturer will send these boxes to the customers. Customers may check whether all packages have verified stickers on the package or not. Similarly, in the optimistic rollup, Merkle root can compare to the verified sticker submitted into the Mainnet to verify every transaction in the rollup. Note that we can’t use Merkle root to reverse underlying lookup transactions.

In the QC process, if you want to check whether the products in the box are valid, you open the box and check. The physical world always has data availability, as we can see the content inside.

In the blockchain, the box may not be able to open. If we can’t open the box to see content inside, we can say that it lacks data availability.

In an optimistic rollup, the manufacturer isn’t responsible for verifying each product (transaction). The manufacturer packs it into a box (a rollup). And sometimes, challengers can unbox and submit fraud proof if they find a malicious product (transaction). The unboxing process requires data availability.

Why not store all transaction data to get data availability?

https://www.flickr.com/photos/adactio/46349424774

Storing all transaction data is using too many resources for each node and sacrificing scalability. Traditional monolithic blockchain such as Bitcoin and Ethereum requires miners and full-node operators to download the entire blockchain. It takes up too much disk space and requires tremendous RAM. Only professional node operators can operate such a powerful node. This is a threat to decentralization as it limits the opportunity to participate in a network. Having to care about data availability causes blockchain development to be much more complex. Therefore, in the future, we are shifting to modular blockchain design, where developers don’t have to care about data availability while developing a new blockchain.

How not to store all transaction data?

This topic is still in heavy research. Celestia is the first to release a solution called “Data availability sampling.” This is done by sampling small random chunks of the block data over multiple rounds, allowing nodes to verify data availability with statistical certainty.

More detail at https://celestia.org/glossary/data-availability-sampling/

Data availability and the modular blockchain.

https://celestia.org/static/54fa3c4c5754c057c4fbcd751d15c21e/c8502/article-1-image-2.png

Modular blockchain is a new concept of blockchain development that separates a blockchain into different layers in contrast with the classical monolithic blockchain, such as Ethereum, which contains all layers. Each layer can be reused and composed to create a new blockchain.

Layers of modular blockchains can be designed in different ways. The design in the Celestia document is the most common one. A blockchain comprises 3–4 layers, including Execution, Settlement (optional), Consensus, and Data availability layers. Some designs may have Application layers to let developers develop dApps on the blockchain.

The application layer is a layer that allows external developers to develop their dApps on top of the blockchain. The smart contract is an example of an application layer.
The execution layer is a layer that processes the transactions. This layer is to be customized by each dApps.
The settlement layer is an optional layer required if you are building a layer two blockchain. This layer is responsible for relaying messages between security and performance-focused chains. The most common message is the token bridge between these two chains.
The consensus layer is a layer that manages the consensus. If the chain uses proof of stake consensus, this layer enables users to stake the required token to help validate the transactions. The upper layer can share the consensus layer to help mitigate the work put into designing the consensus.
Data availability layer is a layer that acts as storage for the blockchain. This layer maintains the availability of data to allow fetching any part of data at any time.

Data availability is the bottom layer of a modular blockchain design. In Celestia, this layer is combined with the consensus layer. On the other hand, the execution and settlement layer is left to the work of developers to develop.

An example use case of data availability layer

The actual use case currently requires the combination of the Consensus layer and Data availability layer. Two examples are written in the Celestia document: https://docs.celestia.org/category/optimint.

A developer can build an application-focused blockchain with separate data availability and consensus layer. The most basic example is a Wordle blockchain. The developer is only responsible for developing the execution and settlement layer. While passing consensus and data availability work for Celestia. Consensus is painful for developers since developers must also apply business development to build consensus. Having data availability passed to others, the developer can use a regular computer such as Macbook Air M1 to run the blockchain.

Cosmos SDK modules help accelerate the development of the execution layer. The developer can clone Optimint from Celestia, which provides the settlement layer, and develop the execution layer using a custom module.

Conclusion

Data availability in the blockchain world refers to the ability to fetch the raw data related to a particular encoded header at any time. The QC process example lacks data availability if you can’t open the box to see the products inside it. Storing all transaction data is using too many resources for each node and sacrificing scalability. Celestia proposed a “Data availability sampling” solution to solve the data availability problem. Data availability is the bottom layer of modular blockchain layers often combined with the Consensus layer. Separating these layers allow developers to develop efficiently. The most basic example is a Wordle blockchain. The developer can also clone Optimint from Celestia, which provides the settlement layer, and develop the execution layer using a custom module.