Storacha and CIDs: Putting You in Control of Your Data
Discover how Storacha leverages CIDs to ensure data integrity, security, and sovereignty in a decentralized storage future.
Since the Internet was born, we’ve built systems around trusted relationships with third parties. On the early web, I’d upload my HTML document (i.e., a webpage) to GeoCities’ server, point my URL at GeoCities, and tell my friends about the URL. Their web browser would resolve the URL to GeoCities’ servers and fetch the document I uploaded, and we’d both trust that GeoCities would send them the webpage I uploaded. Additionally, I’d trust that GeoCities would stay online so my friends could download my webpage. This is more or less how things continue to work today: I send Facebook or Google or Squarespace some data, they interpret and process it, and eventually they display it to, among others, the people I’d like to see it.
This works reasonably well for some types of data, but it has a few drawbacks:
1) If the service I trust to serve my data goes down, my data can’t be sent to my friends. I trust GeoCities to be online, and if they aren’t, I’m out of luck.
2) If the service I trust to serve my data is compromised by an attacker, my friends may be sent data I didn’t intend them to receive. In the worst case, this could be malware or information that actively causes harm to my friends.
The first problem can be solved by giving my friends a backup address for the data, but that pushes the burden of finding those replicas onto them and requires additional trust in yet another third party. The second problem is thornier. I could give them a cryptographic fingerprint of my data and ask them to verify the data before they look at it, but that’s annoying unless the check is built into the browser used to download data from the web, and it would require me to publish that fingerprint to anyone who wanted to download my data.
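The manual fingerprint check described above can be sketched in a few lines of Python. This is only an illustration: SHA-256 is assumed as the fingerprint function, and the page content and helper names are made up for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a fingerprint shared by the author."""
    return fingerprint(data) == expected_hex.lower()


# Hypothetical page content, standing in for the HTML uploaded to a host.
original = b"<html><body>My cats</body></html>"

# The author computes this once and shares it through some other channel.
published = fingerprint(original)

# A friend runs the same check on whatever bytes the server returned.
untouched_ok = matches(original, published)
tampered_ok = matches(original + b"<script>alert(1)</script>", published)
print(untouched_ok, tampered_ok)
```

The check itself is cheap. The awkward part is the one the paragraph above points out: the fingerprint has to reach every reader separately from the data.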
Maybe there’s a better way?
CIDs To The Rescue!
When we say that Storacha lets YOU control YOUR data, this is part of what we mean: we don’t make you trust that we’ll serve the data you uploaded. Instead, we build special cryptographic fingerprints of your data into the heart of our data storage system. Our fingerprints are called “Content Identifiers” (CIDs for short), and you’ll see them all over the place in Storacha. Here’s one:
bafybeibj4n2gszrg6qbd4yu4m2sd6uzd3blbqhi6k3ikpm6obrda3qjhlm
I got this one when I uploaded a picture of my cats using the w3 Command Line Interface (CLI).
I can access it via the Storacha gateway, using the link returned by w3 up:
https://w3s.link/ipfs/bafybeibj4n2gszrg6qbd4yu4m2sd6uzd3blbqhi6k3ikpm6obrda3qjhlm
Or I can access it via other gateways operated by the IPFS community:
https://dweb.link/ipfs/bafybeibj4n2gszrg6qbd4yu4m2sd6uzd3blbqhi6k3ikpm6obrda3qjhlm
https://ipfs.io/ipfs/bafybeibj4n2gszrg6qbd4yu4m2sd6uzd3blbqhi6k3ikpm6obrda3qjhlm
It’s important to note that the CID was generated locally by the CLI running on my laptop. My laptop told Storacha’s storage service about the CID and the size of my upload, and Storacha used that information to generate a signed upload link directly to the backend where my data ended up being stored. The Storacha upload service never actually saw the bytes of my upload; it just used the CID my laptop generated to facilitate the upload to its actual at-rest home. Right now, we provide the at-rest home for your data. But with the launch of Storacha’s Alpha Network, we’re paving the way for a truly decentralized future, one where your data won’t need to touch our servers at all!
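To give a feel for what “generated locally” means, here is a rough Python sketch that derives a CID from bytes using nothing but the standard library. It is not how the w3 CLI works internally. It assumes the simplest possible case, a single block of raw bytes hashed with SHA-256 and written as a base32 CIDv1, whereas real uploads are typically split into a graph of blocks whose root gets the CID. The function names are made up for the example.

```python
import base64
import hashlib

CID_VERSION = 0x01  # CIDv1
RAW_CODEC = 0x55    # multicodec code for "raw" bytes
SHA2_256 = 0x12     # multihash code for SHA-256


def raw_cid(data: bytes) -> str:
    """Build a base32 CIDv1 for one raw block (illustrative helper)."""
    digest = hashlib.sha256(data).digest()
    # Every code used here is below 0x80, so each one fits in a single
    # varint byte and can be written directly.
    header = bytes([CID_VERSION, RAW_CODEC, SHA2_256, len(digest)])
    body = base64.b32encode(header + digest).decode("ascii")
    # Multibase base32 is lowercase and unpadded; "b" is its prefix.
    return "b" + body.lower().rstrip("=")


def verify(data: bytes, cid: str) -> bool:
    """Recompute the CID from the bytes and compare it to the one given."""
    return raw_cid(data) == cid


cid = raw_cid(b"hello world")
print(cid)
print(verify(b"hello world", cid), verify(b"hello w0rld", cid))
```

No network call appears anywhere in the sketch: the identifier depends only on the bytes, so any machine holding the same bytes arrives at the same CID.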
CIDs have lots of interesting properties. They are a special type of “self-describing format,” which means that a CID contains information about itself: if someone finds a CID and knows how multiformats work, they can determine how the content is encoded (e.g., JSON, Protobufs, etc.) and extract the cryptographic fingerprint of the data the CID identifies, along with the hash function that produced it. You can learn more about CIDs in the multiformats/cid GitHub repository.
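As a small illustration of self-description, the Python sketch below takes the cat-picture CID from earlier apart by hand. It handles only base32 CIDv1 strings, and the lookup tables list just a handful of codes from the multicodec table. The helper names are invented for the example, so treat it as a reading aid, not a replacement for a real multiformats library.

```python
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"

# A few well-known multicodec codes; the real table is much longer.
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor", 0x0129: "dag-json"}
HASHES = {0x12: "sha2-256"}


def base32_decode(text: str) -> bytes:
    """Decode lowercase, unpadded base32 into bytes."""
    bits, nbits, out = 0, 0, bytearray()
    for ch in text:
        bits = (bits << 5) | BASE32.index(ch)
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Read one unsigned varint from buf; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_cid(cid: str) -> dict:
    """Split a base32 CIDv1 string into its self-describing parts."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 ('b') CIDs")
    raw = base32_decode(cid[1:])
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    digest = raw[pos:pos + length]
    if len(digest) != length:
        raise ValueError("digest is shorter than its declared length")
    return {
        "version": version,
        "codec": CODECS.get(codec, hex(codec)),
        "hash": HASHES.get(hash_code, hex(hash_code)),
        "digest": digest.hex(),
    }


info = parse_cid("bafybeibj4n2gszrg6qbd4yu4m2sd6uzd3blbqhi6k3ikpm6obrda3qjhlm")
for key, value in info.items():
    print(f"{key}: {value}")
```

Everything needed to interpret the identifier (version, content encoding, hash function, and digest length) travels inside the string itself, which is what lets tools written years apart agree on what a CID refers to.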
In the coming months we’ll take a deep dive down the rabbit hole of self-description. It’s a powerful technique that helps ensure the long-term viability of data and future-proofs the systems that generate it.
CIDs mean that I can be totally certain that the data my friends download from Storacha is the data I uploaded because the proof is baked into the URL I give them to download it — anyone can download the data and compare it to the cryptographic fingerprint embedded in the CID, no matter how many intermediaries the data passed through on its way to their machine. We’ve built Storacha on this powerful foundation, the result of years of research and development by our dear friends at Protocol Labs, and we’re excited to help you realize the full potential of content addressing, cryptographic hashing and self-describing formats! Please join us on Discord if you’d like to learn more.