Welcome and thank you for sitting down with us. You have an expansive vision of compute and storage. Can you explain this vision and what drives it?
Lukasz Magiera: My main goal is to make it possible for computers on the Internet to do their own thing without having to go through permissioned systems when they don't need to. As a user, I just want to interact with an infinitely scalable system where I put in money and I get to use its resources. Or I bring resources – meaning I host some computers – and money comes out.
The middle part of this should ideally be some kind of magical thing which handles everything more or less seamlessly. This is one of the main goals of decentralized systems like Filecoin and DePIN – to build this as a black box where, depending on who has resources, there is a really simple way to get and share resources that is much more efficient than having to rely on the old-school cloud.
So you're looking not only at the democratization of compute and storage but also increasing the simplicity of using it?
Lukasz Magiera: There are a whole bunch of goals but all of them are around this kind of theme. There is the ability to easily bring resources into the system and earn from those resources. Then there's the ability to easily tap into these resources.
And then there's more interesting capabilities – let's say I'm building some service or a game and it might have some light monetization features, but I really don't want to care about setting up any of the infrastructure. This is something that the cloud promised but didn't exactly deliver – meaning that scaling and using the cloud still requires you to hire a bunch of people and pay for the services.
The ideal is that you could build a system which is able to pay for itself. The person or the team building some service basically just builds the service, and all of the underlying compute and storage is either paid for directly by the client, or the service pays for its resources automatically out of the fees it collects while passing the rest of its earnings to the team.
It's essentially inventing this different model of building stuff on the Internet whereby you just build a thing, put it into this system, and it runs and pays for itself – or the client pays for it without you having to pay for the resources yourself. In this way, future models of computing will become much more efficient.
That’s certainly a grand vision. What does it take to get there?
Lukasz Magiera: There is a vision but it is, I would say, a distant one. This is not something we will get to any time soon. Filecoin itself has taken about eight years to get to where it is, and that was with intense development. I know we can get there but it is going to be a long path. I wouldn't expect a system even approximately close to what I envision in less than 10 years.
THE SOFTWARE BEHIND THE FILECOIN NETWORK
Let’s step back a bit into where we are now. You are the co-creator of Lotus. Can you describe it for us?
Lukasz Magiera: Lotus is the blockchain node implementation for the Filecoin network, so when you think of the Filecoin blockchain, you mostly think about Lotus. It doesn't implement any of the actual infrastructure services, but it does provide the consensus layer underlying all of the protocols that storage providers run.
The interesting thing is that clients [those who are submitting data to the network] don't really have to run the consensus aspect. They only have to follow and interact with the chain – and they would only do this just for settling payments as well as for looking at the chain state to see that their data is still stored correctly.
On the storage provider side, there are many more interactions with the chain. Storage providers are the ones that have to go to the chain and prove that they are still storing data. When they want to store new pieces of data, they have to submit proofs for that. Part of the consensus also consists of mining blocks, so they have to participate in the block mining process as well.
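To make the lighter client-side picture concrete, here is a minimal Go sketch of the kind of read-only chain following a client does, posted against a Lotus node's JSON-RPC endpoint. It assumes a local node on the default port; `Filecoin.ChainHead` is a standard read-only Lotus method, but endpoint and auth details vary by setup.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest is a generic JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	Jsonrpc string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

func main() {
	// Filecoin.ChainHead is a read-only Lotus RPC method; the endpoint
	// below assumes a local node with default settings.
	req, _ := json.Marshal(rpcRequest{
		Jsonrpc: "2.0",
		Method:  "Filecoin.ChainHead",
		Params:  []interface{}{},
		ID:      1,
	})

	resp, err := http.Post("http://127.0.0.1:1234/rpc/v0", "application/json", bytes.NewReader(req))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// We only pull the tipset height out of the response here; a real
	// client would also inspect state to verify its deals and payments.
	var out struct {
		Result struct {
			Height int64 `json:"Height"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("current chain height:", out.Result.Height)
}
```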
Can you explain the difference between Lotus and Curio?
Lukasz Magiera: There are multiple pieces: Lotus itself and Lotus Miner. Lotus itself is the chain node. Lotus Miner is the mining software that storage providers initially ran; it is now in maintenance mode and Curio is replacing it. Lotus itself is not going away – it is still used by everyone interacting with the Filecoin blockchain.
Lotus Miner is the software that initially implemented all of the Filecoin storage services. When we were developing the Lotus node software, we needed software to test it with – if you’re building blockchain software, then you need some software that mines the blocks according to the rules. This was the genesis of Lotus Miner.
Initially it was all just baked into a single process called Lotus. We then separated that functionality into a separate binary and called it Lotus Miner. The initial version, however, had limited scalability. When we got the first testnet up in 2019, several storage providers (SPs) wanted to scale their workloads beyond a single computer, so we added a hacky mode that let it attach external workers. This was back in the days when sealing a single sector took 20 hours, so there was no consideration of data throughput. Scheduling wasn't really an issue either, because everything took multiple hours.
Over time, the process got much better and much faster and SPs brought way more hardware online and so we had to rewrite it to make it better and more reliable. We didn't have all that much time and the architecture wasn't all that great but we did a rewrite that was reasonable. It could attach workers and workers could transfer data between them. This is what we launched the Filecoin Mainnet with in 2020.
For a while thereafter, most of the effort went into making sure the chain didn’t die, so very little effort was put towards making Lotus Miner better. At some point, though, we got the idea: what if we just rewrote Lotus Miner from scratch – threw everything away and started fresh? And so this is how Curio came about. The very early designs started about two years ago, and Curio launched in May 2024.
Can you describe Curio in more detail?
Lukasz Magiera: Curio is a properly scalable, highly available implementation of Filecoin storage provider services. More broadly, it is a powerful scheduler coupled with this runtime that implements all of the Filecoin protocols needed for being a storage provider. The scheduler is really the thing that sets it apart from Lotus Miner.
It makes it really simple to implement any DePIN protocol where some process runs on a set of machines and needs to interact with a blockchain. Basically, it lets us implement protocols like Proof of Data Possession (PDP) in a matter of a week or two. The first iteration of PDP really did take me a week to get running – it went from nothing to being able to store data and properly submit proofs to the chain.
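For a feel of the scheduler pattern that makes this possible – an illustrative sketch, not Curio's actual code – the core idea is that every machine runs the same binary and atomically claims tasks from a shared SQL database, so adding a node just adds capacity. The table, column, and connection names below are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/lib/pq" // Postgres driver; a shared SQL store backs the task queue
)

// claimTask atomically takes one unclaimed task for this node, so many
// identical nodes can poll the same table without double-claiming work.
// Table and column names here are hypothetical, for illustration only.
func claimTask(db *sql.DB, node string) (int64, string, error) {
	var id int64
	var kind string
	err := db.QueryRow(`
		UPDATE tasks SET owner = $1
		WHERE id = (
			SELECT id FROM tasks
			WHERE owner IS NULL
			ORDER BY id
			LIMIT 1
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, kind`, node).Scan(&id, &kind)
	return id, kind, err
}

func main() {
	db, err := sql.Open("postgres", "postgres://curio:curio@db:5432/tasks?sslmode=disable")
	if err != nil {
		panic(err)
	}

	for {
		id, kind, err := claimTask(db, "node-1")
		if err == sql.ErrNoRows {
			time.Sleep(time.Second) // nothing to do; poll again
			continue
		} else if err != nil {
			panic(err)
		}
		fmt.Printf("running task %d of kind %q\n", id, kind)
		// ... execute the task (sealing, proving, PDP, etc.) ...
	}
}
```

The `FOR UPDATE SKIP LOCKED` clause is what lets many identical nodes poll the same table concurrently without ever handing the same task to two machines.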
Is Curio then able to incorporate new and composable services for storage providers to provide?
Lukasz Magiera: Yes. The short version is that if you have a cluster, you just put Curio on every machine, configure the instances to connect with each other, and the rest figures itself out. The storage provider shouldn't really have to worry about any of the scheduling, or about anything the software should be doing by itself.
If something doesn't work, you have proper alerts. You have a nice web UI where it is easy to see where the problems are. And the architecture is much more fault tolerant: you can essentially pull the plug on any machine and the cluster will keep running and keep serving data to clients.
THE STATE OF DECENTRALIZED STORAGE
Where are we with decentralized storage? What are you and your team building near term and then what does it look like long term?
Lukasz Magiera: Where we are now with Filecoin is that we have a fairly solid archival product. It could use more enterprise features for enterprise clients – mostly access control lists (ACLs) and the like.
We just shipped Proof of Data Possession, and that should help larger storage providers. Then you have a whole universe of things we can do with confidential computing, as well as additional proofs – things that enterprises would want, such as proof of data scrubbing, i.e. proving there has been no bit rot. We can also use the confidential compute infrastructure to do essentially full-blown compute on data. We could be the place that allows clients to execute code on their own data, and as far as I understand, this could include GPU-type workloads fairly easily.
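To make "proof of data scrubbing" concrete: at its simplest, scrubbing means periodically re-reading stored data and checking it against known-good checksums; a real proof would then attest to that result on-chain. The hypothetical Go sketch below shows only the local detection step.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// checksumFile streams a file through SHA-256 so even very large
// sector files can be scrubbed without loading them into memory.
func checksumFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// In a real scrubber the expected digests would come from a database
	// written at ingest time; this map is a stand-in for illustration.
	expected := map[string]string{
		"/storage/sector-001.dat": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
	}

	for path, want := range expected {
		got, err := checksumFile(path)
		if err != nil {
			fmt.Println("scrub error:", path, err)
			continue
		}
		if got != want {
			fmt.Println("bit rot detected in", path)
		}
	}
}
```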
On the storage provider side right now, you have to be fairly large. You have to have a data center with a lot of racks to get any scale and make a profit. You need a whole bunch of JBODs (Just a Bunch Of Disks) with at least a petabyte of raw storage to make the hardware pay for itself. You need hardware with GPUs and very expensive servers for creating sectors. The process is much better now with Curio, but we can still do better.
The short-term plan for Curio is to establish markets for services so that people don't have to own so many GPUs for processing zk-SNARKs [for the current form of storage proofs]. Instead, let them buy zk-SNARK processing and sector building as services so they don't need these very expensive sealing servers – all they really need is hardware and decent Internet access, and for that they may not even need a data center at all. So the short term is about enabling smaller-scale pieces of the protocol.
Can you talk about these changes in the processing flow for storing data?
Lukasz Magiera: Essentially, we have to separate the heavy CapEx processing involved in creating sectors for storage [from the process of storing the data]. You have storage providers who are okay with putting up more money, but they don't want to lock up their rewards. They just want to sell compute.
This kind of provider would just host GPUs and sell zk-SNARKs. Or they would host SupraSeal machines and either create hard drives with sectors that they would ship, or just send those sectors over the Internet.
Separately you have storage providers who want to host data and just have Internet access and hard drives. The experience here should really just resemble normal proof-of-work mining. You just plug boxes into the network and they just work whether it's with creating sectors or offering up hard drives.
Can you talk more about the new proofs that have been released or that you’re thinking about?
Lukasz Magiera: Sure. Proof of Data Possession (PDP) recently shipped; this is where storage providers essentially host clear-text data for clients. The cool thing with PDP is that, at a wider level, it means clients can upload much smaller pieces of data to an SP, and we can finally build proper data aggregation services where clients essentially rent storage that they can read, write, and maybe even modify on a storage provider. When they want to take a snapshot of their data, they just make a deal and store it. This simple proof becomes a much better foundation for better storage products.
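As a toy illustration of the challenge-response idea behind possession proofs – not the actual on-chain PDP construction, which as far as the public design goes challenges providers to prove against committed data roots – the sketch below has a client precompute a nonce-salted digest at upload time, then later challenge the provider to recompute it from the data it claims to still hold.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// digest binds a random nonce to the data, so the response can only be
// computed by someone who actually holds the data at challenge time.
func digest(nonce, data []byte) [32]byte {
	return sha256.Sum256(append(append([]byte{}, nonce...), data...))
}

func main() {
	data := []byte("the client's stored data")

	// At upload time the client precomputes a challenge it can check
	// later without keeping the data around – only the nonce and digest.
	nonce := make([]byte, 16)
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	want := digest(nonce, data)

	// Later: the client sends the nonce, and the provider answers with
	// a digest over the data it claims to still possess.
	got := digest(nonce, data) // provider side, using its stored copy

	if got == want {
		fmt.Println("possession check passed")
	} else {
		fmt.Println("possession check FAILED")
	}
}
```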
We could also fairly easily do attestations proving there has been no bit rot, as well as attestations for data retrieval parameters and data access speeds. Attestations for data transfer over the Internet might be possible, but they are also very hard. These things could conceivably feed into an automated allocator so that storage deal-making could happen much more easily.
Some proofs appear easy to do but in reality are hard problems to solve. What are your thoughts here?
Lukasz Magiera: Proof of data deletion is one of those – it is mathematically impossible to prove in general, but it becomes much more achievable within secure enclaves. If you assume that a) the client can run code, b) it can trust the enclave, c) the provider cannot see or interact with that code, and d) there is a confidential way to interact with the hard drive – then essentially a data owner could rent part of a hard drive or an SSD through a secure enclave in a way that only they can read and write to that storage.
It wouldn't be perfect. Storage providers could still throttle access or turn off the servers, but in theory the secure enclave VM would ensure that read and write operations to the drive are secure and that the data goes to the client. So it essentially would be possible to build something where clients get some level of attestation of data deletion, because the SP wouldn't be able to read the data anyway. In this case you literally have to trust hardware; there's no way around it. You have to trust the CPU and you have to trust the firmware on the drive.
And so as Filecoin grows and adds new protocols and new proofs, can these be included in the network much more easily now because of Curio’s architecture?
Lukasz Magiera: Yes, essentially it is just a platform for us to ship capabilities to storage providers very quickly. If there are L2s that require custom software, it's possible to work with those L2s to ship their runtimes directly inside Curio. If an SP wants to participate in some L2's network, they just check another checkbox, or maybe add a few lines of configuration, and can immediately start making money without ever having to install or move around any hardware.
CLOSING THOUGHTS
Any final thoughts or issues you’d like to talk about?
Lukasz Magiera: Yes, I would love to talk about the whole client side of things with Filecoin and other DePIN networks. Curio is really good on the storage provider side of things but I feel like we don’t have a Curio-style project on the client side yet. There are a bunch of clients but I don't think any of them go far enough in rethinking what the client experience could be on Filecoin and DePIN in general.
Most of the clients are still stuck in the more base-level ways of interacting with the Filecoin network: you put files into sectors and track them. It feels like we could do much better. Even basic things like proper support for erasure coding would be a pretty big win – the ability to take on 10% overhead and gain real redundancy. All of this coupled with just better, more scalable software.
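For concreteness on the erasure-coding point: with Reed-Solomon coding at 10 data shards plus 1 parity shard, you pay exactly 10% overhead and can lose any one shard without losing data. Below is a minimal Go sketch using the github.com/klauspost/reedsolomon library – one reasonable library choice, not necessarily what a Filecoin client would ship.

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 1 parity shard = 10% storage overhead,
	// and any single lost shard can be reconstructed.
	enc, err := reedsolomon.New(10, 1)
	if err != nil {
		panic(err)
	}

	data := make([]byte, 10*1024) // some client data to protect

	// Split the data into 10 equal shards, then compute the parity shard.
	shards, err := enc.Split(data)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate losing one shard, e.g. a storage provider going offline.
	shards[3] = nil

	// Reconstruct recovers the missing shard from the remaining ten.
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("recovered:", ok, err)
}
```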
As a large client, I would just want to put a virtual appliance in my environment, have it be highly available as well, and have it talk to Filecoin through a more familiar interface – something much more like object storage.