Ken Fromm : Oct 25, 2024
Stuart Berman, DSA member and Filecoin OG, explores the evolution of data storage and the technical and practical promise of decentralized solutions.
Let’s start with a bit about your background in tech.
Stuart Berman: My work history has been largely in the enterprise space. I started off in the nineties selling computers, first of all at a retail level and then I got into IBM and I was a network engineer for their customers. We would go in as a hired gun to support any initiatives under whatever the customer wanted.
As part of a job, I was sent to Michigan to work on a customer project, Steelcase, which had lost their entire network and stopped manufacturing for several days. I ended up staying around there for 25 years because I guess they liked me.
My network engineering work there quickly turned into a security role because at the time – this was pre 9/11 – the network teams were doing things like firewall configuration, identity management, and antivirus and intrusion protection. And so, I went from managing infrastructure to understanding and countering all the new security risks that were arising.
How and when did you get involved in decentralized storage, including IPFS and Filecoin?
Stuart Berman: In 2020, I heard about this crazy Filecoin project where people were configuring their machines to participate in an upcoming Space Race that Filecoin was sponsoring. This looked really interesting to me and so I got involved even before the Space Race launched.
I figured I had no chance at winning any of the top prizes because I only had a modest investment of used hardware but if I got involved early in the community I thought it might pay some dividends somehow. I was very pleasantly surprised – as were others mind you – that I was able to score a decent portion of FIL rewards in the Space Race itself.
During this period, the Filecoin community became tight-knit very quickly. Storage providers would continually help each other out. We had a lot of challenges and so we spent a lot of time not just fixing our own systems but also sharing and helping to fix other systems. I created a number of documents which helped new storage providers get their systems up and running. And I became known for being somebody you could reach out to and get plain spoken advice on more than just the technical side.
I became a member in the Miner X program, which was a testing organization for evaluating new network and storage configuration releases prior to rollout. As a result of this work, I was offered a CTO position in 2021 at PIKNIK which was joining the community to store commercial data. At PIKNIK I got involved on the customer side of things in terms of talking to customers about the value of Filecoin as well as Web3, where it's going and what decentralized storage is about.
I also created an accelerator program called ESPA, which stands for Enterprise Storage Provider Accelerator, with a goal to bring new people into the community who could build out large scale storage providers. We knew that the Filecoin community wanted to onboard a lot of data – but you can't onboard the data unless you have storage providers ready to go. And so we were cycling a couple dozen new storage providers through every quarter, many of them quite large. We worked to connect them to a customer base.
We naturally felt like the most reasonable approach would be to approach enterprise customers with a solution that a) would not be more costly than traditional methods and b) would give them the additional benefit of provable data that was regularly authenticated. There were other types of customers out there, especially ones with open data sets that were meant to be widely available and shared, and that did not have as stringent confidentiality and/or privacy needs.
Can you talk about the evolution of data storage, given your background in Web2 and now Web3?
Stuart Berman: Sure, I'll give you a quick overview. If we go back a bit to the enterprise, they typically were building large internal storage systems that were very, very expensive. Typically, you'd hear of enterprises talking about SANs, which stands for Storage Area Networks. These were extremely expensive systems that they stood up on their own premises.
Once the cloud and Web2 came into existence, then there was this shift to exploring whether companies could store their data securely in the cloud. Relatively quickly (in enterprise years that is), companies moved their data to cloud storage providers such as Google, Amazon, Microsoft, Box, and a few others. They offered interesting tools that were promised to ease the shift to cloud storage and help with the management of stored data.
And at first glance, the initial pricing for cloud storage was phenomenal. It was much cheaper than traditional SAN storage but soon you had to look at various tradeoffs. You had to balance between issues like 1) how fast do I need to retrieve the data versus 2) how cheaply can I store the data? This became really the typical situation for most enterprises.
Then, two things happened. One was storage got more expensive than people realized. The cloud in general – even though it was advertised as pay for what you use – was costing enterprises and others a lot more money than they anticipated because it was very difficult to manage. You had so many systems stored on the cloud that it was just difficult to unplug things because you didn’t know what was needed and what could be eliminated. It was expensive and difficult to track who was using what and so the costs went out of control.
The second is a more recent issue and that is centralized storage providers started controlling the type of data and the access to data in ways that might be arbitrary and/or pose a risk to businesses. If they (the storage providers) felt like the type of data you wanted to store was not appropriate for their environment, they could disallow or limit storage. They could shut a company or network down because of the nature of their business, and that could be anything from cannabis to medical research to legal but suggestive media content, to who knows what. There is a growing concern [when using centralized storage] about whether you could get locked out of your own data because you violated a policy.
And so this is why people are beginning to take a real interest in decentralization.
Let’s explore the case for decentralized storage a bit more. How do you define it and explain the benefits?
Stuart Berman: When we talk about decentralization, we mean who controls the data? Who has the ability to decide how that data is being used and what kind of data can be put up? In a decentralized network, it's the customer or the end user that is supposed to have that level of control. It should not be the data storage provider. You own your own data and you have self-custody of it. As long as you agree to whatever terms are out there – that is, that you're not going to store any illegal content – then we can store your data.
In the most extreme version of this approach – when you're highly decentralized – then there is no single point of failure. That means that if one entity goes away, the system keeps working because there is no central point of control.
For instance, Protocol Labs, the group behind the creation of IPFS and Filecoin – if they were to simply shut down, it doesn't mean that all your data would be lost. Your data is still out there, it's still functional, the blockchain's still running. There's no need for somebody or some company to control everything. There's no one who's controlling prices. There's no central authority for access. It's all done as a function of the rules of the blockchain, the physical infrastructure, and the decentralized network.
You get to decide when and where and how your data should be used or stored – provided it’s legal content, of course. This means, by the way, that if you're choosing to use a decentralized system, you should also have it distributed or replicated around the world so that you’re not at the mercy of any given nation. You do have to put some thinking into it but, at the same time, that's the way the system is designed – to not have a single point of failure and not get shut down by the whims of others.
Can you talk a bit more about the technology – about the types of cryptographic proofs – and why that provides advantages over current data storage?
Stuart Berman: So today, the claim to fame of both IPFS and Filecoin is the proof of immutability of the storage. You can prove that the data that has been stored has not been changed. If for some reason the data is tampered with or if there's corruption – in the tech world, we call it bit rot – then that data is no longer what you expected to have been stored and it's no longer even retrievable. You couldn't retrieve it if you wanted to because it's lost its immutability.
So proof of immutability was at the center of the initial release of these systems. They use a very clever technique called zero-knowledge proofs, which allows for a very quick, regular audit every day to verify that stored data has not been altered in any way.
As a storage provider, you really don't have any direct interest in the data. You just want to make sure you protect the data. This is why you put up collateral in Filecoin, to tell the user, the customer, “I promise to guarantee your data.” If you fail to protect it, you lose any collateral that you've put up as a bond or as a guarantee of that. This is a great feature of the system. A lot of customers and potential customers like the idea that they don't have to worry whether their data is protected, that it won't be just lost.
If you can imagine going to a regular web2-type service, whether it's Dropbox, Google, or whatever, and you go to get your data and it turns out, “Uh, oh, it's no longer valid” or it's corrupted, you can’t play it, or it's not even there anymore – but you don't discover it until you try to retrieve it. This is where Filecoin excels in that you can actually determine daily that your data is still there and it's still healthy.
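The idea of checking regularly that stored data is unchanged can be illustrated with a plain content hash. This is a deliberately simplified sketch and not how Filecoin's proofs work internally: the function names `content_id` and `audit` are hypothetical, and a SHA-256 digest stands in for the network's actual cryptographic proofs.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content identifier."""
    return hashlib.sha256(data).hexdigest()

def audit(stored: bytes, expected_id: str) -> bool:
    """Return True if the stored bytes still hash to the identifier recorded at upload."""
    return content_id(stored) == expected_id

# Record the identifier once, when the data is first stored.
original = b"quarterly-report-2024"
recorded_id = content_id(original)

# A later audit of untouched data passes.
healthy = audit(original, recorded_id)

# Simulate bit rot by flipping a single bit; the audit now fails.
corrupted = bytearray(original)
corrupted[0] ^= 0x01
rotted = audit(bytes(corrupted), recorded_id)
```

Because any single-bit change produces a different digest, a scheduled audit like this surfaces corruption long before anyone tries to retrieve the file.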
What does it mean to "trust the system"?
Stuart Berman: We call the decentralized networks we’re building “trustless.” Not because you can’t trust them but because you can trust the system itself and the algorithms – these cryptographic proofs – but you don't have to trust people to take care of your records for you. You give permission, a transaction takes place, and then you and everyone else can trust the result because it's been verified and validated by everybody on the network. The code is open source and the results are auditable and so you don't have to trust that somebody will do what they say they're going to do.
Can you give us a view of the future of storage?
Stuart Berman: In the IT enterprise space, any new viable technology normally takes about 15 years to be adopted. I think we're on that type of adoption curve but there are other factors.
I remember when there was a big push, 10 years or so ago, to digitize medical records. What they forgot when they put the legislation in place was to standardize on any protocols. So you have all these systems out there that became digitized but they couldn’t talk to each other. And then businesses came about where their sole purpose was to allow one system to talk to another system because these records didn't have a standardized method of talking to each other.
And so it's more complicated than just having the data. It will take a large systems-type approach but we have to get there. We can't afford the clumsy, chaotic systems. Eventually it'll go away. I just wish it would happen quicker so that we don't feel like we're dragging our feet.
If we can dream on into the future, I would have some kind of digital identity that would allow me to turn access on and off on demand. Let's say I want to be a provider that stores your medical records. Wouldn’t it be great if you could securely store them and grant access to the doctors, hospitals, and service providers that need it? You grant that access for as long as they need it, and then you turn it off when you no longer want them to have it. You could have your whole life's history, your whole medical records in your own control.
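The grant-and-revoke model described here can be sketched in a few lines. Everything in this example is hypothetical: the `RecordVault` class and its methods are illustrative names, and plain strings stand in for the digital identities that a real system would verify cryptographically.

```python
class RecordVault:
    """Hypothetical owner-controlled access list for one person's records."""

    def __init__(self, owner: str):
        self.owner = owner
        self._grants = set()  # parties currently allowed to read

    def _require_owner(self, caller: str) -> None:
        # Only the owner may change who has access.
        if caller != self.owner:
            raise PermissionError("only the owner can change access")

    def grant(self, caller: str, party: str) -> None:
        self._require_owner(caller)
        self._grants.add(party)

    def revoke(self, caller: str, party: str) -> None:
        self._require_owner(caller)
        self._grants.discard(party)

    def can_read(self, party: str) -> bool:
        return party == self.owner or party in self._grants

vault = RecordVault(owner="patient")
vault.grant("patient", "dr-lee")          # access turned on
during = vault.can_read("dr-lee")
vault.revoke("patient", "dr-lee")         # access turned off
after = vault.can_read("dr-lee")
```

The key design point is that the provider storing the records never appears in the permission check: only the owner can add or remove a grant.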
I don't control my own medical records today. Some various doctors throughout my life have filed them and lost them most likely. In fact, they probably deleted them after so many years of me not being a patient. But that doesn't suit me, right? My medical history is important for my entire life.
And even today, if I need to figure out if I ever had chickenpox, I have to call my mother. But there is, or there was, a medical record someplace at some point. There is or was a list of all the medical care I’ve received. With a system like Filecoin, it's possible to create this form of shared ownership. We just have to kind of complete the construction that would serve this sort of purpose.
What are the challenges ahead for decentralized storage?
Stuart Berman: One of the things that is a real inhibitor right now is a form of access controls especially in terms of proof of access. There are many people that are very concerned about what should be private and what can be public.
There are certainly common ways to protect data like encryption and other traditional forms but wouldn't it be interesting to know who has accessed it, whether it's an internal user at your organization or if you grant access to the public, how many times, how much is being accessed?
Not only is it an audit log you would expect to get from a hyperscaler, but it can also be put onchain, so that it is immutable. There will be a record that can't change of when data was put on the network – which is what we have today – and when people accessed it and then, ideally, when it was maybe removed from the network if you're going to delete the data.
So that's the full life cycle that many people are interested in. How do we account for the whole data life cycle of when I onboarded my data, how it's being used, what's the frequency, what's the popularity. And finally when it's no longer there because we've decided it doesn't have to be there anymore, right?
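One common way to make a lifecycle record tamper-evident is to chain each entry to the hash of the one before it. The sketch below assumes that approach for illustration only; it is not Filecoin's onchain format, and the helper names (`append`, `verify`, `entry_hash`) and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry

def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: list, event: dict) -> None:
    """Add an event, linking it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})

def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

# Record the full life cycle: onboarding, an access, and removal.
log = []
append(log, {"action": "onboard", "who": "owner"})
append(log, {"action": "read", "who": "analyst"})
append(log, {"action": "delete", "who": "owner"})
intact = verify(log)

# Rewriting history after the fact is detected.
log[1]["event"]["who"] = "someone-else"
tampered_detected = not verify(log)
```

Publishing only the latest hash to a shared ledger would be enough for anyone holding the log to check that no earlier entry has been altered.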
I believe this is where the next big challenge is for the network. We have the whole process of storing data reliably down. We know these cryptographic proofs work. But there are two sides of the coin here. You've got to be able to not just store it but also control access to it. This should always be controlled by the end user. This is the challenge I would lay before you and say, “How do we provide access control along with this proof of access?”
I believe the two – access controls and proof of access – go hand in hand so that people can be assured that their data is being used the way they intended it to be used. This is the next major milestone that certainly people I've talked to in the enterprise space and in the business world want to see and, to a large degree, expect.
What are you doing most recently?
Stuart Berman: I'm working with clients in the Web3 world – particularly with end users who want to get value out of decentralized storage. I’m able to articulate the value of the technology in a less technical manner so that they can understand it and get their hands around it.
Are you available for strategic consulting or helping out with data storage projects people may have?
Stuart Berman: Yes, I certainly love to talk to people about these things. Whether it becomes a formal engagement is another story in terms of availability and terms. I have no intention of being an employee anywhere anymore. I'm past that stage but I enjoy doing the things I do. If I find a project worth doing, I'll put my time and energy into it.
Contact Stuart or follow his work at:
LinkedIn: https://www.linkedin.com/in/stuart-berman-b7274/
Filecoin Slack: @stuberman - f01278