The Decoupling of Data


There are endless use cases for either blockchain or AI, but more important than any specific use case is a paradigm shift that rests at the confluence of both technologies: decentralized data ownership and the decoupling of data from the applications and services that generate it.

Where we are

Today the Internet is dominated by technology companies that have monopolies on our data. The likes of Amazon, Apple, Google, and Facebook are ruthlessly competing to be the stewards of users’ data. They suck up as much data as possible onto their permissioned platforms, where they can dictate how it’s used and by whom. Many burgeoning businesses have been killed off by changes in the self-serving rules set by today’s powerful technology companies, and consumers have little ability to migrate their data or restrict its use, short of deleting it in some cases.

These walled data gardens extend beyond the realms of our cell phones and into our hospitals. Healthcare has a notoriously fragmented data landscape; despite billions of dollars spent by taxpayers, simply getting your health record can be a nightmare, let alone getting it accurately onboarded into a new provider’s systems. Spend time talking to healthcare practitioners and they’ll readily tell you this isn’t a technical problem but one of business incentives. The big players, both providers and electronic health record vendors, have no incentive to make their systems talk to each other. Dr. Harlan Krumholz of Yale University School of Medicine recounted a stunning admission from a major healthcare leader that perfectly demonstrates this:

“The leader of a very major healthcare system said this to me confidentially on the phone… ‘why would we want to make it easy for people to get their health data…we want to keep the patients with us so why wouldn’t we want to make it just a little more difficult for them to leave.’ …I couldn’t believe it a physician health care provider professional explaining to me the philosophy of that health system.”

Providers view their reams of health records as a competitive asset, both in the sense that “data is the new oil” that can power new innovations and as a craven way to create stickiness with those they are supposed to serve. In tandem, electronic health record vendors drag their feet as they battle each other for market share.

The impact of this cannot be overstated. It is devastating. Healthcare professionals rely on health records to make informed decisions. Without the right data at the right time, the chances of making the right decision are worse. Researchers at Johns Hopkins estimated that medical errors were the third leading cause of death in the United States. Even that doesn’t capture the increased costs and decreased quality of care that non-fatal episodes cause. Perhaps more insidiously, the status quo deprives a nascent data science industry of the fuel it needs to thrive. Troves of data sit unused, access goes mostly to the well connected, competition is stifled, and patients ultimately suffer.

Bundled data

A consequence of the status quo is that your data is bundled with the applications and services that generate it.

If I bought something on Amazon, that record would stay with Amazon. If I posted a status that appeared on your newsfeed, Facebook has that data. If I had a blood test done, that data resides with the provider that conducted the test. If I wanted to know more about my ancestry I would buy a 23andme ancestry kit, and my resulting genomic data would stay with 23andme.

In the example of 23andme, an important distinction is that you are paying for both your genomic data and the analysis of it. You cannot separate the two. I couldn’t get a 23andme ancestry report using genomic data generated elsewhere, or vice versa. A highly savvy user could dig through 23andme, download their raw data, and upload it to something like Promethease, but that process is hardly scalable and requires a high level of technical sophistication. Further, Promethease is one of the few services that analyze genomic data without generating that data themselves.

23andme used to offer an API that enabled 3rd party developers to access consenting users’ genomic data, but in August 2018 they changed the rules and cut that access off. Instead, 3rd party apps are only able to leverage the reports generated by 23andme, not the raw genomic data itself. No longer can you use 23andme’s API to offer competing analysis of a user’s genomic data. That is the consequence of data being bundled together with a service: because the company providing that service runs the centralized platform the data is hosted on, it can stifle competition.

Oh, and if you didn’t know already, 23andme is still leveraging its platform and your genomic data to land hundreds of millions of dollars in investment from a pharmaceutical company.

It’s no wonder that, against this backdrop of inequity, privacy concerns, and stifled competition, people are turning towards decentralization. I’m using 23andme here as an example, but there are innumerable companies that do this today across all verticals.

A better way

Blockchain technology enables decentralized ownership of data and, as a consequence, unbundles data from applications and services. Once data is unbundled and users are in control, they will be able to take their data anywhere, and this will foster open competition. Decentralized data ownership means that no centralized entity owns the platform the data is hosted on and that no centralized entity can change the rules to benefit itself or stifle competition.

To draw again upon the genomic data example, users would be free to take their 23andme data and use it to get analysis from Helix, 23andme, Ancestry, or Promethease. None of these companies would be able to stop the others from providing their services on this level, decentralized playing field. In fact, they wouldn’t be able to stop anyone. Instead of having a single option for our genomic analysis, we would have thousands of options. Companies and individuals alike would truly compete on providing the best algorithms, and money and data would flow to the best of them. That alone could generate mind-boggling benefits to users in the long term.
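
To make the idea of unbundling a little more concrete, here is a minimal Python sketch of the access pattern described above. It is an illustration under stated assumptions, not a blockchain implementation: the UserDataVault class, the provider names, and the two toy "analysis" functions are all hypothetical. The only point is that the owner, rather than a platform, decides which competing analyzers may read the data.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Hypothetical type: an analyzer is any function from raw data to a report.
Analyzer = Callable[[str], str]


@dataclass
class UserDataVault:
    """Illustrative user-controlled data store; not a real blockchain or API."""
    owner: str
    genome: str
    _grants: Set[str] = field(default_factory=set)

    def grant(self, provider: str) -> None:
        # The owner decides who may read the data.
        self._grants.add(provider)

    def revoke(self, provider: str) -> None:
        self._grants.discard(provider)

    def analyze(self, provider: str, analyzer: Analyzer) -> str:
        # Access is checked against the owner's grants, not a platform's rules.
        if provider not in self._grants:
            raise PermissionError(f"{provider} has no grant from {self.owner}")
        return analyzer(self.genome)


def gc_content_report(genome: str) -> str:
    # Toy "analysis": fraction of G and C bases in the sequence.
    gc = sum(base in "GC" for base in genome) / len(genome)
    return f"GC content: {gc:.2f}"


def length_report(genome: str) -> str:
    # A second, competing toy "analysis" over the same data.
    return f"Sequence length: {len(genome)}"


# Made-up data: the same record is analyzed by two independent providers.
vault = UserDataVault(owner="alice", genome="GATTACAGGC")
vault.grant("provider_a")
vault.grant("provider_b")
report_a = vault.analyze("provider_a", gc_content_report)
report_b = vault.analyze("provider_b", length_report)
```

In this sketch neither provider can block the other, and the owner can call revoke at any time. A real system would need a shared ledger or similar mechanism to enforce the grants, which this example leaves out.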

But the benefits are not limited to users. With open competition between algorithms for users and their dollars, there is a direct business model for algorithms. Instead of worrying about capturing a user base in a walled garden or gaining insider access to a hospital’s data, if you can create a great algorithm you can monetize it directly by selling it to users. This opens up a realm of new possibilities by providing a powerful incentive to create algorithms.

The benefits of patient-owned health data have been written about at length, but in brief, there are similar benefits to patients and providers here as well. Patients would see better care, and providers could focus on doing the things they do best and on providing quality care. EHR vendors could focus on providing the best UI and tools on top of data.

Again, I’m using healthcare examples, but these concepts are broadly applicable outside of healthcare.


A number of technologies are maturing that could ensure users’ privacy is respected as well. In particular, homomorphic encryption, secure multi-party computation, and zero-knowledge proofs could allow algorithms to train on users’ data without exposing the data itself. Zero-knowledge proofs, in particular, are a favorite of the blockchain community. These technologies are in their infancy, however, and will require significant R&D before they can be deployed at scale.
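
As a rough illustration of the secure multi-party computation idea, the following Python sketch uses additive secret sharing, one common building block, to compute a total over private values without any single party seeing an individual value. It is far simpler than training a model on private data and is only meant to show the principle. The modulus, the readings, and the function names are assumptions made for this example.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # modulus for the shares; an arbitrary choice for this sketch


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine shares by adding them modulo PRIME."""
    return sum(shares) % PRIME


# Three users each hold a private reading (made-up numbers).
readings = [120, 135, 110]
n_parties = 3

# Each user splits their reading and sends one share to each compute party.
all_shares = [share(r, n_parties) for r in readings]

# Each party adds up the shares it received; it never sees a raw reading.
party_sums = [
    sum(user_shares[p] for user_shares in all_shares) % PRIME
    for p in range(n_parties)
]

# Combining the per-party sums reveals only the total, not the inputs.
total = reconstruct(party_sums)
```

Each individual share is a uniformly random number, so a single compute party learns nothing about any one reading. Only the combination of all the per-party sums yields the aggregate.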


As algorithms become increasingly entrenched and important in our lives, it is imperative that we create structures that are equitable and robust. Today’s status quo is neither of those things. Technology titans have dominion over vast swathes of the Internet; they suck up consumer data, monetize it, and tweak the rules of their platforms to suppress competition for their benefit. I’ve used primarily healthcare examples in this article, but this is a problem in every industry.

Decentralized data ownership powered by blockchains changes this paradigm for the better. By decoupling data from the applications and services that generate it, decentralized data ownership will broaden users’ choices, create open competition between algorithms, and create a direct business model for algorithms. The end result will be an explosion of innovation, a more level playing field, and a massive benefit for consumers.