Recently we’ve seen an uptick in interest in the potential of blockchain and AI as mutual force multipliers. With the new year here, there are reasons to believe that 2020 could be when blockchain starts to genuinely transform the AI technology stack. Moreover, this could further broaden the trend towards fusing blockchain, AI and IoT as the three core elements of what some call “The New Stack”. Fundamentally, blockchain adds a trust layer to a system. In a time of increasing complexity, where credibility and transparency command higher premiums, this trust layer could be the “killer app” that further mainstreams blockchain and tangibly propels us towards Industry 4.0.

Start with Data Provenance

The provenance and lineage of data are likely where the blockchain trust layer is most immediately operationalized. An immutable, shared ledger lends itself well to tracing data back to its original source. Visibility into data provenance lends further credibility to the data inputs that feed AI algorithms. Additionally, provenance doesn’t just refer to the logical and physical location of the origin data; the immutable blockchain audit trail can also provide a history of the changes made to the data. For instance, did people or bots make changes to the data? What transformations did it go through? What was the source code that drove those transformations? In what order did these events occur? In fact, given the range of use cases blockchain enables for visibility and traceability across the data pipeline, blockchain could eventually go from “force multiplier” to “central foundation” within the data engineering layers of the AI stack.
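To make the audit-trail idea a little more concrete, here is a minimal sketch in Python of a hash-chained provenance log. It is not a blockchain (there is no network, consensus, or shared ledger here), only an illustration of why tampering with a recorded change history becomes detectable once each entry commits to the one before it. The class name `ProvenanceLog`, its fields, and the sample values are hypothetical names chosen for this illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical append-only log; each entry commits to its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, actor, actor_type, transformation, code_ref, timestamp):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "seq": len(self.entries),          # sequence in time
            "actor": actor,                    # who made the change
            "actor_type": actor_type,          # "person" or "bot"
            "transformation": transformation,  # what was done to the data
            "code_ref": code_ref,              # source code that drove it
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that each entry links to the previous one."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = ProvenanceLog()
log.append("etl-bot", "bot", "deduplicate rows", "git:abc123", "2020-01-02T09:00:00Z")
log.append("analyst-1", "person", "relabel category", "git:def456", "2020-01-02T10:30:00Z")
print(log.verify())
```

Each entry answers the questions above (who, what, which code, in what order), and editing any earlier entry after the fact causes `verify()` to fail. A real deployment would add what this sketch leaves out: replication of the log across multiple independent parties.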

The Special Case of Knowledge Graphs

If data provenance is where we may see the biggest acceleration of blockchain penetration into AI, knowledge graphs could be a specific AI application where the synergy is particularly strong. Knowledge graphs aim to connect and fuse data silos into a unified and useful view without physically moving and centralizing the data, much like blockchain connects decentralized entities into a single ledger without centralization. Therefore, while trusted data provenance is highly relevant to any AI application, with knowledge graphs it is arguably a core value proposition in and of itself. After all, the provenance and sourcing of data are critical to the semantic linking needed for named entity recognition, semantic role labeling, and other applications of vector space models that knowledge graphs support so effectively. As such, blockchain’s trust layer as an augmentation to knowledge graphs is a no-brainer in terms of data credibility and robustness. Moreover, blockchain could reduce the cost of purchasing, maintaining and growing datasets, potentially by orders of magnitude. Finally, for knowledge graph implementations that aspire to connect dynamic data from sources spanning multiple legal entities, blockchain is likely even more critical and could become table stakes. I’ll have more to say on blockchain and knowledge graphs over the course of 2020, so please be on the lookout!

Looking to the Future: Richer Data Ecosystems and Data as a Marketable Asset (Eventually...)

The “holy grail” of eventual blockchain / AI fusion within Industry 4.0 is a virtuous cycle. The trusted shared data ledger leads more actors to contribute a greater volume and variety of data, which drives better systemic outcomes for participants. Those outcomes in turn drive deeper, wider and more diverse participation, bringing newer and more complete data sets and improved data veracity. On top of that, increased sharing of AI models can itself lead to better models, something we haven’t touched on in this post. So how does this entire system scale to these greater heights? By leveraging blockchain’s native digital asset exchange capabilities, which, incidentally, are perhaps the core characteristic of the technology that powered cryptocurrency adoption. With these tokenization-inspired capabilities, market incentives will further encourage participants to contribute greater quantities and varieties of quality data, while they retain flexibility and granular permissions over how their data is used downstream. In other words, participants will be rewarded for contributing more and better data while maintaining ownership and control. Depending on how a network is set up, the rewards could come in the form of cryptocurrency, but could also be paid in digital fiat or straight fiat through smart contracts. We may not see this ecosystem at scale in the next year, yet it’s good to be mindful of the opportunities that lie further ahead as we build the foundational data provenance today.
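As a rough illustration of the incentive mechanics described above, the toy Python sketch below pairs granular usage permissions with a reward paid to the data owner on each permitted use. Everything in it is an assumption made for illustration: the names `Dataset` and `DataExchange`, the idea of a flat per-use price, and the integer "token" balances. It is an in-memory stand-in for what a smart contract on a real network might enforce, not a description of any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    owner: str
    allowed_uses: set    # purposes the owner has granted, e.g. {"model-training"}
    price_per_use: int   # hypothetical reward, in abstract token units


@dataclass
class DataExchange:
    """Hypothetical in-memory exchange: owners keep control, consumers pay per use."""
    datasets: dict = field(default_factory=dict)
    balances: dict = field(default_factory=dict)

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount

    def contribute(self, dataset_id, owner, allowed_uses, price_per_use):
        self.datasets[dataset_id] = Dataset(owner, set(allowed_uses), price_per_use)

    def use(self, dataset_id, consumer, purpose):
        dataset = self.datasets[dataset_id]
        # Granular permissions: the owner decides which downstream uses are allowed.
        if purpose not in dataset.allowed_uses:
            raise PermissionError(f"{purpose!r} is not permitted for {dataset_id!r}")
        if self.balances.get(consumer, 0) < dataset.price_per_use:
            raise ValueError(f"{consumer!r} has insufficient balance")
        # Market incentive: each permitted use transfers a reward to the owner.
        self.balances[consumer] -= dataset.price_per_use
        self.deposit(dataset.owner, dataset.price_per_use)
        return dataset.price_per_use


exchange = DataExchange()
exchange.deposit("research-lab", 10)
exchange.contribute("sensor-feed", "plant-a", {"model-training"}, price_per_use=3)
exchange.use("sensor-feed", "research-lab", "model-training")
print(exchange.balances)
```

The owner "plant-a" is paid whenever the data is used for a purpose it has approved, and any other purpose is rejected outright, which is the combination of reward and retained control the paragraph describes.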

As always, I’d love to hear your thoughts, so don’t be shy about reaching out.

Happy 2020!
Anjon Roy
VP of Market Development