Blockchain may be one of the most promising technologies today, but that may just as well be the reason why there's also a lot of FUD around it. Speculation and crypto-winter aside, however, there are a number of technology issues to address before blockchains can get real, and data access is prominent among them.
In a nutshell, blockchains are not very efficient as a data storage and retrieval mechanism. This is why people have been experimenting with various approaches to using blockchains as a database, including altering their structure.
Regardless of how successful these turn out to be, however, one thing is certain: Most of the world's data today does not live on a blockchain. The vast majority of application data lives in some database, and some of that data may be accessed via APIs.
How, and why, would the world of databases and APIs talk to the world of blockchain? Enter Chainlink.
Smart contracts and the connectivity problem
You may have heard about smart contracts. You can think of smart contracts as programs that execute exactly as they are set up to by their creators on the Ethereum blockchain. Smart contracts enhance Ethereum with the ability to execute tamper-proof code, in addition to storing tamper-proof data, turning it into a "world computer."
Together, smart contracts and data form the building blocks for decentralized applications (Dapps) and even whole decentralized autonomous organizations (DAOs). There is a programming language (Solidity) used to develop smart contracts, as well as a development framework (Truffle) that can be used to build smart contract applications.
Despite the fact that this is still not a 100% mature stack, people are using it to develop Dapps and DAOs. Smart contracts can interact with each other, and they can also store and retrieve data on the blockchain. But what happens when they need to interact with the outside world, and retrieve (or store) data from/to databases or APIs?
The Smart Contract Connectivity Problem, as Chainlink defined it, is the inability of a smart contract to interact with any external data feed or other resource that is run outside the node network in which the smart contract itself is executed.
This lack of external connectivity is inherent to all smart contract networks, due to the method by which consensus is reached around blockchain transactions, and will therefore be an ongoing problem for all smart contract networks.
Chainlink, co-founded by CEO Sergey Nazarov and CTO Steve Ellis, aims to solve this problem by developing a so-called oracle, officially launching today. ZDNet connected with the Chainlink team to discuss what this is all about.
Chainlink, the blockchain oracle
An oracle is a gateway between a blockchain and the real world. Oracles can fetch data from outside the blockchain and pass it on to smart contracts. The problem with this, of course, is that oracles introduce the need for centralization and trust in the decentralized, trust-less world of blockchains.
Chainlink's whitepaper, published in 2017, tries to address this on the technical level. Part of Chainlink's implementation runs on-chain and part off-chain. There are provisions for Service Level Agreements (SLAs), mechanisms for data source selection, result aggregation, and reporting.
There is an API data providers can use to feed their data into Chainlink's oracle. There are also decentralization approaches and security services outlined, to ensure that Chainlink is robust and secure. One of the things we inquired about was how close today's launch is to the vision outlined in the Chainlink whitepaper [PDF].
The Chainlink team noted that the initial launch is focused on allowing smart contracts to retrieve external data from Chainlink nodes based on the number of individual requests they create. While this is an essential first step, it does not fully implement all of the features discussed in the white paper. Chainlink believes that's a process that can and should be gradually upgraded as development progresses.
In order to assist smart contract creators today, they went on to add, Chainlink provides documentation and contract examples on how to create requests to multiple oracles and aggregate responses. The Service Agreement Protocol, currently in development, will allow a requester to define parameters for their requests in a setup step, such that a single request can receive responses from multiple oracles.
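To make the idea of querying multiple oracles and aggregating their responses concrete, here is a minimal sketch in Python. It is not Chainlink's code: in practice the aggregation would live in a smart contract, and the `aggregate` function, the node names, the `min_responses` parameter, and the choice of the median are all assumptions made for illustration.

```python
from statistics import median
from typing import Callable, Dict, List

# Hypothetical stand-in for an oracle node: a callable returning the value it fetched.
Oracle = Callable[[], float]


def aggregate(oracles: Dict[str, Oracle], min_responses: int) -> float:
    """Query every oracle, skip the ones that fail, and return the median answer."""
    answers: List[float] = []
    for fetch in oracles.values():
        try:
            answers.append(fetch())
        except Exception:
            # A node that fails simply contributes no answer.
            continue
    if len(answers) < min_responses:
        raise RuntimeError(
            f"got {len(answers)} responses, need at least {min_responses}"
        )
    # The median limits the influence of a single wrong or malicious answer.
    return median(answers)


def unreachable_node() -> float:
    raise TimeoutError("node unreachable")


oracles = {
    "node_a": lambda: 101.0,
    "node_b": lambda: 99.0,
    "node_c": lambda: 100.0,
    "node_d": lambda: 250.0,  # an outlier
    "node_e": unreachable_node,
}

result = aggregate(oracles, min_responses=3)
print(result)
```

With four usable answers, the outlier from `node_d` barely moves the result, whereas a plain average would have been pulled far off. That is the kind of trade-off a requester has to weigh when deciding how to combine responses.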
In other words, there is a certain degree of technical forethought that has been built into this, although it's not fully implemented yet. Part of it is there to ensure the oracle is resilient (i.e. it does not crash under heavy load), and part of it to ensure it's decentralized (i.e. there's no single point of failure/arbiter of the truth).
Building an ecosystem
Chainlink is launching with three endorsed oracles, including its own. The other teams are Fiews and LinkPool. These teams have been running a Chainlink node on the Ethereum test networks for around a year, and have assisted with the development of the Chainlink node. Chainlink noted they will also have an on-boarding process for endorsed Chainlink nodes to be listed in official documentation.
Other third parties are able to run Chainlink nodes themselves, as Chainlink code is open source. Third parties may use other listing services (currently in development) in order to receive requests from smart contracts.
Any service provider can use Chainlink oracles for their smart contracts. If someone wants to use their own data for their smart contracts, they are free to connect to their own data source. Furthermore, the Chainlink team added, this depends on your perspective:
"As a data provider, how do I sell my data to smart contracts? The answer is to create an external adapter for my API, run a Chainlink node, and allow smart contracts to create requests to my oracle.
As a general node operator, how do I sell data for X API? They would either need to create an external adapter themselves, which may not be viable if they're not a developer (which is not a requirement), or they can find an open source implementation of an external adapter for the API they're wanting to provide.
We've built the Chainlink node to be modular by-design, so external adapters can easily be added by node operators to extend the functionality of their node without needing to know how to write programs."
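Conceptually, an external adapter of the kind described above is a small service that accepts a request from the node, calls some API, and hands back the result. The Python sketch below shows that shape only. The field names (`id`, `data`, `status`, `result`), the `make_adapter` helper, and the stubbed rate source are illustrative assumptions, not Chainlink's actual adapter interface.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Response = Dict[str, Any]


def make_adapter(fetch_rate: Callable[[str], float]) -> Callable[[Request], Response]:
    """Wrap a data source (hypothetical `fetch_rate`) as a request/response adapter."""

    def adapter(request: Request) -> Response:
        request_id = request.get("id")
        symbol = request.get("data", {}).get("symbol")
        if not symbol:
            return {"id": request_id, "status": "errored", "error": "missing 'symbol'"}
        try:
            value = fetch_rate(symbol)
        except Exception as exc:
            # Report upstream failures to the node instead of crashing.
            return {"id": request_id, "status": "errored", "error": str(exc)}
        return {
            "id": request_id,
            "status": "completed",
            "data": {"symbol": symbol, "result": value},
        }

    return adapter


# A stub standing in for a real API call, so the sketch runs offline.
FAKE_RATES = {"ETH": 100.0}


def fake_fetch(symbol: str) -> float:
    return FAKE_RATES[symbol]


adapter = make_adapter(fake_fetch)
ok = adapter({"id": "1", "data": {"symbol": "ETH"}})
bad = adapter({"id": "2", "data": {}})
print(ok, bad)
```

Keeping the data source behind a plain function is what makes the adapter swappable: a node operator could reuse the same wrapper for a different API by supplying a different `fetch_rate`.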
The Chainlink ecosystem, today and tomorrow
Part of the value Chainlink brings is by providing the infrastructure for anyone to run an oracle, and part of it comes from its own oracle and ecosystem. There have been various names flying around, including a proof of concept project with SWIFT, and alleged "white label" partners such as Salesforce and Microsoft Azure.
The SWIFT proof of concept pulled interest rates from five banks (Barclays, BNP Paribas, Fidelity, Societe Generale, and Santander) and fed the data into a smart contract, which was used to make a payment that translated into a SWIFT payment message.
Chainlink clarified there are three types of projects that make up the ecosystem: data providers, platforms/blockchains, and projects that use Chainlink oracles. Although Chainlink refrained from pointing to a comprehensive list, they pointed to a Decrypt article which mentions many collaborators and projects. There is a lot of speculation in the industry, they added, and they only confirm when official.
Chainlink provides more than the technical infrastructure here -- they also provide an instance of this infrastructure, with vetted data providers onboarded. Chainlink emphasized that they work with top data partners for officially created adapters such as crypto price data, supply chain, etc.
Essentially, there are two layers of selection there: One on the oracle network, and one within each oracle. Users can choose which oracle(s) to use in the oracle network, and oracle nodes can choose which external services to connect to.
This also poses some interesting technical challenges. Essentially, oracles will act as data hubs, with data flowing in and out of them. How will the different data providers and data streams be cataloged, integrated, and managed? And what about issues related to data freshness, correctness, and performance?
Data selection and schema matching
Chainlink currently operates with a schema system based on JSON Schema, to specify what inputs each adapter needs and how they should be formatted. Similarly, adapters specify an output schema to describe the format of each subtask's output.
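As an illustration of what an input schema for an adapter might look like, the Python sketch below declares one in JSON Schema style and checks a request against it. The schema contents (`base`, `quote`, `decimals`) are made up for this example, and the `validate` function is a deliberately tiny hand-rolled checker covering only `required` and `type`. It is not a full JSON Schema implementation, nor Chainlink's own validation code.

```python
from typing import Any, Dict, List

# Hypothetical input schema for a price adapter, written in JSON Schema style.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["base", "quote"],
    "properties": {
        "base": {"type": "string"},
        "quote": {"type": "string"},
        "decimals": {"type": "integer"},
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in instance or "type" not in rule:
            continue
        value = instance[key]
        expected = TYPE_MAP[rule["type"]]
        # In Python, bool is a subclass of int, so reject it for numeric types.
        is_bool = isinstance(value, bool)
        ok = isinstance(value, expected) and (rule["type"] == "boolean" or not is_bool)
        if not ok:
            errors.append(f"field {key} should be of type {rule['type']}")
    return errors


good = validate({"base": "ETH", "quote": "USD"}, INPUT_SCHEMA)
problems = validate({"base": "ETH", "decimals": "2"}, INPUT_SCHEMA)
print(good, problems)
```

A schema like this says how inputs should be formatted, but nothing about what they mean. That distinction matters for the schema matching discussion that follows.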
Schema management at scale with data coming from various domains and sources is a sufficiently researched and documented topic, but that does not make it easy to deal with in practice. Especially when using JSON Schema, which is not the most advanced solution when it comes to schema management.
So what happens when there is not sufficient metadata on the data flowing through Chainlink? Not to mention, even sufficient metadata can be erroneous or misleading. What happens if I connect a data provider and claim it's about topic A, but others say it really is about topic B, or C, or D and E? Chainlink says this is where decentralization plays a key role in the oracle problem:
"Just like how smart contracts are secure because they're ran on multiple machines (blockchain nodes), you can secure the inputs to your smart contracts by having that input retrieved by multiple Chainlink nodes.
So if you're a requester, and you want data from a particular API DPA, you define how many Chainlink nodes you want to retrieve that data. To further decentralize your inputs, and if there are additional data providers with the same topic of data, you could have additional Chainlink nodes retrieve from another API DPB to assist with validation."
However, we would argue that while that does indeed address the topic of data source selection, it does not address that of schema matching: The terms used to describe what DPA and DPB are about could be different, and yet their data could be about the same thing. Based on JSON Schema, without a mechanism to align the metadata in place, nobody would ever know.
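The schema matching concern can be shown with a small Python sketch. The two payloads, their field names, and the `ALIGNMENT` mapping are invented for illustration. The point is only that two providers can describe the same thing in different terms, and that comparing them requires some explicit alignment that a structural schema alone does not supply.

```python
from typing import Any, Dict

# Hypothetical responses from two data providers describing the same market.
dpa_response = {"pair": "ETH/USD", "last_price": 100.0}
dpb_response = {"market": "ETH/USD", "px": 100.5}

# A hand-maintained alignment from provider-specific terms to shared ones.
# Something has to supply this mapping; here it is simply written out.
ALIGNMENT: Dict[str, Dict[str, str]] = {
    "DPA": {"pair": "instrument", "last_price": "price"},
    "DPB": {"market": "instrument", "px": "price"},
}


def to_canonical(provider: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a provider's fields to the shared vocabulary, dropping unknown ones."""
    mapping = ALIGNMENT[provider]
    return {mapping[key]: value for key, value in payload.items() if key in mapping}


# Without alignment, the two payloads appear to have nothing in common.
shared_raw_fields = set(dpa_response) & set(dpb_response)

a = to_canonical("DPA", dpa_response)
b = to_canonical("DPB", dpb_response)

# After alignment, the values can be cross-checked against each other.
same_instrument = a["instrument"] == b["instrument"]
spread = abs(a["price"] - b["price"])
print(shared_raw_fields, same_instrument, spread)
```

Both payloads would pass their own provider's schema, yet only the mapping reveals that they are comparable at all.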
From a data architecture perspective, Chainlink looks like a data hub through which data will flow transiently. However, a published list of use cases mentions interacting with databases and data in the cloud.
We wondered whether there are implementations of such use cases to show for today. Plus, if this takes off, the amount of data flowing through Chainlink will be considerable. Would Chainlink consider storing any of that data in the oracle, for example for caching?
Chainlink's view is that they like to think of it as an on-chain protocol that allows smart contracts and node operators to work with one another in a trust-minimized way:
"This means that any endpoint that a node operator can access can be used by a smart contract through our protocol. We have a number of working implementations that give smart contracts the ability to retrieve data from authenticated data sources.
Storing or caching data within the oracle is not currently a consideration since there are a number of security concerns associated with that. Data providers already have the facilities to store data long-term, and have the history and reliability of providing that data."
And what about the other way round? If a smart contract wants to send data to an external source, rather than store it on the blockchain, can Chainlink do this?
A Chainlink node can relay information from a smart contract to an external source. However, this would introduce an array of issues, as storing data in an external system means the tamper-proof aspect of data storage on the blockchain no longer applies.
So what will development for smart contracts on Chainlink look like? Does it come down to writing Solidity -- which is not the easiest thing in the world for most people? Currently, smart contracts create their requests on-chain, and each request is picked up by the Chainlink node.
In the near future, Chainlink said, they will allow for requests to be initiated from off-chain services directly to a Chainlink node. This allows for requests to be created faster than the typical block time of the Ethereum network.
It also opens the door for faster blockchains to receive data at their native speed. Chainlink nodes can already query data on other blockchains with external adapters; the only caveat is that a requester would need to use a Chainlink node with connectivity to that blockchain.
All in all, this is a much welcome development for smart contracts, Ethereum, and blockchain at large. It means the next step in the evolution of this ecosystem is now possible.
Granted, not everything is rosy, and smart contract and oracle development is bound to hit some of the same issues that have plagued software development and data management for decades. Hopefully known solutions to those issues can eventually be applied to foster the growth of this ecosystem, too.