In the four years since the first alpha version of MultiChain, hundreds (if not thousands) of proof-of-concept and pilot projects have been built by our partners on the platform. While many of the early ones were pointless blockchains, over time we have seen a consistent rise in the proportion of projects using the technology appropriately. Now we rarely hear about a blockchain-based application which lacks a good answer to the question: “Why not just use a regular database?” What a relief!
Proof-of-concepts and pilots are all well and good, but to my mind, the most important signal comes from solid enterprise blockchain projects that make it to live production. To be clear, this means networks containing multiple blockchain nodes belonging to multiple parties, where more than one of these parties is involved in generating real transactions and participating in the blockchain’s consensus algorithm. Without these characteristics, the blockchain is providing little or no value compared to a centralized database.
This article is a survey of ten of the most interesting permissioned blockchain applications built on MultiChain that are in production today. Each application will be described briefly, along with an explanation of why it made sense to use a blockchain and some numbers to give a sense of scale. Note that confidentiality agreements prevent us from revealing some of these projects’ details, but we’re telling you as much as we can. After reviewing the ten projects, I’ll finish with a list of five important lessons that I believe we can learn.
Ready? Then let’s begin…
Blockchain #1: SAP for Pharmaceuticals
Some medicines bought by large customers such as hospitals don’t end up being used, and are returned to wholesalers unopened for resale elsewhere. However, this process brings a significant risk of counterfeiting, where the so-called “returns” have been faked somewhere along the way. To help combat this problem, every box of drugs can be shipped with a barcode label that identifies its contents and origin, with the barcode being recorded in a database for future verification. But who should be responsible for managing this critical database of drug shipment barcodes? In Europe, a centralized EU-level body was set up for this purpose, but there is no corresponding governmental entity in the USA.
To solve this dilemma, SAP built a blockchain-based solution on top of MultiChain, where multiple drug manufacturers and wholesalers have their own node, granting them direct access for reading and writing the chain. Each barcode is recorded as an item in a MultiChain data stream, allowing it to be looked up directly by scanning a printed label. The system is already running live and has been successfully tested to scale to 1.5 billion recorded barcodes and 30 million verifications per year.
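The publish-then-lookup pattern can be sketched as follows. This is an illustrative sketch only: the stream name, barcode value and metadata fields are invented, and for simplicity the code builds the JSON-RPC request payloads rather than sending them to a live node. The `publish` and `liststreamkeyitems` methods are part of MultiChain’s standard API.

```python
import json

def publish_barcode_payload(stream, barcode, metadata):
    """Build the JSON-RPC payload that would record a barcode as a
    stream item, using the barcode itself as the item's key."""
    return {
        "method": "publish",
        "params": [stream, barcode, {"json": metadata}],
        "id": 1,
    }

def lookup_barcode_payload(stream, barcode):
    """Build the JSON-RPC payload that would retrieve every item
    recorded under a given barcode key, e.g. after scanning a label."""
    return {
        "method": "liststreamkeyitems",
        "params": [stream, barcode],
        "id": 2,
    }

# Hypothetical stream name and barcode for illustration only
req = publish_barcode_payload(
    "drug-returns", "0363-0130-27-1",
    {"product": "ExampleDrug 10mg", "origin": "PlantA"})
print(json.dumps(req))
```

Because the barcode is the item’s key, verification is a single indexed lookup rather than a scan of the chain.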
Blockchain #2: TruBudget
When donor countries finance public projects in developing countries, it’s vital to keep track of important events in each project’s lifecycle, including tenders, contracts and disbursements. Both the donors and recipients want to maintain these records in a database for easy searching, but who should be in charge of that database? Neither side in the relationship is politically comfortable with ceding full control to the other, so this has often led to both parties maintaining their own records, and trying to keep them in sync. The picture is complicated further when there are multiple donor countries partnering together.
TruBudget is an open source application which uses a MultiChain blockchain to solve this dilemma. Each of the important stakeholders maintains its own node, writing important events to streams while sharing an identical picture of the project’s progress through their own front end. The system was commissioned by Germany’s Federal Ministry for Economic Cooperation and Development and developed by Accenture and KfW, Germany’s third largest bank. Two blockchains are now running in production for projects in Brazil and Burkina Faso respectively, with each expected to record up to 300 projects and 5,000 events per project.
Blockchain #3: Connected Health
In order to improve patient care and reduce bureaucracy, an Indian state government is implementing an electronic medical record system to enable information sharing between hospitals and other healthcare facilities in the state. When designing the system, two particular concerns arose. First, how can the records be secured against loss or tampering? Second, how do we ensure that the information is available locally in each city, in the event of a temporary loss of Internet connectivity?
These requirements were solved together by building the system on a blockchain rather than a centralized database. MultiChain streams are being used to store the medical records – currently with text only but with richer data such as images to be integrated later on. Participating cities will have their own nodes running locally, which take part in the consensus process. The system was built by RapidQube and is already in early production, with around 2 million records stored for over 50,000 people.
Blockchain #4: Collateralizing livestock
In many developing countries, farmers find it difficult to access affordable loans, even if they own valuable assets such as cattle that could serve as collateral. In order for a farmer’s cow to be used in this way, it must be identified and tagged, immunized against diseases and insured against potential mishaps. In addition, each cow can only be collateralized once. All this requires extensive data coordination between a country’s animal healthcare system, insurance companies and financial institutions, each of which has different incentives and governance structures.
FarmTrek is a blockchain-based solution developed by InfoCorp which enables this coordination to take place without being controlled by a central party. Each major stakeholder runs one or more MultiChain nodes which work together to store and secure the data written to streams. Each cow is physically tagged with a tamper-proof NFC (near field communications) device, which connects to an Android mobile application used by the farmer to sign transactions and publish them to the blockchain. The project is now in live production in Myanmar and expected to scale to 100,000 farmers within two years, with an additional pilot in the works in Rwanda.
Blockchain #5: Tagcash KYC
As in many countries, when somebody opens a new bank account in the Philippines, the bank must perform rigorous KYC (know your customer) checks to verify the customer’s identity and residence. This costs time and money, meaning that banks and other financial service providers would benefit by sharing KYC information through a single database. Once built, this database can also form the basis of a credit scoring system, by adding information about customer loans and repayments (or failures thereof). Unfortunately, the Philippines has no centralized KYC and credit scoring mechanism, so this integration has been difficult to achieve.
In order to address this problem, Tagcash has created a blockchain-based KYC and credit scoring solution, using a network of nodes belonging to banks and smaller fintech companies. Some nodes have write privileges while others are permitted to read only. The information is stored within MultiChain streams, using a hash of each person’s name and birth date as a unique key for identifying their data. With the initial roll-out, around 100 records are being written per day, and this is expected to grow to 10,000/day over time.
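The hashed-key scheme might look something like the sketch below. The normalization rules shown here are an assumption for illustration; Tagcash’s actual scheme is not public.

```python
import hashlib

def kyc_key(full_name: str, birth_date: str) -> str:
    """Derive a privacy-preserving stream key from a person's name
    and ISO-format birth date. The pipe-delimited normalization is
    illustrative, not the production scheme."""
    normalized = f"{full_name.strip().lower()}|{birth_date}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same inputs always yield the same key, so different banks can
# locate a shared customer record without exchanging raw names.
print(kyc_key("Maria Santos", "1990-05-17"))
```

Keying records by a hash means the chain never stores the raw name and birth date directly, while any participant who already knows them can still find the record.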
Blockchain #6: Bureau Veritas Origin
With increasing awareness of food supply chain scandals, interest has grown in giving consumers greater transparency into how their food is sourced, processed, transported and stored. The goal is to create a comprehensive record of the steps involved in preparing an item for sale, and to enable consumers to access this information directly. To increase transparency and prevent tampering or corruption, it is preferable not to centralize control of this database at any individual company or location.
Bureau Veritas, a global company focused on testing and certification, has partnered with Atos Worldline to develop Origin, a blockchain-based food traceability solution. Nodes are run by multiple companies within the food supply chain, with data written to streams in a proprietary binary format. The finished products are labelled with QR codes, which consumers can scan in order to browse a web-based summary. With the initial roll-out, up to 100 records are being written per day.
(To avoid a common fallacy, it should be emphasized that the sources of data still need to be trusted when using a blockchain. The chain only improves the security of that data once it is stored.)
Blockchain #7: ILSBlockchain
An insurance linked security (ILS) is a bond which enables an insurance policy to be covered collectively by a group of investors. For example, the owners of a ship could pay a premium to the holders of an ILS, but if catastrophe strikes and the ship sinks, those holders lose some or all of their original investment. As with any financial asset, digitizing ILS ownership allows sales and transfers to take place efficiently. This is traditionally achieved using a custodian such as Euroclear, but the cost can be prohibitive for smaller insurance policies in the $10-20 million value range.
This problem was solved by Solidum Partners who issue and track ILS bonds on a MultiChain blockchain, removing the need for a highly regulated centralized custodian. Each bond is issued as a MultiChain asset, with participants transferring and exchanging these assets on a peer-to-peer basis. Nodes are run by the bond trustee, investors and reinsurers, with the consensus generated by a small group of senior participants. So far, four bonds have been issued on the blockchain, for over $50 million in total value.
Blockchain #8: Air Quality Chain
When it comes to collecting environmental data, three particular challenges need to be addressed. First, each type of data is generated in a different location, due to the need for specialized equipment. Second, the data must be stored safely and reliably for the very long term, to enable trends and changes to be analyzed. And third, different types of data may need to be cross-referenced in real time, to create a full picture of anomalies at the moment they occur.
These requirements can be addressed together by using a blockchain. The Air Quality Chain project, implemented by Baumann, aggregates data on levels of ozone, radiation and air quality in Austria, using a network of nodes which collect data from multiple sources. Raw data is written directly to MultiChain streams, and so automatically replicated to all of the nodes in the network, which collectively ensure that it cannot be lost or modified. The system is running in production and collecting 2.7 million records annually, containing around 4 GB of raw data.
Blockchain #9: Deepshore Archive
Metro Group, the world’s fourth largest retailer, is required to archive all point-of-sale data for internal and external auditing purposes. Whereas Metro used to rely on a single vendor for this purpose, they recently migrated to a more flexible model, where the data can be redundantly stored on a number of different cloud providers. This gives them much greater freedom and the ongoing ability to negotiate over pricing.
However, this fragmentation presents a challenge in ensuring that all of the data is stored correctly and cannot be changed. To solve this problem, Metro have deployed a blockchain-based system, built by Deepshore, where a hash and some other metadata for each data set is stored in MultiChain streams for verification purposes. Multiple nodes are running in different subsidiaries and locations within the Metro Group, so even though this is an “internal blockchain”, control is effectively decentralized within a vast organization. The system is already running live and notarizing approximately 9 million datasets per day.
Blockchain #10: Fantastec SWAP
For those of us growing up in the 1980s in the UK, collecting football stickers was hugely popular. We spent our pocket money on packets of random stickers, containing the faces of players, team photos and badges, and swapped obsessively with each other in an attempt to complete each year’s album. Fantastec has now developed a digital equivalent, where users download the SWAP app and purchase limited edition “cards”, complete with player videos and interactive statistics. Naturally, this application needs some database to keep track of card ownership, but it wasn’t clear where this database should be hosted. On the one hand, each participating football club should maintain its own database, to guarantee the authenticity and rarity of its issued cards. On the other hand, much of the product’s value derives from the ability to swap cards that were issued by different clubs.
This dilemma was solved by building the system on a blockchain, where each club has its own node that issues its digital collectibles as MultiChain assets, all of which are tracked together on a chain that is managed by consensus. The system, which makes extensive use of MultiChain’s built-in atomic exchange functionality, was built by Fantastec with assistance from partners such as PricewaterhouseCoopers. SWAP was recently launched with three big-name partners: Real Madrid, Arsenal and Borussia Dortmund. After less than 3 months it has grown to 15,000 users with over 250,000 collectibles issued.
Now that we’ve reviewed ten of the most interesting MultiChain-based networks in production, what can we learn from this group as a whole? What differentiates these projects from the hundreds and thousands of proofs-of-concept and pilots that never made it to the next stage?
Lesson #1: Focus on new applications
While there has been much talk about blockchains as an upgrade for existing systems, for now at least, we’re primarily seeing them deployed in new applications. I can think of two related reasons why this might be.
First, blockchains are still a new technology, and are perceived as more risky than centralized databases. This uncertainty can be tolerated when building new applications, which inevitably comes with some risk of failure. However, it makes blockchains less attractive for replacing something that is already known to work.
Second, any running centralized application must already have a trusted intermediary, who has presumably proven their reliability over time. While moving to a decentralized architecture might save money in bypassing this intermediary, this has to be weighed against the cost and risk of rebuilding the system from the ground up.
Lesson #2: Find a strong motive
Every application implemented on a blockchain must answer a crucial question: Why use a blockchain instead of a centralized database or file server? Blockchains will always be slower, less scalable and more complex than centralized systems, as a result of their fundamental design.
So if you have a suitable trusted intermediary who can host an application centrally, you should use it! The only reason to use a blockchain is if there is a strong motive to avoid this kind of centralization. In practice we see four main types of motive appear:
Commercial concerns. The participants in a network do not want to grant too much power to a competitor or some other central body, who could charge a lot for the service.
Regulatory requirements. Some regulation prevents the deployment of a centralized system, or would render it too expensive in terms of compliance.
Political risks. There is no place where the database could be hosted that would be politically acceptable to all of its users.
Secure replication. Multiple copies of the data need to be stored for redundancy, so using a blockchain provides the additional benefit of proven synchronization and tamper resistance.
Lesson #3: Think about data in general
Early discussions about enterprise blockchains were triggered by the rise of cryptocurrencies, in which the blockchain allows users to directly hold and transfer a virtual asset while preventing double spends. While some of the production networks we described (#7, #10) are using MultiChain in this way, the majority are doing something fundamentally different – building a decentralized architecture for storing and securing data.
Any database or file system, whether it is holding structured or unstructured data, could be implemented on a blockchain. Each piece of data can be stored in full on the chain, or notarized as a short on-chain hash (fingerprint) which serves to verify the data which is delivered off-chain. Unlike in asset use cases, there is no notion of ownership changing over time. The blockchain’s sole purpose is to enable some information to be stored and secured by a group, without relying on a central party.
In data-driven applications, “smart contracts” are the wrong transactional model, since they require every piece of data to be represented as a message sent to a contract, rather than being validated and then directly embedded (or hashed) in the chain. The central issue is the scale and speed with which information can be stored, indexed and retrieved.
Lesson #4: Look beyond “transformation”
For too long, the enterprise blockchain narrative has focused on buzzwords like “revolution” and “transformation”. But in reality, if we look at those blockchain projects that actually make it to production, only a few are doing things that would be impossible to achieve using more traditional technologies such as centralized databases, replication and point-to-point messaging. So what exactly is being transformed?
In most cases, a blockchain is being used simply because it’s the most appropriate and convenient tool for the job. It enables a new application to be easily built on top of a unified data store, while avoiding some concern about that store being centrally controlled. The blockchain provides additional robustness and tamper resistance, whose value outweighs the complexity and cost of running multiple nodes. While this all might seem rather unromantic, since when has enterprise IT been anything else?
But there is an additional, more subtle, part of the story. In rare cases we see projects being built on a blockchain, where there is no immediate justification for that choice. It turns out that the application’s users are happy for it to start out centralized, but want to keep their options open for the future. Using a blockchain (even with one node!) rather than a database allows the intermediary to be swapped or removed just by adding or removing nodes and changing some permissions. All this can happen with zero downtime and without touching the application’s code.
Lesson #5: Be very patient
With all of the noise surrounding blockchains, it’s easy to forget just how new this industry is. MultiChain, along with most other enterprise blockchain platforms, only reached a version 1.0 release in mid-to-late 2017 (it’s now at version 2.0.2). Since it’s quite common for enterprise IT projects, whether based on blockchains or not, to take two years from initiation to go live, it’s no surprise that the number of real blockchain networks in production is still rather small.
Indeed, two particular phenomena demonstrate just how early things are. First, we often find our partners performing the most basic tests on MultiChain just to convince themselves that it actually works! Second, we see some participants in production blockchain networks lacking the confidence to take responsibility for their own node, instead relying on some third party to host it on their behalf.
So as with any other new enterprise technology, people working in the blockchain space should hunker down for the very long term. I expect it will take another ten years before blockchains are commonly considered as an alternative for information system architectures, and another ten after that before they reach their full potential. By then, bandwidth, storage and cryptography will be so cheap and fast that it may seem quaint (if not ridiculous) for shared applications to store their data in only one place.
The next major version of MultiChain enters production
Today we’re thrilled to announce the production release of MultiChain 2.0, after 18 months of development and 3 months in beta testing. Version 2.0 of MultiChain is now available to download for Linux and Windows, and can also be compiled for Linux, Windows or Mac OS X.
MultiChain 2.0 adds smart filters for custom transaction or data validation rules, and a ton of enhancements for streams, including support for text and JSON data, multiple keys, off-chain data and richer querying. Other new features include blockchain parameter upgrading, per-asset and custom permissions, inline metadata and a binary cache for dealing with large pieces of data. Click for more details on all the new functionality.
For production systems based on MultiChain 2.0 we offer both Commercial Licenses (non-GPL) and Service Level Agreements (SLAs). For more information about these products and guidelines for when they are appropriate, see the pricing page or contact us.
We’re delighted to be speaking and exhibiting at Consensus 2019, the biggest annual blockchain conference taking place in New York from May 13-15. If you would like to attend but don’t yet have a ticket, you can register with the promo code MultiChain300 to receive a $300 discount. And if you’re using or evaluating MultiChain for a project and would like to meet us there in person, be sure to reach out and we’ll set aside some time.
Running MultiChain in production?
Our talk at Consensus will cover a number of the most innovative blockchain networks in production on MultiChain, and will also be written up as a post on our blog. So if you’ve built or are running a live application on MultiChain, and would like it publicized to thousands of blockchain enthusiasts and developers, please get in touch so we can find out more. To be clear, we’re specifically interested in projects that have moved beyond the proof-of-concept and pilot stages.
What’s next for MultiChain?
We’re already hard at work developing the Enterprise edition of MultiChain 2.0, which will include several high-end features required for enterprise applications, while maintaining full compatibility with the open source Community version 2.0. These features relate to confidentiality, scalability and regulatory requirements. More details will be released publicly in due course, but come to our booth at Consensus 2019 for a sneak preview!
Finally I’d like to take this opportunity to thank our development team, led by Dr Michael Rozantsev, for their ongoing dedication and engineering excellence. Almost five years ago to the day, we founded this company on the hypothesis that blockchains, as a technology, have useful applications beyond cryptocurrencies. It’s remarkable to see how far our product, and the entire blockchain industry, have come.
Empowering a broad new range of blockchain applications
Today we’re delighted to release the first beta version of MultiChain 2.0, the next generation of the MultiChain blockchain platform, after 16 months in development. MultiChain 2.0 (download) includes three major new areas of functionality to help developers rapidly build powerful blockchain applications:
Off-chain data. Any item published in a MultiChain stream can optionally be stored off-chain, in order to save bandwidth and storage space. Off-chain data (up to 1 GB per item) is automatically hashed into the blockchain, with the data itself delivered rapidly over the peer-to-peer network. Click for more about off-chain data.
Richer data streams. JSON and Unicode text are now supported natively and stored efficiently on- or off-chain. Multiple JSON items can be merged together, allowing a stream to serve as a database with a full audit history. Stream items can have multiple keys, and be queried by multiple keys and/or publishers together. Finally, to increase data throughput, a single transaction can publish multiple items to one or more streams.
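The idea of a stream acting as a database with a full audit history can be illustrated with a simple merge, applied to a key’s items in chain order. This is a simplified, top-level merge written for illustration; MultiChain’s own merge semantics may differ in detail.

```python
def merge_items(items):
    """Merge a stream key's JSON items in chain order: later items
    override earlier fields, so the result is the current state
    while every historical item remains on the chain for auditing."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged

# Hypothetical history of one key: an initial record, then an update
history = [
    {"name": "Alice", "status": "pending"},
    {"status": "approved", "approved_by": "bob"},
]
print(merge_items(history))
```

Reading the merged view gives the latest record, while replaying the unmerged items reconstructs exactly how it got there.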
In addition, MultiChain 2.0 provides several other smaller new features:
Blockchain upgrading. Many blockchain parameters can be changed over time, subject to administrator consensus. These include the block time interval, maximum block size, and many transaction size limits.
Per-asset permissions. Assets can optionally be issued with their own send and receive permissions, which can be controlled for each address by that asset’s issuer and/or its assigned administrators.
Binary cache. Large pieces of binary data (up to 1 GB) can be added to MultiChain over multiple API calls, or uploaded directly via the file system.
Inline metadata. Transaction outputs containing assets and/or native currency can now contain metadata in JSON, text or binary format. Smart Filters can easily read and respond to this metadata.
Custom permissions. Six new permissions (three “high” and three “low”) can be assigned to addresses by two levels of administrator. These are useful for defining roles enforced by Smart Filters.
We’re also delighted to welcome over 40 new companies to the MultiChain partner program, bringing the total number to 86. New members include SAP who have built a deep integration with MultiChain in the SAP Cloud Platform.
MultiChain 2.0 beta 1 can be downloaded here. It is backwards compatible with version 1.0 with a few exceptions – see the API compatibility note. MultiChain 1.0 nodes and networks can be upgraded to version 2.0 in the usual way (be sure to back up first). We’ll also continue to maintain and fix any bugs in MultiChain 1.0 through 2019 at least.
Below is the full official press release about the 2.0 beta release.
MultiChain Releases Beta Version 2.0 with Over Forty New Partners
December 19, 2018 – Coin Sciences Ltd is delighted to announce the first beta release of MultiChain 2.0, along with the addition of 43 new members of the MultiChain Partner Program, bringing the total number to 86.
MultiChain 2.0 beta 1 has been released after sixteen months of intensive development including seven alpha versions, and is available for Linux and Windows at: https://www.multichain.com/download-install/. Enhancements over MultiChain 1.0 include richer data publishing with support for JSON and Unicode text, blockchain parameter upgrading, seamless integration of off-chain data storage and delivery, and Smart Filters, MultiChain’s approach to the smart contract paradigm.
The new members of the MultiChain Partner Program include SAP, who have integrated MultiChain into the SAP Cloud Platform and are deploying it for client projects. HCL Technologies, the multinational consulting company, also recently joined, along with 41 other blockchain and software companies. Members of the partner program have access to the MultiChain engineering team, can use MultiChain branding in their marketing materials, and are promoted on the MultiChain website. A full list of MultiChain’s partners can be found at: https://www.multichain.com/platform-partners/
“At SAP we are extending business solutions with MultiChain blockchain functionality via our SAP Cloud Platform offering,” said Torsten Zube, SAP’s Head of Blockchain. “We strategically decided that MultiChain should be part of our offering due to its proven, easy and mature distributed ledger technology addressing enterprise needs. The upcoming MultiChain 2.0 release will provide more functionality, such as Smart Filters and off-chain data, that we see as particularly relevant for enterprise scenarios going forward.”
“Version 2.0 represents a huge upgrade for MultiChain, integrating several major features commonly requested by our developer community,” said Dr Gideon Greenspan, CEO and Founder of Coin Sciences Ltd. “With version 1.0 in stable production since August 2017, our goal with MultiChain 2.0 remains the same: to provide a powerful, stable and easy-to-use platform for blockchain application developers. We look forward to continued cooperation with our partners to bring MultiChain-driven applications to enterprises, governments and beyond.”
There’s more than one way to put code on a blockchain
In most discussions about blockchains, it doesn’t take long for the notion of “smart contracts” to come up. In the popular imagination, smart contracts automate the execution of interparty interactions, without requiring a trusted intermediary. By expressing legal relationships in code rather than words, they promise to enable transactions to take place directly and without error, whether deliberate or not.
From a technical viewpoint, a smart contract is something more specific: computer code that lives on a blockchain and defines the rules for that chain’s transactions. This description sounds simple enough, but behind it lies a great deal of variation in how these rules are expressed, executed and validated. When choosing a blockchain platform for a new application, the question “Does this platform support smart contracts?” isn’t the right one to ask. Instead, we need to be asking: “What type of smart contracts does this platform support?”
In this article, my goal is to examine some of the major differences between smart contract approaches and the trade-offs they represent. I’ll do this by looking at four popular enterprise blockchain platforms which support some form of customized on-chain code. First, IBM’s Hyperledger Fabric, which calls its contracts “chaincode”. Second, our MultiChain platform, which introduces smart filters in version 2.0. Third, Ethereum (and its permissioned Quorum and Burrow spin-offs), which popularized the “smart contract” name. And finally, R3 Corda, which references “contracts” in its transactions. Despite all of the different terminology, ultimately all of these refer to the same thing – application-specific code that defines the rules of a chain.
Before going any further, I should warn the reader that much of the following content is technical in nature, and assumes some familiarity with general programming and database concepts. For good or bad, this cannot be avoided – without getting into the details it’s impossible to make an informed decision about whether to use a blockchain for a particular project, and (if so) the right type of blockchain to use.
Let’s begin with some context. Imagine an application that is shared by multiple organizations, which is based on an underlying database. In a traditional centralized architecture, this database is hosted and administered by a single party which all of the participants trust, even if they do not trust each other. Transactions which modify the database are initiated only by applications on this central party’s systems, often in response to messages received from the participants. The database simply does what it’s told because the application is implicitly trusted to only send it transactions that make sense.
Blockchains provide an alternative way of managing a shared database, without a trusted intermediary. In a blockchain, each participant runs a “node” that holds a copy of the database and independently processes the transactions which modify it. Participants are identified using public keys or “addresses”, each of which has a corresponding private key known only to the identity owner. While transactions can be created by any node, they are “digitally signed” by their initiator’s private key in order to prove their origin.
Nodes connect to each other in a peer-to-peer fashion, rapidly propagating transactions and the “blocks” in which they are timestamped and confirmed across the network. The blockchain itself is literally a chain of these blocks, which forms an ordered log of every historical transaction. A “consensus algorithm” is used to ensure that all nodes reach agreement on the content of the blockchain, without requiring centralized control. (Note that some of this description does not apply to Corda, in which each node has only a partial copy of the database and there is no global blockchain. We’ll talk more about that later on.)
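The way hash links make the block log tamper-evident can be shown with a toy model. This is a deliberately minimal sketch, not MultiChain’s actual block format: each block simply records the hash of its predecessor, so changing any historical block breaks every link after it.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's canonical JSON form."""
    canonical = json.dumps(block, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def append_block(chain, transactions):
    """Append a block that commits to the previous block's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": transactions})
    return chain

def chain_valid(chain):
    """Verify every block's link to its predecessor."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, ["tx1"])
append_block(chain, ["tx2", "tx3"])
assert chain_valid(chain)

chain[0]["txs"] = ["tampered"]   # rewrite history...
assert not chain_valid(chain)    # ...and the links no longer verify
```

In a real network, of course, the consensus algorithm and digital signatures do the heavy lifting; the hash chain is what makes any disagreement about history immediately detectable.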
In principle, any shared database application can be architected by using a blockchain at its core. But doing so creates a number of technical challenges which do not exist in a centralized scenario:
Transaction rules. If any participant can directly change the database, how do we ensure that they follow the application’s rules? What stops one user from corrupting the database’s contents in a self-serving way?
Determinism. Once these rules are defined, they will be applied multiple times by multiple nodes when processing transactions for their own copy of the database. How do we ensure that every node obtains exactly the same result?
Conflict prevention. With no central coordination, how do we deal with two transactions that each follow the application’s rules, but nonetheless conflict with each other? Conflicts can stem from a deliberate attempt to game the system, or be the innocent result of bad luck and timing.
So where do smart contracts, smart filters and chaincode come in? Their core purpose is to work with a blockchain’s underlying infrastructure in order to solve these challenges. Smart contracts are the decentralized equivalent of application code – instead of running in one central place, they run on multiple nodes in the blockchain, creating or validating the transactions which modify that database’s contents.
Let’s begin with transaction rules, the first of these challenges, and see how they are expressed in Fabric, MultiChain, Ethereum and Corda respectively.
Transaction rules perform a specific function in blockchain-powered databases – restricting the transformations that can be performed on that database’s state. This is necessary because a blockchain’s transactions can be initiated by any of its participants, and these participants do not trust each other sufficiently to allow them to modify the database at will.
Let’s see two examples of why transaction rules are needed. First, imagine a blockchain designed to aggregate and timestamp PDF documents that are published by its participants. In this case, nobody should have the right to remove or change documents, since doing so would undermine the entire purpose of the system – document persistence. Second, consider a blockchain representing a shared financial ledger, which keeps track of the balances of its users. We cannot allow a participant to arbitrarily inflate their own balance, or take others’ money away.
Inputs and outputs
Our blockchain platforms rely on two broad approaches for expressing transaction rules. The first, which I call the “input–output model”, is used in MultiChain and Corda. Here, transactions explicitly list the database rows or “states” which they delete and create, forming a set of “inputs” and “outputs” respectively. Modifying a row is expressed as the equivalent operation of deleting that row and creating a new one in its place.
Since database rows are only deleted in inputs and only created in outputs, every input must “spend” a previous transaction’s output. The current state of the database is defined as the set of “unspent transaction outputs” or “UTXOs”, i.e. outputs from previous transactions which have not yet been used. Transactions may also contain additional information, called “metadata”, “commands” or “attachments”, which don’t become part of the database but help to define their meaning or purpose.
Given these three sets of inputs, outputs and metadata, the validity of a transaction in MultiChain or Corda is defined by some code which can perform arbitrary computations on those sets. This code can validate the transaction, or else return an error with a corresponding explanation. You can think of the input–output model as an automated “inspector” holding a checklist which ensures that transactions follow each and every rule. If the transaction fails any one of those checks, it will automatically be rejected by all of the nodes in the network.
It should be noted that, despite sharing the input–output model, MultiChain and Corda implement it very differently. In MultiChain, outputs can contain assets and/or data in JSON, text or binary format. The rules are defined in “transaction filters” or “stream filters”, which can be set to check all transactions, or only those involving particular assets or groupings of data. By contrast, a Corda output “state” is represented by an object in the Java or Kotlin programming language, with defined data fields. Corda’s rules are defined in “contracts” which are attached to specific states, and a state’s contract is only applied to transactions which contain that state in their inputs or outputs. This relates to Corda’s unusual visibility model, in which transactions can only be seen by their counterparties or those whose subsequent transactions they affect.
Contracts and messages
The second approach, which I call the “contract–message model”, is used in Hyperledger Fabric and Ethereum. Here, multiple “smart contracts” or “chaincodes” can be created on the blockchain, and each has its own database and associated code. A contract’s database can only be modified by its code, rather than directly by blockchain transactions. This design pattern is similar to the “encapsulation” of code and data in object-oriented programming.
With this model, a blockchain transaction begins as a message sent to a contract, with some optional parameters or data. The contract’s code is executed in reaction to the message and parameters, and is free to read and write its own database as part of that reaction. Contracts can also send messages to other contracts, but cannot access each other’s databases directly. In the language of relational databases, contracts act as enforced “stored procedures”, where all access to the database goes via some predefined code.
Both Fabric and Quorum, a variation on Ethereum, complicate this picture by allowing a network to define multiple “channels” or “private states”. The aim is to mitigate the problem of blockchain confidentiality by creating separate environments, each of which is only visible to a particular sub-group of participants. While this sounds promising in theory, in reality the contracts and data in each channel or private state are isolated from those in the others. As a result, in terms of smart contracts, these environments are equivalent to separate blockchains.
Let’s see how to implement the transaction rules for a single-asset financial ledger with these two models. Each row in our ledger’s database has two columns, containing the owner’s address and the quantity of the asset owned. In the input–output model, transactions must satisfy two conditions:
The total quantity of assets in a transaction’s outputs has to match the total in its inputs. This prevents users from creating or deleting money arbitrarily.
Every transaction has to be signed by the owner of each of its inputs. This stops users from spending each other’s money without permission.
Taken together, these two conditions are all that is needed to create a simple but viable financial system.
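To make this concrete, here’s a minimal sketch of such an “inspector” in Python. Real platforms verify actual digital signatures against the transaction’s bytes; here we simply pass in the set of addresses which signed:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Output:
    owner: str      # owner's address
    quantity: int   # asset quantity, in indivisible units

def validate_transaction(inputs, outputs, signers):
    """Apply the two ledger rules to a transaction's inputs and outputs.

    `inputs` and `outputs` are lists of Output rows; `signers` is the set
    of addresses which signed the transaction (standing in for real
    signature verification).
    """
    # Rule 1: total quantity out must match total quantity in.
    if sum(o.quantity for o in outputs) != sum(i.quantity for i in inputs):
        return False, "assets created or destroyed"
    # Rule 2: every spent input must be authorized by its owner.
    for i in inputs:
        if i.owner not in signers:
            return False, "missing signature from " + i.owner
    return True, "ok"
```

A transaction in which alice pays bob 30 units out of her 50-unit output passes both checks; one which inflates the total, or spends an output without its owner’s signature, is automatically rejected by every node.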
In the contract–message model, the asset’s contract supports a “send payment” message, which takes three parameters: the sender’s address, recipient’s address, and quantity to be sent. In response, the contract executes the following four steps:
Verify that the transaction was signed by the sender.
Check that the sender has sufficient funds.
Deduct the requested quantity from the sender’s row.
Add that quantity to the recipient’s row.
If either of the checks in the first two steps fails, the contract will abort and no payment will be made.
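Sketched in Python, the same ledger in the contract–message model becomes a class whose database (the balances) can only be touched by its own code – again with the set of signers standing in for real signature verification:

```python
class AssetContract:
    """Minimal contract-message sketch of a single-asset ledger."""

    def __init__(self):
        self.balances = {}  # the contract's own database: address -> quantity

    def send_payment(self, sender, recipient, quantity, signers):
        # Step 1: verify that the transaction was signed by the sender.
        if sender not in signers:
            raise PermissionError("transaction not signed by sender")
        # Step 2: check that the sender has sufficient funds.
        if self.balances.get(sender, 0) < quantity:
            raise ValueError("insufficient funds")
        # Step 3: deduct the requested quantity from the sender's row.
        self.balances[sender] -= quantity
        # Step 4: add that quantity to the recipient's row.
        self.balances[recipient] = self.balances.get(recipient, 0) + quantity
```

If either of the first two steps raises an error, the balances are left untouched, just as the contract aborts on-chain.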
So both the input–output and contract–message models are effective ways to define transaction rules and keep a shared database safe. Indeed, on a theoretical level, each of these models can be used to simulate the other. In practice however, the most appropriate model will depend on the application being built. Does each transaction affect few or many pieces of information? Do we need to be able to guarantee transaction independence? Does each piece of data have a clear owner or is there some global state to be shared?
It is beyond our scope here to explore how the answers should influence a choice between these two models. But as a general guideline, when developing a new blockchain application, it’s worth trying to express its transaction rules in both forms, and seeing which fits more naturally. The difference will express itself in terms of: (a) ease of programming, (b) storage requirements and throughput, and (c) speed of conflict detection. We’ll talk more about this last issue later on.
When it comes to transaction rules, there is one way in which MultiChain specifically differs from Fabric, Ethereum and Corda. Unlike these other platforms, MultiChain has several built-in abstractions that provide some basic building blocks for blockchain-driven applications, without requiring developers to write their own code. These abstractions cover three areas that are commonly needed: (a) dynamic permissions, (b) transferrable assets, and (c) data storage.
For example, MultiChain manages permissions for connecting to the network, sending and receiving transactions, creating assets or streams, or controlling the permissions of other users. Multiple fungible assets can be issued, transferred, retired or exchanged safely and atomically. Any number of “streams” can be created on a chain, for publishing, indexing and retrieving on-chain or off-chain data in JSON, text or binary formats. All of the transaction rules for these abstractions are available out-of-the-box.
When developing an application in MultiChain, it’s possible to ignore this built-in functionality, and express transaction rules using smart filters only. However, smart filters are designed to work together with these built-in abstractions, by enabling their default behavior to be restricted in customized ways. For example, the permission for certain activities might be controlled by specific administrators, rather than the default behavior where any administrator will do. The transfer of certain assets can be limited by time or require additional approval above a certain amount. The data in a particular stream can be validated to ensure that it consists only of JSON structures with required fields and values.
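To illustrate the last of these examples, here’s the kind of check a stream filter performs, sketched in Python with hypothetical field names. (Actual MultiChain smart filters are written in JavaScript and run inside the node; as there, the filter accepts an item by returning nothing, or rejects it with an explanation.)

```python
import json

REQUIRED_FIELDS = {"patient_id", "timestamp", "dosage"}  # hypothetical schema

def stream_filter(item_data: bytes):
    """Reject stream items that are not JSON objects with required fields.

    Returns None to accept the item, or an error string to reject it,
    mirroring the accept/reject contract of a MultiChain stream filter.
    """
    try:
        obj = json.loads(item_data)
    except ValueError:
        return "item is not valid JSON"
    if not isinstance(obj, dict):
        return "item must be a JSON object"
    missing = REQUIRED_FIELDS - obj.keys()
    if missing:
        return "missing fields: " + ", ".join(sorted(missing))
    return None  # accept
```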
In all of these cases, smart filters create additional requirements for transactions to be validated, but do not remove the simple rules that are built in. This can help address one of the key challenges in blockchain applications: the fact that a bug in some on-chain code can lead to disastrous consequences. We’ve seen endless examples of this problem in the public Ethereum blockchain, most famously in the Demise of The DAO and the Parity multisignature bugs. Broader surveys have found a large number of common vulnerabilities in Ethereum smart contracts that enable attackers to steal or freeze other people’s funds.
Of course, MultiChain smart filters may contain bugs too, but their consequences are more limited in scope. For example, the built-in asset rules prevent one user from spending another’s money, or accidentally making their own money disappear, no matter what other logic a smart filter contains. If a bug is found in a smart filter, it can be deactivated and replaced with a corrected version, while the ledger’s basic integrity is protected. Philosophically, MultiChain is closer to traditional database architectures, where the database platform provides a number of built-in abstractions, such as columns, tables, indexes and constraints. More powerful features such as triggers and stored procedures can optionally be coded up by application developers, in cases where they are actually needed.
Determinism
Let’s move on to the next part of our showdown. No matter which approach we choose, the custom transaction rules of a blockchain application are expressed as computer code written by application developers. And unlike centralized applications, this code is going to be executed more than one time and in more than one place for each transaction. This is because multiple blockchain nodes belonging to different participants have to each verify and/or execute that transaction for themselves.
This repeated and redundant code execution introduces a new requirement that is rarely found in centralized applications: determinism. In the context of computation, determinism means that a piece of code will always give the same answer for the same parameters, no matter where and when it is run. This is absolutely crucial for code that interacts with a blockchain because, without determinism, the consensus between the nodes on that chain can catastrophically break down.
Let’s see how this looks in practice, first in the input–output model. If two nodes have a different opinion about whether a transaction is valid, then one will accept a block containing that transaction and the other will not. Since every block explicitly links back to a previous block, this will create a permanent “fork” in the network, with one or more nodes not accepting the majority opinion about the entire blockchain’s contents from that point on. The nodes in the minority will be cut off from the database’s evolving state, and will no longer be able to effectively use the application.
Now let’s see what happens if consensus breaks down in the contract–message model. If two nodes have a different opinion about how a contract should respond to a particular message, this can lead to a difference in their databases’ contents. This in turn can affect the contract’s response to future messages, including messages it sends to other contracts. The end result is an increasing divergence between different nodes’ view of the database’s state. (The “state root” field in Ethereum blocks ensures that any difference in contracts’ responses leads immediately to a fully catastrophic blockchain fork, rather than risking staying hidden for a period of time.)
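To see why the state root makes divergence immediately visible, here’s a simplified Python stand-in. Ethereum actually uses the root of a Merkle Patricia trie, but the principle is the same: hash each node’s database in a canonical form, so that two nodes can compare one short value instead of their entire databases:

```python
import hashlib
import json

def state_root(db: dict) -> str:
    """Simplified stand-in for Ethereum's state root: hash the database
    in a canonical (sorted-key) serialization, so that equal states
    always produce equal roots, and any divergence changes the root."""
    canonical = json.dumps(db, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

Two nodes holding the same balances compute the same root regardless of internal ordering, while even a one-unit disagreement produces a different root and therefore an immediate, detectable fork.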
Sources of non-determinism
So non-determinism in blockchain code is clearly a problem. But if the basic building blocks of computation, such as arithmetic, are deterministic, what do we have to worry about? Well, it turns out, quite a few things:
Most obviously, random number generators, since by definition these are designed to produce a different result every time.
Checking the current time, since nodes won’t be processing transactions at exactly the same time, and in any event their clocks may be out of sync. (It’s still possible to implement time-dependent rules by making reference to timestamps within the blockchain itself.)
Querying external resources such as the Internet, disk files, or other programs running on a computer. These resources cannot be guaranteed to always give the same response, and may become unavailable.
Running multiple pieces of code in parallel “threads”, since this leads to a “race condition” where the order in which these processes finish cannot be predicted.
Performing any floating point calculations which can give even minutely different answers on different computer processor architectures.
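The last pitfall is easy to demonstrate in Python – not the cross-architecture divergence itself, which needs two different processors, but the representation error that makes floating point risky for consensus, and the integer arithmetic used instead:

```python
# Binary floating point cannot represent 0.1 exactly, so results carry
# tiny errors that consensus-critical code cannot afford:
assert 0.1 + 0.2 != 0.3   # the sum is actually 0.30000000000000004

# The standard remedy: represent amounts as integers of the smallest
# unit (e.g. cents), which behave identically on every architecture.
def add_cents(a_cents: int, b_cents: int) -> int:
    return a_cents + b_cents

assert add_cents(10, 20) == 30
```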
Our four blockchain platforms employ several different approaches to avoiding these pitfalls.
Determinism by endorsement
When it comes to determinism, Hyperledger Fabric adopts a completely different approach. In Fabric, when a “client” node wants to send a message to some chaincode, it first sends that message to some “endorser” nodes. Each of these nodes executes the chaincode independently, forming an opinion of the message’s effect on that chaincode’s database. These opinions are sent back to the client together with a digital signature which constitutes a formal “endorsement”. If the client receives enough endorsements of the intended outcome, it creates a transaction containing those endorsements, and broadcasts it for inclusion in the chain.
In order to guarantee determinism, each piece of chaincode has an “endorsement policy” which defines exactly whose endorsements are required, and how many of them, for a transaction to be considered valid. Since each transaction embeds the database changes computed by the endorsers, rather than being re-executed by every node, non-deterministic chaincode cannot split the network – at worst, its endorsers will disagree with each other, and the transaction will simply fail to gather the endorsements it needs.
By now it’s clear that many blockchain use cases have nothing to do with financial transactions. Instead, the chain’s purpose is to enable the decentralized aggregation, ordering, timestamping and archiving of any type of information, including structured data, correspondence or documentation. The blockchain’s core value is enabling its participants to provably and permanently agree on exactly what data was entered, when and by whom, without relying on a trusted intermediary. For example, SAP’s recently launched blockchain platform, which supports MultiChain and Hyperledger Fabric, targets a broad range of supply chain and other non-financial applications.
The simplest way to use a blockchain for recording data is to embed each piece of data directly inside a transaction. Every blockchain transaction is digitally signed by one or more parties, replicated to every node, ordered and timestamped by the chain’s consensus algorithm, and stored permanently in a tamper-proof way. Any data within the transaction will therefore be stored identically but independently by every node, along with a proof of who wrote it and when. The chain’s users are able to retrieve this information at any future time.
For example, MultiChain 1.0 allowed one or more named “streams” to be created on a blockchain and then used for storing and retrieving raw data. Each stream has its own set of write permissions, and each node can freely choose which streams to subscribe to. If a node is subscribed to a stream, it indexes that stream’s content in real-time, allowing items to be retrieved quickly based on their ordering, timestamp, block number or publisher address, as well as via a “key” (or label) by which items can be tagged. MultiChain 2.0 (since alpha 1) extended streams to support Unicode text or JSON data, as well as multiple keys per item and multiple items per transaction. It also added summarization functions such as “JSON merge” which combine items with the same key or publisher in a useful way.
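For example, publishing a JSON item with two keys is a single call to MultiChain’s publish API. The Python sketch below only builds the JSON-RPC request – actually POSTing it to the node’s RPC port, with the credentials from multichain.conf, is omitted to keep it self-contained, and the stream and item contents are made up:

```python
import json

def publish_payload(stream, keys, data):
    """Build a JSON-RPC request for MultiChain's `publish` API: an item
    with one or more keys and JSON data (MultiChain 2.0 onwards)."""
    return {
        "method": "publish",
        "params": [stream, keys, {"json": data}],
        "id": 1,
    }

request = publish_payload("invoices", ["invoice-17", "acme-corp"],
                          {"amount": 4999, "currency": "EUR"})
print(json.dumps(request))
```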
Confidentiality and scalability
While storing data directly on a blockchain works well, it suffers from two key shortcomings – confidentiality and scalability. To begin with confidentiality, the content of every stream item is visible to every node on the chain, and this is not necessarily a desirable outcome. In many cases a piece of data should only be visible to a certain subset of nodes, even if other nodes are needed to help with its ordering, timestamping and notarization.
Confidentiality is a relatively easy problem to solve, by encrypting information before it is embedded in a transaction. The decryption key for each piece of data is only shared with those participants who are meant to see it. Key delivery can be performed on-chain using asymmetric cryptography (as described here) or via some off-chain mechanism, whichever is preferred. Any node lacking the key to decrypt an item will see nothing more than binary gibberish.
Scalability, on the other hand, is a more significant challenge. Let’s say that any decent blockchain platform should support a network throughput of 500 transactions per second. If the purpose of the chain is information storage, then the size of each transaction will depend primarily on how much data it contains. Each transaction will also need (at least) 100 bytes of overhead to store the sender’s address, digital signature and a few other bits and pieces.
If we take an easy case, where each item is a small JSON structure of 100 bytes, the overall data throughput would be 100 kilobytes per second, calculated from 500 × (100+100). This translates to under 1 megabit/second of bandwidth, which is comfortably within the capacity of any modern Internet connection. Data would accumulate at a rate of around 3 terabytes per year, which is no small amount. But with 12 terabyte hard drives now widely available, and RAID controllers which combine multiple physical drives into a single logical one, we could easily store 10-20 years of data on every node without too much hassle or expense.
However, things look very different if we’re storing larger pieces of information, such as scanned documentation. A reasonable quality JPEG scan of an A4 sheet of paper might be 500 kilobytes in size. Multiply this by 500 transactions per second, and we’re looking at a throughput of 250 megabytes per second. This translates to 2 gigabits/second of bandwidth, which is faster than most local networks, let alone connections to the Internet. At Amazon Web Services’ cheapest published price of $0.05 per gigabyte, it means an annual bandwidth bill of $400,000 per node. And where will each node store the 8000 terabytes of new data generated annually?
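The arithmetic behind both scenarios is easy to check (decimal units throughout):

```python
# Small items: 500 tx/s, each a 100-byte item plus ~100 bytes of overhead.
small = 500 * (100 + 100)                 # bytes per second
assert small == 100_000                   # 100 KB/s
assert small * 8 < 1_000_000              # under 1 megabit/second
assert 3.0 < small * 86400 * 365 / 1e12 < 3.3        # ~3 TB per year

# Large items: 500 tx/s of 500 KB document scans.
big = 500 * 500_000                       # bytes per second
assert big == 250_000_000                 # 250 MB/s, i.e. 2 gigabits/second
assert 7800 < big * 86400 * 365 / 1e12 < 8000        # ~8000 TB per year
assert 390_000 < big * 86400 * 365 / 1e9 * 0.05 < 400_000  # ~$400,000/year
```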
It’s clear that, for blockchain applications storing many large pieces of data, straightforward on-chain storage is not a practical choice. To add insult to injury, if data is encrypted to solve the problem of confidentiality, nodes are being asked to store a huge amount of information that they cannot even read. This is not an attractive proposition for the network’s participants.
The hashing solution
So how do we solve the problem of data scalability? How can we take advantage of the blockchain’s decentralized notarization of data, without replicating that data to every node on the chain?
The answer is with a clever piece of technology called a “hash”. A hash is a long number (think 256 bits, or around 80 decimal digits) which uniquely identifies a piece of data. The hash is calculated from the data using a one-way function which has an important cryptographic property: Given any piece of data, it is easy and fast to calculate its hash. But given a particular hash, it is computationally infeasible to find a piece of data that would generate that hash. And when we say “computationally infeasible”, we mean more calculations than there are atoms in the known universe.
Hashes play a crucial role in all blockchains, by uniquely identifying transactions and blocks. They also underlie the computational challenge in proof-of-work systems like bitcoin. Many different hash functions have been developed, with gobbledygook names like BLAKE2, MD5 and RIPEMD160. But in order for any hash function to be trusted, it must endure extensive academic review and testing. These tests come in the form of attempted attacks, such as “preimage” (finding an input with the given hash), “second preimage” (finding a second input with the same hash as the given input) and “collision” (finding any two different inputs with the same hash). Surviving this gauntlet is far from easy, with a long and tragic history of broken hash functions proving the famous maxim: “Don’t roll your own crypto.”
To go back to our original problem, we can solve data scalability in blockchains by embedding the hashes of large pieces of data within transactions, instead of the data itself. Each hash acts as a “commitment” to its input data, with the data itself being stored outside of the blockchain or “off-chain”. For example, using the popular SHA256 hash function, a 500 kilobyte JPEG image can be represented by a 32-byte number, a reduction of over 15,000×. Even at a rate of 500 images per second, this puts us comfortably back in the territory of feasible bandwidth and storage requirements, in terms of the data stored on the chain itself.
Of course, any blockchain participant that needs an off-chain image cannot reproduce it from its hash. But if the image can be retrieved in some other way, then the on-chain hash serves to confirm who created it and when. Just like regular on-chain data, the hash is embedded inside a digitally signed transaction, which was included in the chain by consensus. If an image file falls out of the sky, and the hash for that image matches a hash in the blockchain, then the origin and timestamp of that image is confirmed. So the blockchain is providing exactly the same value in terms of notarization as if the image was embedded in the chain directly.
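In Python, the whole commit-and-verify scheme takes a few lines with the standard library’s SHA256:

```python
import hashlib

def commit(data: bytes) -> str:
    """The 32-byte hash embedded on-chain in place of the data itself."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, onchain_hash: str) -> bool:
    """Check off-chain data, however it arrived, against the on-chain hash."""
    return commit(data) == onchain_hash

scan = b"...imagine 500 KB of JPEG bytes here..."
h = commit(scan)                   # goes into a signed, timestamped transaction
assert verify(scan, h)             # the genuine file checks out
assert not verify(scan + b"x", h)  # any tampering is detected
```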
A question of delivery
So far, so good. By embedding hashes in a blockchain instead of the original data, we have an easy solution to the problem of scalability. Nonetheless, one crucial question remains:
How do we deliver the original off-chain content to those nodes which need it, if not through the chain itself?
This question has several possible answers, and we know of MultiChain users applying them all. One basic approach is to set up a centralized repository at some trusted party, where all off-chain data is uploaded and subsequently retrieved. This system could naturally use “content addressing”, meaning that the hash of each piece of data serves directly as its identifier for retrieval. However, while this setup might work for a proof-of-concept, it doesn’t make sense for production, because the whole point of a blockchain is to remove trusted intermediaries. Even if on-chain hashes prevent the intermediary from falsifying data, it could still delete data or fail to deliver it to some participants, due to a technical failure or the actions of a rogue employee.
A more promising possibility is point-to-point communication, in which the node that requires some off-chain data requests it directly from the node that published it. This avoids relying on a trusted intermediary, but suffers from three alternative shortcomings:
It requires a map of blockchain addresses to IP addresses, to enable the consumer of some data to communicate directly with its publisher. Blockchains can generally avoid this type of static network configuration, which can be a problem in terms of failover and privacy.
If the original publisher node has left the network, or is temporarily out of service, then the data cannot be retrieved by anyone else.
If a large number of nodes are interested in some data, then the publisher will be overwhelmed by requests. This can create severe network congestion, slow the publisher’s system down, and lead to long delays for those trying to retrieve that data.
In order to avoid these problems, we’d ideally use some kind of decentralized delivery mechanism. Nodes should be able to retrieve the data they need without relying on any individual system – be it a centralized repository or the data’s original publisher. If multiple parties have a piece of data, they should share the burden of delivering it to anyone else who wants it. Nobody needs to trust an individual data source, because on-chain hashes can prove that data hasn’t been tampered with. If a malicious node delivers me the wrong data for a hash, I can simply discard that data and try asking someone else.
For those who have experience with peer-to-peer file sharing protocols such as Napster, Gnutella or BitTorrent, this will all sound very familiar. Indeed, many of the basic principles are the same, but there are two key differences. First, assuming we’re using our blockchain in an enterprise context, the system runs within a closed group of participants, rather than the Internet as a whole. Second, the blockchain adds a decentralized ordering, timestamping and notarization backbone, enabling all users to maintain a provably consistent and tamper-resistant view of exactly what happened, when and by whom.
How might a blockchain application developer achieve this decentralized delivery of off-chain content? One common choice is to take an existing peer-to-peer file sharing platform, such as the amusingly-named InterPlanetary File System (IPFS), and use it together with the blockchain. Each participant runs both a blockchain node and an IPFS node, with some middleware coordinating between the two. When publishing off-chain data, this middleware stores the original data in IPFS, then creates a blockchain transaction containing that data’s hash. To retrieve some off-chain data, the middleware extracts the hash from the blockchain, then uses this hash to fetch the content from IPFS. The local IPFS node automatically verifies the retrieved content against the hash to ensure it hasn’t been changed.
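The coordination logic of that middleware can be sketched in Python, with the IPFS and blockchain node calls injected as plain functions. The four function signatures are our own simplification for illustration, not the real APIs:

```python
import hashlib

def make_middleware(ipfs_add, ipfs_get, chain_publish, chain_read):
    """Sketch of blockchain+IPFS middleware with its dependencies injected.

    `ipfs_add(data) -> content_id`, `ipfs_get(content_id) -> data`,
    `chain_publish(tx) -> tx_id` and `chain_read(tx_id) -> tx` stand in
    for the real IPFS and blockchain node APIs.
    """
    def publish(data: bytes) -> str:
        cid = ipfs_add(data)                  # store the content off-chain
        h = hashlib.sha256(data).hexdigest()
        return chain_publish({"cid": cid, "sha256": h})  # notarize on-chain

    def retrieve(tx_id: str) -> bytes:
        tx = chain_read(tx_id)
        data = ipfs_get(tx["cid"])            # fetch the content off-chain
        if hashlib.sha256(data).hexdigest() != tx["sha256"]:
            raise ValueError("content does not match on-chain hash")
        return data

    return publish, retrieve
```

Even in this toy form, the middleware must track two systems and verify every retrieval itself – which is exactly the coordination burden described above.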
While this solution is possible, it’s all rather clumsy and inconvenient. First, every participant has to install, maintain and update three separate pieces of software (blockchain node, IPFS node and middleware), each of which stores its data in a separate place. Second, there will be two separate peer-to-peer networks, each with its own configuration, network ports, identity system and permissioning (although it should be noted that IPFS doesn’t yet support closed networks). Finally, tightly coupling IPFS and the blockchain together would make the middleware increasingly complex. For example, if we want the off-chain data referenced by some blockchain transactions to be instantly retrieved (with automatic retries), the middleware would need to be constantly up and running, maintaining its own complex state. Wouldn’t it be nice if the blockchain node did all of this for us?
Off-chain data in MultiChain 2.0
Today we’re delighted to release the third preview version (alpha 3) of MultiChain 2.0, with a fully integrated and seamless solution for off-chain data. Every piece of information published to a stream can be on-chain or off-chain as desired, and MultiChain takes care of everything else.
No really, we mean everything. As a developer building on MultiChain, you won’t have to worry about hashes, local storage, content discovery, decentralized delivery or data verification. Here’s what happens behind the scenes:
The publishing MultiChain node writes the new data in its local storage, slicing large items into chunks for easy digestion and delivery.
The transaction for publishing off-chain stream items is automatically built, containing the chunk hash(es) and size(s) in bytes.
This transaction is signed and broadcast to the network, propagating between nodes and entering the blockchain in the usual way.
When a node subscribed to a stream sees a reference to some off-chain data, it adds the chunk hashes for that data to its retrieval queue. (When subscribing to an old stream, a node also queues any previously published off-chain items for retrieval.)
As a background process, if there are chunks in a node’s retrieval queue, queries are sent out to the network to locate those chunks, as identified by their hashes.
These chunk queries are propagated to other nodes in the network in a peer-to-peer fashion (limited to two hops for now – see technical details below).
Any node which has the data for a chunk can respond, and this response is relayed to the subscriber back along the same path as the query.
If no node answers the chunk query, the chunk is returned to the queue to be retried later.
Otherwise, the subscriber chooses the most promising source for a chunk (based on hops and response time), and sends it a request for that chunk’s data, again along the same peer-to-peer path as the previous response.
The source node delivers the data requested, using the same path again.
The subscriber verifies the data’s size and hash against the original request.
If everything checks out, the subscriber writes the data to its local storage, making it immediately available for retrieval via the stream APIs.
If the requested content did not arrive, or didn’t match the desired hash or size, the chunk is returned to the queue for future retrieval from a different source.
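A heavily simplified Python simulation of one pass over the retrieval queue captures the core of these steps – with peers modeled as in-memory chunk stores, and without the hop limiting and source selection described above:

```python
import hashlib
from collections import deque

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def retrieve_chunks(queue: deque, peers, local_storage: dict):
    """One pass over a subscriber's retrieval queue.

    `queue` holds chunk hashes; `peers` is a list of dicts mapping chunk
    hash -> data (the other nodes' chunk stores). Verified chunks land in
    `local_storage`; unanswered or corrupt chunks go back on the queue.
    """
    for _ in range(len(queue)):
        chunk_hash = queue.popleft()
        # Query the network for any source holding this chunk.
        sources = [p for p in peers if chunk_hash in p]
        if not sources:
            queue.append(chunk_hash)      # nobody answered: retry later
            continue
        data = sources[0][chunk_hash]     # request from the first responder
        if sha256(data) != chunk_hash:    # verify against the on-chain hash
            queue.append(chunk_hash)      # bad data: retry another source
            continue
        local_storage[chunk_hash] = data  # now available via the stream APIs
```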
Most importantly, all of this happens extremely quickly. In networks with low latency, small pieces of off-chain data will arrive at subscribers within a split second of the transaction that references them. And for high load applications, our testing shows that MultiChain 2.0 alpha 3 can sustain a rate of over 1000 off-chain items or 25 MB of off-chain data retrieved per second, on a mid-range server (Core i7) with a decent Internet connection. Everything works fine with off-chain items up to 1 GB in size, far beyond the 64 MB limit for on-chain data. Of course, we hope to improve these numbers further as we spend time optimizing MultiChain 2.0 during its beta phase.
When using off-chain rather than on-chain data in streams, MultiChain application developers have to do exactly two things:
When publishing data, pass an “offchain” flag to the appropriate APIs.
When using the stream querying APIs, consider the possibility that some off-chain data might not yet be available, as reported by the “available” flag. While this situation will be rare under normal circumstances, it’s important for application developers to handle it appropriately.
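These two responsibilities can be sketched as follows. The snippet builds a JSON-RPC payload for MultiChain's publish API with the "offchain" options string, and reads a stream item defensively using the "offchain" and "available" flags described above; the exact response shape should be confirmed against the MultiChain API documentation, and no connection handling is shown:

```python
import json

def make_publish_request(stream, keys, hex_data, offchain=True):
    """Build a JSON-RPC payload for MultiChain's publish API.
    Passing the "offchain" options string stores the data off-chain,
    leaving only its hash (plus metadata) on-chain."""
    params = [stream, keys, hex_data]
    if offchain:
        params.append("offchain")
    return json.dumps({"method": "publish", "params": params, "id": 1})

def item_payload(item):
    """Read one stream item defensively: off-chain data may not have
    arrived yet, as reported by the "available" flag."""
    if item.get("offchain") and not item.get("available", True):
        return None                  # not yet retrieved; retry or skip
    return item.get("data")

req = json.loads(make_publish_request("stream1", ["key1"], "48656c6c6f"))
pending = {"offchain": True, "available": False}
ready   = {"offchain": True, "available": True, "data": "48656c6c6f"}
```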
Of course, to prevent every node from retrieving every off-chain item, items should be grouped together into streams in an appropriate way, with each node subscribing to those streams of interest.
On-chain and off-chain items can be used within the same stream, and the various stream querying and summarization functions relate to both types of data identically. This allows publishers to make the appropriate choice for every item in a stream, without affecting the rest of an application. For example, a stream of JSON items about people’s activities might use off-chain data for personally identifying information, and on-chain data for the rest. Subscribers can use MultiChain’s JSON merging to combine both types of information into a single JSON for reading.
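The merging step for the example above can be modeled simply. This is a simplified sketch of top-level JSON object merging, in which later items' keys override earlier ones; MultiChain's own JSON merging offers more options, so treat this as illustrative only:

```python
def merge_json_items(items):
    """Merge JSON items in publication order: later top-level keys
    override earlier ones. A simplified model of JSON object merging."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged

# On-chain part holds non-sensitive fields; PII is kept in an off-chain item
onchain_part  = {"activity": "login", "timestamp": 1536000000}
offchain_part = {"name": "Alice", "email": "alice@example.com"}
combined = merge_json_items([onchain_part, offchain_part])
```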
If you want to give off-chain stream items a try, just follow MultiChain’s regular Getting Started tutorial, and be sure not to skip section 5.
So what’s next?
With seamless support for off-chain data, MultiChain 2.0 will offer a big step forward for blockchain applications focused on large scale data timestamping and notarization. In the longer term, we’re already thinking about a ton of possible future enhancements to this feature for the Community and/or Enterprise editions of MultiChain:
Implementing stream read permissions using a combination of off-chain items, salted hashes, signed chunk queries and encrypted delivery.
Allowing off-chain data to be explicitly “forgotten”, either voluntarily by individual nodes, or by all nodes in response to an on-chain message.
Selective stream subscriptions, in which nodes only retrieve the data for off-chain items with particular publishers or keys.
Using Merkle trees to enable a single on-chain hash to represent an unlimited number of off-chain items, giving another huge jump in terms of scalability.
Pluggable storage engines, allowing off-chain data to be kept in databases or external file systems rather than local disk.
Nodes learning over time where each type of off-chain data is usually available in a network, and focusing their chunk queries appropriately.
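To illustrate the Merkle tree idea from the list above: a single root hash commits to any number of items, each of which can later be proven with a logarithmic-size path. The sketch below is a generic textbook construction (duplicating the last node on odd levels, as Bitcoin does), not MultiChain's eventual scheme:

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Compute a Merkle root over a list of items. One on-chain root can
    then commit to an unlimited number of off-chain items. Illustrative
    construction only."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"item1", b"item2", b"item3"])
```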
We’d love to hear your feedback on the list above as well as off-chain items in general. With MultiChain 2.0 still officially in alpha, there’s plenty of time to enhance this feature before its final release.
In the meantime, we’ve already started work on “Smart Filters”, the last major feature planned for MultiChain 2.0 Community. A Smart Filter is a piece of code embedded in the blockchain which implements custom rules for validating data or transactions. Smart Filters have some similarities with “smart contracts”, and can do many of the same things, but have key differences in terms of safety and performance. We look forward to telling you more in due course.
While off-chain stream items in MultiChain 2.0 are simple to use, their design involves many decisions and additional features that may be of interest. The list below will mainly be relevant for developers building blockchain applications, and can be skipped by less technical types:
Per-stream policies. When a MultiChain stream is created, it can optionally be restricted to allow only on-chain or off-chain data. There are several possible reasons for doing this, rather than allowing each publisher to decide for themselves. For example, on-chain items offer an ironclad availability guarantee, whereas old off-chain items may become irretrievable if their publisher and other subscribers drop off the network. On the flip side, on-chain items cannot be “forgotten” without modifying the blockchain, while off-chain items are more flexible. This can be important in terms of data privacy rules, such as Europe’s new GDPR regulations.
On-chain metadata. For off-chain items, the on-chain transaction still contains the item’s publisher(s), key(s), format (JSON, text or binary) and total size. All this takes up very little space, and helps application developers determine whether the unavailability of an off-chain item is of concern for a particular stream query.
Two-hop limit. When relaying chunk queries across the peer-to-peer network, there is a trade-off between reachability and performance. While it would be nice for every query to be propagated along every single path, this can clog the network with unnecessary “chatter”. So for now chunk queries are limited to two hops, meaning that a node can retrieve off-chain data from any peer of its peers. In the smaller networks of under 1000 nodes that tend to characterize enterprise blockchains, we believe this will work just fine, but it’s easy for us to adjust this constraint (or offer it as a parameter) if we turn out to be wrong.
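The reach of the two-hop limit can be made concrete with a small graph model: a node can query its peers, and its peers' peers, but no further. This is a toy breadth-first search over a hypothetical peer topology, not MultiChain's relay code:

```python
from collections import deque

def reachable_within(peers, start, max_hops=2):
    """Return the set of nodes reachable from `start` within `max_hops`
    peer-to-peer hops -- the set a subscriber can retrieve chunks from
    under the two-hop limit."""
    seen = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_hops:
            continue                      # query is not relayed further
        for nbr in peers.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                frontier.append(nbr)
    seen.pop(start)
    return set(seen)

# A chain topology: A - B - C - D
peers = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# With two hops, A reaches its peer B and B's peer C, but not D
```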
Local storage. Each MultiChain node stores off-chain data within the “chunks” directory of its regular blockchain directory, using an efficient binary format and LevelDB index. A separate subdirectory is used for the items in each of the subscribed streams, as well as those published by the node itself. Within each of these subdirectories, duplicate chunks (with the same hash) are only stored once. When a node unsubscribes from a stream, it can choose whether or not to purge the off-chain data retrieved for that stream.
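The per-stream deduplication described above amounts to a content-addressed store. The toy class below keys chunks by hash within each stream's index, so a duplicate chunk is stored only once per stream; MultiChain itself uses a binary on-disk format with a LevelDB index, and the hash choice here is an assumption:

```python
import hashlib

class ChunkStore:
    """Toy content-addressed chunk store with per-stream indexes,
    mirroring the per-subscription subdirectories described above."""
    def __init__(self):
        self.streams = {}                          # stream -> {hash: data}

    def put(self, stream: str, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        # setdefault: a duplicate chunk within a stream is stored only once
        self.streams.setdefault(stream, {}).setdefault(h, data)
        return h

    def purge(self, stream: str):
        """On unsubscribe, a node may choose to purge a stream's chunks."""
        self.streams.pop(stream, None)

store = ChunkStore()
h1 = store.put("stream1", b"chunk data")
h2 = store.put("stream1", b"chunk data")   # duplicate within the same stream
```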
Binary cache. When publishing large pieces of binary data, whether on-chain or off-chain, it may not be practical for application developers to send that data to MultiChain’s API in a single JSON-RPC request. So MultiChain 2.0 implements a binary cache, which enables large pieces of data to be built up over multiple API calls, and then published in a brief final step. Each item in the binary cache is stored as a simple file in the “cache” subdirectory of the blockchain directory, allowing gigabytes of data to also be pushed directly via the file system.
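The build-up-then-publish pattern can be sketched with a file-backed buffer. This is a conceptual local analogue of the binary cache, not MultiChain's API: each `append` call stands in for one API call adding a piece of data, and `finalize` stands in for the brief final publishing step that reads the accumulated file:

```python
import os
import tempfile

class BinaryCache:
    """Toy model of a binary cache: data accumulated over multiple calls,
    backed by a plain file, then read out in one final step."""
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def append(self, data: bytes):
        with open(self.path, "ab") as f:     # each call appends one piece
            f.write(data)

    def finalize(self) -> bytes:
        with open(self.path, "rb") as f:     # the final step reads it all back
            blob = f.read()
        os.remove(self.path)
        return blob

cache = BinaryCache()
cache.append(b"first part, ")
cache.append(b"second part")
blob = cache.finalize()
```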
Monitoring APIs. MultiChain 2.0 alpha 3 adds two new APIs for monitoring the asynchronous retrieval of off-chain data. The first API describes the current state of the queue, showing how many chunks (and how much data) are waiting or being queried or retrieved. The second API provides aggregate statistics for all chunk queries and requests sent since the node started up, including counts of different types of failure.