Empowering a broad new range of blockchain applications
Today we’re delighted to release the first beta version of MultiChain 2.0, the next generation of the MultiChain blockchain platform, after 16 months in development. MultiChain 2.0 (download) includes three major new areas of functionality to help developers rapidly build powerful blockchain applications:
Smart Filters. Custom rules for validating transactions and data, written in JavaScript code and executed by every node on the chain – MultiChain’s approach to the smart contract paradigm.
Off-chain data. Any item published in a MultiChain stream can optionally be stored off-chain, in order to save bandwidth and storage space. Off-chain data (up to 1 GB per item) is automatically hashed into the blockchain, with the data itself delivered rapidly over the peer-to-peer network. Click for more about off-chain data.
Richer data streams. JSON and Unicode text are now supported natively and stored efficiently on- or off-chain. Multiple JSON items can be merged together, allowing a stream to serve as a database with a full audit history. Stream items can have multiple keys, and be queried by multiple keys and/or publishers together. Finally, to increase data throughput, a single transaction can publish multiple items to one or more streams.
In addition, MultiChain 2.0 provides several other smaller new features:
Blockchain upgrading. Many blockchain parameters can be changed over time, subject to administrator consensus. These include the block time interval, maximum block size, and many transaction size limits.
Per-asset permissions. Assets can optionally be issued with their own send and receive permissions, which can be controlled for each address by that asset’s issuer and/or its assigned administrators.
Binary cache. Large pieces of binary data (up to 1 GB) can be added to MultiChain over multiple API calls, or uploaded directly via the file system.
Inline metadata. Transaction outputs containing assets and/or native currency can now contain metadata in JSON, text or binary format. Smart Filters can easily read and respond to this metadata.
Custom permissions. Six new permissions (three “high” and three “low”) can be assigned to addresses by two levels of administrator. These are useful for defining roles enforced by Smart Filters.
We’re also delighted to welcome over 40 new companies to the MultiChain partner program, bringing the total number to 86. New members include SAP, who have built a deep integration with MultiChain in the SAP Cloud Platform.
MultiChain 2.0 beta 1 can be downloaded here. It is backwards compatible with version 1.0 with a few exceptions – see the API compatibility note. MultiChain 1.0 nodes and networks can be upgraded to version 2.0 in the usual way (be sure to back up first). We’ll also continue to maintain and fix any bugs in MultiChain 1.0 through 2019 at least.
Below is the full official press release about the 2.0 beta release.
MultiChain Releases Beta Version 2.0 with Over Forty New Partners
December 19, 2018 – Coin Sciences Ltd is delighted to announce the first beta release of MultiChain 2.0, along with the addition of 43 new members of the MultiChain Partner Program, bringing the total number to 86.
MultiChain 2.0 beta 1 has been released after sixteen months of intensive development including seven alpha versions, and is available for Linux and Windows at: https://www.multichain.com/download-install/. Enhancements over MultiChain 1.0 include richer data publishing with support for JSON and Unicode text, blockchain parameter upgrading, seamless integration of off-chain data storage and delivery, and Smart Filters, MultiChain’s approach to the smart contract paradigm.
The new members of the MultiChain Partner Program include SAP, who have integrated MultiChain into the SAP Cloud Platform and are deploying it for client projects. HCL Technologies, the multinational consulting company, also recently joined, along with 41 other blockchain and software companies. Members of the partner program have access to the MultiChain engineering team, can use MultiChain branding in their marketing materials, and are promoted on the MultiChain website. A full list of MultiChain’s partners can be found at: https://www.multichain.com/platform-partners/
“At SAP we are extending business solutions with MultiChain blockchain functionality via our SAP Cloud Platform offering,” said Torsten Zube, SAP’s Head of Blockchain. “We strategically decided that MultiChain should be part of our offering due to its proven, easy and mature distributed ledger technology addressing enterprise needs. The upcoming MultiChain 2.0 release will provide more functionality such as Smart Filters and off-chain data that we see as particularly relevant for enterprise scenarios going forward.”
“Version 2.0 represents a huge upgrade for MultiChain, integrating several major features commonly requested by our developer community,” said Dr Gideon Greenspan, CEO and Founder of Coin Sciences Ltd. “With version 1.0 in stable production since August 2017, our goal with MultiChain 2.0 remains the same: to provide a powerful, stable and easy-to-use platform for blockchain application developers. We look forward to continued cooperation with our partners to bring MultiChain-driven applications to enterprises, governments and beyond.”
There’s more than one way to put code on a blockchain
In most discussions about blockchains, it doesn’t take long for the notion of “smart contracts” to come up. In the popular imagination, smart contracts automate the execution of interparty interactions, without requiring a trusted intermediary. By expressing legal relationships in code rather than words, they promise to enable transactions to take place directly and without error, whether deliberate or not.
From a technical viewpoint, a smart contract is something more specific: computer code that lives on a blockchain and defines the rules for that chain’s transactions. This description sounds simple enough, but behind it lies a great deal of variation in how these rules are expressed, executed and validated. When choosing a blockchain platform for a new application, the question “Does this platform support smart contracts?” isn’t the right one to ask. Instead, we need to be asking: “What type of smart contracts does this platform support?”
In this article, my goal is to examine some of the major differences between smart contract approaches and the trade-offs they represent. I’ll do this by looking at four popular enterprise blockchain platforms which support some form of customized on-chain code. First, IBM’s Hyperledger Fabric, which calls its contracts “chaincode”. Second, our MultiChain platform, which introduces smart filters in version 2.0. Third, Ethereum (and its permissioned Quorum and Burrow spin-offs), which popularized the “smart contract” name. And finally, R3 Corda, which references “contracts” in its transactions. Despite all of the different terminology, ultimately all of these refer to the same thing – application-specific code that defines the rules of a chain.
Before going any further, I should warn the reader that much of the following content is technical in nature, and assumes some familiarity with general programming and database concepts. For good or bad, this cannot be avoided – without getting into the details it’s impossible to make an informed decision about whether to use a blockchain for a particular project, and (if so) the right type of blockchain to use.
Let’s begin with some context. Imagine an application that is shared by multiple organizations, which is based on an underlying database. In a traditional centralized architecture, this database is hosted and administered by a single party which all of the participants trust, even if they do not trust each other. Transactions which modify the database are initiated only by applications on this central party’s systems, often in response to messages received from the participants. The database simply does what it’s told because the application is implicitly trusted to only send it transactions that make sense.
Blockchains provide an alternative way of managing a shared database, without a trusted intermediary. In a blockchain, each participant runs a “node” that holds a copy of the database and independently processes the transactions which modify it. Participants are identified using public keys or “addresses”, each of which has a corresponding private key known only to the identity owner. While transactions can be created by any node, they are “digitally signed” by their initiator’s private key in order to prove their origin.
Nodes connect to each other in a peer-to-peer fashion, rapidly propagating transactions and the “blocks” in which they are timestamped and confirmed across the network. The blockchain itself is literally a chain of these blocks, which forms an ordered log of every historical transaction. A “consensus algorithm” is used to ensure that all nodes reach agreement on the content of the blockchain, without requiring centralized control. (Note that some of this description does not apply to Corda, in which each node has only a partial copy of the database and there is no global blockchain. We’ll talk more about that later on.)
In principle, any shared database application can be architected by using a blockchain at its core. But doing so creates a number of technical challenges which do not exist in a centralized scenario:
Transaction rules. If any participant can directly change the database, how do we ensure that they follow the application’s rules? What stops one user from corrupting the database’s contents in a self-serving way?
Determinism. Once these rules are defined, they will be applied multiple times by multiple nodes when processing transactions for their own copy of the database. How do we ensure that every node obtains exactly the same result?
Conflict prevention. With no central coordination, how do we deal with two transactions that each follow the application’s rules, but nonetheless conflict with each other? Conflicts can stem from a deliberate attempt to game the system, or be the innocent result of bad luck and timing.
So where do smart contracts, smart filters and chaincode come in? Their core purpose is to work with a blockchain’s underlying infrastructure in order to solve these challenges. Smart contracts are the decentralized equivalent of application code – instead of running in one central place, they run on every node in the blockchain network, creating or validating the transactions which modify the shared database’s contents.
Let’s begin with transaction rules, the first of these challenges, and see how they are expressed in Fabric, MultiChain, Ethereum and Corda respectively.
Transaction rules perform a specific function in blockchain-powered databases – restricting the transformations that can be performed on that database’s state. This is necessary because a blockchain’s transactions can be initiated by any of its participants, and these participants do not trust each other sufficiently to allow them to modify the database at will.
Let’s see two examples of why transaction rules are needed. First, imagine a blockchain designed to aggregate and timestamp PDF documents that are published by its participants. In this case, nobody should have the right to remove or change documents, since doing so would undermine the entire purpose of the system – document persistence. Second, consider a blockchain representing a shared financial ledger, which keeps track of the balances of its users. We cannot allow a participant to arbitrarily inflate their own balance, or take others’ money away.
Inputs and outputs
Our blockchain platforms rely on two broad approaches for expressing transaction rules. The first, which I call the “input–output model”, is used in MultiChain and Corda. Here, transactions explicitly list the database rows or “states” which they delete and create, forming a set of “inputs” and “outputs” respectively. Modifying a row is expressed as the equivalent operation of deleting that row and creating a new one in its place.
Since database rows are only deleted in inputs and only created in outputs, every input must “spend” a previous transaction’s output. The current state of the database is defined as the set of “unspent transaction outputs” or “UTXOs”, i.e. outputs from previous transactions which have not yet been used. Transactions may also contain additional information, called “metadata”, “commands” or “attachments”, which don’t become part of the database but help to define their meaning or purpose.
Given these three sets of inputs, outputs and metadata, the validity of a transaction in MultiChain or Corda is defined by some code which can perform arbitrary computations on those sets. This code can validate the transaction, or else return an error with a corresponding explanation. You can think of the input–output model as an automated “inspector” holding a checklist which ensures that transactions follow each and every rule. If the transaction fails any one of those checks, it will automatically be rejected by all of the nodes in the network.
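The “inspector with a checklist” can be sketched in a few lines of Python. All names and structures below are illustrative only, not any platform’s actual API:

```python
def apply_transaction(utxos, tx, rules):
    """Validate tx against every rule, then update the set of unspent outputs.

    utxos maps (txid, index) -> row data; tx has "id", "inputs" (references
    to previously unspent outputs) and "outputs" (new rows to create).
    """
    # every input must spend an output that is currently unspent
    if any(ref not in utxos for ref in tx["inputs"]):
        raise ValueError("rejected: input unknown or already spent")
    # the automated "checklist": every rule must pass
    for rule in rules:
        ok, reason = rule(tx, utxos)
        if not ok:
            raise ValueError("rejected: " + reason)
    new_utxos = dict(utxos)
    for ref in tx["inputs"]:
        del new_utxos[ref]                      # deleting a row = spending an output
    for index, row in enumerate(tx["outputs"]):
        new_utxos[(tx["id"], index)] = row      # creating a row = a new unspent output
    return new_utxos
```

Because every node runs the same rules against the same transaction, each node independently reaches the same accept-or-reject decision.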
It should be noted that, despite sharing the input–output model, MultiChain and Corda implement it very differently. In MultiChain, outputs can contain assets and/or data in JSON, text or binary format. The rules are defined in “transaction filters” or “stream filters”, which can be set to check all transactions, or only those involving particular assets or groupings of data. By contrast, a Corda output “state” is represented by an object in the Java or Kotlin programming language, with defined data fields. Corda’s rules are defined in “contracts” which are attached to specific states, and a state’s contract is only applied to transactions which contain that state in their inputs or outputs. This relates to Corda’s unusual visibility model, in which transactions can only be seen by their counterparties or those whose subsequent transactions they affect.
Contracts and messages
The second approach, which I call the “contract–message model”, is used in Hyperledger Fabric and Ethereum. Here, multiple “smart contracts” or “chaincodes” can be created on the blockchain, and each has its own database and associated code. A contract’s database can only be modified by its code, rather than directly by blockchain transactions. This design pattern is similar to the “encapsulation” of code and data in object-oriented programming.
With this model, a blockchain transaction begins as a message sent to a contract, with some optional parameters or data. The contract’s code is executed in reaction to the message and parameters, and is free to read and write its own database as part of that reaction. Contracts can also send messages to other contracts, but cannot access each other’s databases directly. In the language of relational databases, contracts act as enforced “stored procedures”, where all access to the database goes via some predefined code.
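This encapsulation can be sketched as follows. The class and method names are invented for illustration and correspond to neither Fabric’s nor Ethereum’s real interfaces:

```python
class Contract:
    """A contract owns its database; only its own code can modify it."""
    def __init__(self):
        self._db = {}                       # private, per-contract state

class KeyValueContract(Contract):
    """Reacts to messages like an enforced stored procedure."""
    def handle(self, sender, message, params):
        if message == "set":
            # all writes go through this predefined code path
            self._db[params["key"]] = {"value": params["value"],
                                       "writer": sender}
            return "ok"
        if message == "get":
            record = self._db.get(params["key"])
            return record["value"] if record else None
        raise ValueError("unknown message: " + message)
```

Other contracts (or users) may send messages to `handle`, but nothing outside the contract can reach into `_db` directly.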
Both Fabric and Quorum, a variation on Ethereum, complicate this picture by allowing a network to define multiple “channels” or “private states”. The aim is to mitigate the problem of blockchain confidentiality by creating separate environments, each of which is only visible to a particular sub-group of participants. While this sounds promising in theory, in reality the contracts and data in each channel or private state are isolated from those in the others. As a result, in terms of smart contracts, these environments are equivalent to separate blockchains.
Let’s see how to implement the transaction rules for a single-asset financial ledger with these two models. Each row in our ledger’s database has two columns, containing the owner’s address and the quantity of the asset owned. In the input–output model, transactions must satisfy two conditions:
The total quantity of assets in a transaction’s outputs has to match the total in its inputs. This prevents users from creating or deleting money arbitrarily.
Every transaction has to be signed by the owner of each of its inputs. This stops users from spending each other’s money without permission.
Taken together, these two conditions are all that is needed to create a simple but viable financial system.
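In code, the inspector’s checklist for this ledger is remarkably small. Here is a hedged sketch, with transactions as plain dictionaries and signatures reduced to a set of signing addresses:

```python
def check_ledger_transaction(tx, signed_by):
    """Return (valid, reason) for a single-asset ledger transaction.

    tx["inputs"] and tx["outputs"] are rows of {"owner": ..., "qty": ...};
    signed_by is the set of addresses whose signatures appear on tx.
    """
    # Rule 1: assets out must equal assets in (no creation or destruction)
    if sum(r["qty"] for r in tx["inputs"]) != sum(r["qty"] for r in tx["outputs"]):
        return False, "assets created or destroyed"
    # Rule 2: the owner of every input row must have signed the transaction
    for row in tx["inputs"]:
        if row["owner"] not in signed_by:
            return False, "missing signature from " + row["owner"]
    return True, "valid"
```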
In the contract–message model, the asset’s contract supports a “send payment” message, which takes three parameters: the sender’s address, recipient’s address, and quantity to be sent. In response, the contract executes the following four steps:
Verify that the transaction was signed by the sender.
Check that the sender has sufficient funds.
Deduct the requested quantity from the sender’s row.
Add that quantity to the recipient’s row.
If either of the checks in the first two steps fails, the contract will abort and no payment will be made.
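The same ledger in contract form might look like this. Again, this is an illustrative sketch rather than any platform’s actual contract API:

```python
class AssetContract:
    """The asset's balances live inside the contract's own database."""
    def __init__(self, initial_balances):
        self.balances = dict(initial_balances)

    def send_payment(self, signed_by, sender, recipient, qty):
        # Step 1: verify the transaction was signed by the sender
        if sender not in signed_by:
            raise ValueError("not signed by sender")
        # Step 2: check the sender has sufficient funds
        if self.balances.get(sender, 0) < qty:
            raise ValueError("insufficient funds")
        # Step 3: deduct the requested quantity from the sender's row
        self.balances[sender] -= qty
        # Step 4: add that quantity to the recipient's row
        self.balances[recipient] = self.balances.get(recipient, 0) + qty
```

Note that if either of the first two steps raises an error, steps 3 and 4 never run, so the database is left untouched.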
So both the input–output and contract–message models are effective ways to define transaction rules and keep a shared database safe. Indeed, on a theoretical level, each of these models can be used to simulate the other. In practice however, the most appropriate model will depend on the application being built. Does each transaction affect few or many pieces of information? Do we need to be able to guarantee transaction independence? Does each piece of data have a clear owner or is there some global state to be shared?
It is beyond our scope here to explore how the answers should influence a choice between these two models. But as a general guideline, when developing a new blockchain application, it’s worth trying to express its transaction rules in both forms, and seeing which fits more naturally. The difference will express itself in terms of: (a) ease of programming, (b) storage requirements and throughput, and (c) speed of conflict detection. We’ll talk more about this last issue later on.
Permissions + assets + streams
When it comes to transaction rules, there is one way in which MultiChain specifically differs from Fabric, Ethereum and Corda. Unlike these other platforms, MultiChain has several built-in abstractions that provide some basic building blocks for blockchain-driven applications, without requiring developers to write their own code. These abstractions cover three areas that are commonly needed: (a) dynamic permissions, (b) transferrable assets, and (c) data storage.
For example, MultiChain manages permissions for connecting to the network, sending and receiving transactions, creating assets or streams, or controlling the permissions of other users. Multiple fungible assets can be issued, transferred, retired or exchanged safely and atomically. Any number of “streams” can be created on a chain, for publishing, indexing and retrieving on-chain or off-chain data in JSON, text or binary formats. All of the transaction rules for these abstractions are available out-of-the-box.
When developing an application in MultiChain, it’s possible to ignore this built-in functionality, and express transaction rules using smart filters only. However, smart filters are designed to work together with MultiChain’s built-in abstractions, by enabling their default behavior to be restricted in customized ways. For example, the permission for certain activities might be controlled by specific administrators, rather than the default behavior where any administrator will do. The transfer of certain assets can be limited by time or require additional approval above a certain amount. The data in a particular stream can be validated to ensure that it consists only of JSON structures with required fields and values.
In all of these cases, smart filters create additional requirements for transactions to be validated, but do not remove the simple rules that are built in. This can help address one of the key challenges in blockchain applications: the fact that a bug in some on-chain code can lead to disastrous consequences. We’ve seen endless examples of this problem in the public Ethereum blockchain, most famously in the demise of The DAO and the Parity multisignature bugs. Broader surveys have found a large number of common vulnerabilities in Ethereum smart contracts that enable attackers to steal or freeze other people’s funds.
Of course, MultiChain smart filters may contain bugs too, but their consequences are more limited in scope. For example, the built-in asset rules prevent one user from spending another’s money, or accidentally making their own money disappear, no matter what other logic a smart filter contains. If a bug is found in a smart filter, it can be deactivated and replaced with a corrected version, while the ledger’s basic integrity is protected. Philosophically, MultiChain is closer to traditional database architectures, where the database platform provides a number of built-in abstractions, such as columns, tables, indexes and constraints. More powerful features such as triggers and stored procedures can optionally be coded up by application developers, in cases where they are actually needed.
The challenge of determinism
Let’s move on to the next part of our showdown. No matter which approach we choose, the custom transaction rules of a blockchain application are expressed as computer code written by application developers. And unlike centralized applications, this code is going to be executed more than one time and in more than one place for each transaction. This is because multiple blockchain nodes belonging to different participants have to each verify and/or execute that transaction for themselves.
This repeated and redundant code execution introduces a new requirement that is rarely found in centralized applications: determinism. In the context of computation, determinism means that a piece of code will always give the same answer for the same parameters, no matter where and when it is run. This is absolutely crucial for code that interacts with a blockchain because, without determinism, the consensus between the nodes on that chain can catastrophically break down.
Let’s see how this looks in practice, first in the input–output model. If two nodes have a different opinion about whether a transaction is valid, then one will accept a block containing that transaction and the other will not. Since every block explicitly links back to a previous block, this will create a permanent “fork” in the network, with one or more nodes not accepting the majority opinion about the entire blockchain’s contents from that point on. The nodes in the minority will be cut off from the database’s evolving state, and will no longer be able to effectively use the application.
Now let’s see what happens if consensus breaks down in the contract–message model. If two nodes have a different opinion about how a contract should respond to a particular message, this can lead to a difference in their databases’ contents. This in turn can affect the contract’s response to future messages, including messages it sends to other contracts. The end result is an increasing divergence between different nodes’ view of the database’s state. (The “state root” field in Ethereum blocks ensures that any difference in contracts’ responses leads immediately to a fully catastrophic blockchain fork, rather than risking staying hidden for a period of time.)
Sources of non-determinism
So non-determinism in blockchain code is clearly a problem. But if the basic building blocks of computation, such as arithmetic, are deterministic, what do we have to worry about? Well, it turns out, quite a few things:
Most obviously, random number generators, since by definition these are designed to produce a different result every time.
Checking the current time, since nodes won’t be processing transactions at exactly the same time, and in any event their clocks may be out of sync. (It’s still possible to implement time-dependent rules by making reference to timestamps within the blockchain itself.)
Querying external resources such as the Internet, disk files, or other programs running on a computer. These resources cannot be guaranteed to always give the same response, and may become unavailable.
Running multiple pieces of code in parallel “threads”, since this leads to a “race condition” where the order in which these processes finish cannot be predicted.
Performing any floating point calculations which can give even minutely different answers on different computer processor architectures.
Our four blockchain platforms employ several different approaches to avoiding these pitfalls.
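To make the second of those pitfalls concrete, here is the difference between a time-dependent rule written naively and one that references a timestamp within the blockchain itself. The function names are invented for illustration:

```python
import time

# Non-deterministic: each node evaluates this at a different moment,
# so two nodes can disagree about whether the same transaction is valid.
def offer_expired_naive(deadline):
    return time.time() > deadline

# Deterministic: every node reads the same timestamp from the block
# in which the transaction was confirmed, so all reach the same answer.
def offer_expired(block_timestamp, deadline):
    return block_timestamp > deadline
```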
Determinism by endorsement
When it comes to determinism, Hyperledger Fabric adopts a completely different approach. In Fabric, when a “client” node wants to send a message to some chaincode, it first sends that message to some “endorser” nodes. Each of these nodes executes the chaincode independently, forming an opinion of the message’s effect on that chaincode’s database. These opinions are sent back to the client together with a digital signature which constitutes a formal “endorsement”. If the client receives enough endorsements of the intended outcome, it creates a transaction containing those endorsements, and broadcasts it for inclusion in the chain.
In order to guarantee determinism, each piece of chaincode has an “endorsement policy” which defines exactly how much endorsement is required, and from whom, in order for a transaction to be valid. If chaincode behaves non-deterministically, the endorsers will arrive at conflicting opinions of its effect, so the client will be unable to gather a matching set of endorsements. The problem therefore surfaces as a failed transaction for that client, rather than as a dangerous fork between the nodes in the network.
By now it’s clear that many blockchain use cases have nothing to do with financial transactions. Instead, the chain’s purpose is to enable the decentralized aggregation, ordering, timestamping and archiving of any type of information, including structured data, correspondence or documentation. The blockchain’s core value is enabling its participants to provably and permanently agree on exactly what data was entered, when and by whom, without relying on a trusted intermediary. For example, SAP’s recently launched blockchain platform, which supports MultiChain and Hyperledger Fabric, targets a broad range of supply chain and other non-financial applications.
The simplest way to use a blockchain for recording data is to embed each piece of data directly inside a transaction. Every blockchain transaction is digitally signed by one or more parties, replicated to every node, ordered and timestamped by the chain’s consensus algorithm, and stored permanently in a tamper-proof way. Any data within the transaction will therefore be stored identically but independently by every node, along with a proof of who wrote it and when. The chain’s users are able to retrieve this information at any future time.
For example, MultiChain 1.0 allowed one or more named “streams” to be created on a blockchain and then used for storing and retrieving raw data. Each stream has its own set of write permissions, and each node can freely choose which streams to subscribe to. If a node is subscribed to a stream, it indexes that stream’s content in real-time, allowing items to be retrieved quickly based on their ordering, timestamp, block number or publisher address, as well as via a “key” (or label) by which items can be tagged. MultiChain 2.0 (since alpha 1) extended streams to support Unicode text or JSON data, as well as multiple keys per item and multiple items per transaction. It also added summarization functions such as “JSON merge” which combine items with the same key or publisher in a useful way.
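For illustration, a merge-by-key summarization can be approximated in a few lines of Python. This mimics the general idea – later items’ fields override earlier ones – rather than MultiChain’s exact “JSON merge” semantics:

```python
def json_merge(items):
    """Combine a stream key's JSON items, in blockchain order,
    into a single object; later fields override earlier ones."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged
```

Since the full item history remains on the chain, the merged view behaves like a database record with a built-in audit trail.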
Confidentiality and scalability
While storing data directly on a blockchain works well, it suffers from two key shortcomings – confidentiality and scalability. To begin with confidentiality, the content of every stream item is visible to every node on the chain, and this is not necessarily a desirable outcome. In many cases a piece of data should only be visible to a certain subset of nodes, even if other nodes are needed to help with its ordering, timestamping and notarization.
Confidentiality is a relatively easy problem to solve, by encrypting information before it is embedded in a transaction. The decryption key for each piece of data is only shared with those participants who are meant to see it. Key delivery can be performed on-chain using asymmetric cryptography (as described here) or via some off-chain mechanism, whichever is preferred. Any node lacking the key to decrypt an item will see nothing more than binary gibberish.
Scalability, on the other hand, is a more significant challenge. Let’s say that any decent blockchain platform should support a network throughput of 500 transactions per second. If the purpose of the chain is information storage, then the size of each transaction will depend primarily on how much data it contains. Each transaction will also need (at least) 100 bytes of overhead to store the sender’s address, digital signature and a few other bits and pieces.
If we take an easy case, where each item is a small JSON structure of 100 bytes, the overall data throughput would be 100 kilobytes per second, calculated from 500 × (100+100). This translates to under 1 megabit/second of bandwidth, which is comfortably within the capacity of any modern Internet connection. Data would accumulate at a rate of around 3 terabytes per year, which is no small amount. But with 12 terabyte hard drives now widely available, and RAID controllers which combine multiple physical drives into a single logical one, we could easily store 10-20 years of data on every node without too much hassle or expense.
However, things look very different if we’re storing larger pieces of information, such as scanned documentation. A reasonable quality JPEG scan of an A4 sheet of paper might be 500 kilobytes in size. Multiply this by 500 transactions per second, and we’re looking at a throughput of 250 megabytes per second. This translates to 2 gigabits/second of bandwidth, which is faster than most local networks, let alone connections to the Internet. At Amazon Web Services’ cheapest published price of $0.05 per gigabyte, it means an annual bandwidth bill of $400,000 per node. And where will each node store the 8000 terabytes of new data generated annually?
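The back-of-envelope arithmetic behind both scenarios can be checked directly:

```python
TX_PER_SEC = 500          # assumed network throughput
OVERHEAD = 100            # bytes of per-transaction overhead

def load(data_bytes):
    """Return (bytes/sec, megabits/sec, terabytes/year) for a given item size."""
    per_sec = TX_PER_SEC * (data_bytes + OVERHEAD)
    megabits = per_sec * 8 / 1_000_000
    tb_per_year = per_sec * 60 * 60 * 24 * 365 / 1_000_000_000_000
    return per_sec, megabits, tb_per_year

small_json = load(100)        # 100-byte JSON items
scanned_doc = load(500_000)   # 500 KB JPEG scans
# small_json  ≈ 100 KB/s, 0.8 Mbit/s, ~3 TB/year
# scanned_doc ≈ 250 MB/s, ~2000 Mbit/s, ~8000 TB/year
```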
It’s clear that, for blockchain applications storing many large pieces of data, straightforward on-chain storage is not a practical choice. To add insult to injury, if data is encrypted to solve the problem of confidentiality, nodes are being asked to store a huge amount of information that they cannot even read. This is not an attractive proposition for the network’s participants.
The hashing solution
So how do we solve the problem of data scalability? How can we take advantage of the blockchain’s decentralized notarization of data, without replicating that data to every node on the chain?
The answer is with a clever piece of technology called a “hash”. A hash is a long number (think 256 bits, or around 80 decimal digits) which uniquely identifies a piece of data. The hash is calculated from the data using a one-way function which has an important cryptographic property: Given any piece of data, it is easy and fast to calculate its hash. But given a particular hash, it is computationally infeasible to find a piece of data that would generate that hash. And when we say “computationally infeasible”, we mean more calculations than there are atoms in the known universe.
Hashes play a crucial role in all blockchains, by uniquely identifying transactions and blocks. They also underlie the computational challenge in proof-of-work systems like bitcoin. Many different hash functions have been developed, with gobbledygook names like BLAKE2, MD5 and RIPEMD160. But in order for any hash function to be trusted, it must endure extensive academic review and testing. These tests come in the form of attempted attacks, such as “preimage” (finding an input with the given hash), “second preimage” (finding a second input with the same hash as the given input) and “collision” (finding any two different inputs with the same hash). Surviving this gauntlet is far from easy, with a long and tragic history of broken hash functions proving the famous maxim: “Don’t roll your own crypto.”
To go back to our original problem, we can solve data scalability in blockchains by embedding the hashes of large pieces of data within transactions, instead of the data itself. Each hash acts as a “commitment” to its input data, with the data itself being stored outside of the blockchain or “off-chain”. For example, using the popular SHA256 hash function, a 500 kilobyte JPEG image can be represented by a 32-byte number, a reduction of over 15,000×. Even at a rate of 500 images per second, this puts us comfortably back in the territory of feasible bandwidth and storage requirements, in terms of the data stored on the chain itself.
Of course, any blockchain participant that needs an off-chain image cannot reproduce it from its hash. But if the image can be retrieved in some other way, then the on-chain hash serves to confirm who created it and when. Just like regular on-chain data, the hash is embedded inside a digitally signed transaction, which was included in the chain by consensus. If an image file falls out of the sky, and the hash for that image matches a hash in the blockchain, then the origin and timestamp of that image is confirmed. So the blockchain is providing exactly the same value in terms of notarization as if the image was embedded in the chain directly.
A question of delivery
So far, so good. By embedding hashes in a blockchain instead of the original data, we have an easy solution to the problem of scalability. Nonetheless, one crucial question remains:
How do we deliver the original off-chain content to those nodes which need it, if not through the chain itself?
This question has several possible answers, and we know of MultiChain users applying them all. One basic approach is to set up a centralized repository at some trusted party, where all off-chain data is uploaded then subsequently retrieved. This system could naturally use “content addressing”, meaning that the hash of each piece of data serves directly as its identifier for retrieval. However, while this setup might work for a proof-of-concept, it doesn’t make sense for production, because the whole point of a blockchain is to remove trusted intermediaries. Even if on-chain hashes prevent the intermediary from falsifying data, it could still delete data or fail to deliver it to some participants, due to a technical failure or the actions of a rogue employee.
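For illustration, a content-addressed repository can be sketched in a few lines of Python (a toy model, not any particular product’s API):

```python
import hashlib

class ContentAddressedStore:
    """A toy content-addressed repository: data is stored and retrieved
    purely by the hash of its content."""
    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self._chunks[h] = data                 # duplicates stored only once
        return h

    def get(self, h: str) -> bytes:
        data = self._chunks[h]
        # The retriever re-hashes, so a tampering store is always detected.
        assert hashlib.sha256(data).hexdigest() == h
        return data

store = ContentAddressedStore()
h = store.put(b"off-chain document")
assert store.get(h) == b"off-chain document"
```

Note that content addressing protects against falsification, but not against the store deleting data or refusing to serve it, which is exactly the residual trust problem described above.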
A more promising possibility is point-to-point communication, in which the node that requires some off-chain data requests it directly from the node that published it. This avoids relying on a trusted intermediary, but suffers from three alternative shortcomings:
It requires a map of blockchain addresses to IP addresses, to enable the consumer of some data to communicate directly with its publisher. Blockchains can generally avoid this type of static network configuration, which can be a problem in terms of failover and privacy.
If the original publisher node has left the network, or is temporarily out of service, then the data cannot be retrieved by anyone else.
If a large number of nodes are interested in some data, then the publisher will be overwhelmed by requests. This can create severe network congestion, slow the publisher’s system down, and lead to long delays for those trying to retrieve that data.
In order to avoid these problems, we’d ideally use some kind of decentralized delivery mechanism. Nodes should be able to retrieve the data they need without relying on any individual system – be it a centralized repository or the data’s original publisher. If multiple parties have a piece of data, they should share the burden of delivering it to anyone else who wants it. Nobody needs to trust an individual data source, because on-chain hashes can prove that data hasn’t been tampered with. If a malicious node delivers me the wrong data for a hash, I can simply discard that data and try asking someone else.
For those who have experience with peer-to-peer file sharing protocols such as Napster, Gnutella or BitTorrent, this will all sound very familiar. Indeed, many of the basic principles are the same, but there are two key differences. First, assuming we’re using our blockchain in an enterprise context, the system runs within a closed group of participants, rather than the Internet as a whole. Second, the blockchain adds a decentralized ordering, timestamping and notarization backbone, enabling all users to maintain a provably consistent and tamper-resistant view of exactly what happened, when and by whom.
How might a blockchain application developer achieve this decentralized delivery of off-chain content? One common choice is to take an existing peer-to-peer file sharing platform, such as the amusingly-named InterPlanetary File System (IPFS), and use it together with the blockchain. Each participant runs both a blockchain node and an IPFS node, with some middleware coordinating between the two. When publishing off-chain data, this middleware stores the original data in IPFS, then creates a blockchain transaction containing that data’s hash. To retrieve some off-chain data, the middleware extracts the hash from the blockchain, then uses this hash to fetch the content from IPFS. The local IPFS node automatically verifies the retrieved content against the hash to ensure it hasn’t been changed.
While this solution is possible, it’s all rather clumsy and inconvenient. First, every participant has to install, maintain and update three separate pieces of software (blockchain node, IPFS node and middleware), each of which stores its data in a separate place. Second, there will be two separate peer-to-peer networks, each with its own configuration, network ports, identity system and permissioning (although it should be noted that IPFS doesn’t yet support closed networks). Finally, tightly coupling IPFS and the blockchain together would make the middleware increasingly complex. For example, if we want the off-chain data referenced by some blockchain transactions to be instantly retrieved (with automatic retries), the middleware would need to be constantly up and running, maintaining its own complex state. Wouldn’t it be nice if the blockchain node did all of this for us?
Off-chain data in MultiChain 2.0
Today we’re delighted to release the third preview version (alpha 3) of MultiChain 2.0, with a fully integrated and seamless solution for off-chain data. Every piece of information published to a stream can be on-chain or off-chain as desired, and MultiChain takes care of everything else.
No really, we mean everything. As a developer building on MultiChain, you won’t have to worry about hashes, local storage, content discovery, decentralized delivery or data verification. Here’s what happens behind the scenes:
The publishing MultiChain node writes the new data in its local storage, slicing large items into chunks for easy digestion and delivery.
The transaction for publishing off-chain stream items is automatically built, containing the chunk hash(es) and size(s) in bytes.
This transaction is signed and broadcast to the network, propagating between nodes and entering the blockchain in the usual way.
When a node subscribed to a stream sees a reference to some off-chain data, it adds the chunk hashes for that data to its retrieval queue. (When subscribing to an old stream, a node also queues any previously published off-chain items for retrieval.)
As a background process, if there are chunks in a node’s retrieval queue, queries are sent out to the network to locate those chunks, as identified by their hashes.
These chunk queries are propagated to other nodes in the network in a peer-to-peer fashion (limited to two hops for now – see technical details below).
Any node which has the data for a chunk can respond, and this response is relayed to the subscriber back along the same path as the query.
If no node answers the chunk query, the chunk is returned to the queue for later retrying.
Otherwise, the subscriber chooses the most promising source for a chunk (based on hops and response time), and sends it a request for that chunk’s data, again along the same peer-to-peer path as the previous response.
The source node delivers the data requested, using the same path again.
The subscriber verifies the data’s size and hash against the original request.
If everything checks out, the subscriber writes the data to its local storage, making it immediately available for retrieval via the stream APIs.
If the requested content did not arrive, or didn’t match the desired hash or size, the chunk is returned to the queue for future retrieval from a different source.
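The subscriber’s side of the steps above can be sketched as follows (a deliberately simplified, single-threaded toy model; the real node performs this asynchronously in the background):

```python
import hashlib
from collections import deque

def retrieve_chunks(queue, peers):
    """Toy model of the retrieval loop: query peers for each queued chunk
    hash, fetch from a responder, verify, and re-queue failures."""
    storage = {}
    attempts = 0
    while queue and attempts < 100:              # bounded for illustration
        attempts += 1
        chunk_hash = queue.popleft()
        sources = [p for p in peers if chunk_hash in p]   # "chunk query"
        if not sources:
            queue.append(chunk_hash)             # nobody answered: retry later
            continue
        data = sources[0][chunk_hash]            # request from first responder
        if hashlib.sha256(data).hexdigest() == chunk_hash:
            storage[chunk_hash] = data           # verified: available via APIs
        else:
            queue.append(chunk_hash)             # bad data: try another source
    return storage

data = b"an off-chain stream item"
h = hashlib.sha256(data).hexdigest()
peers = [{}, {h: data}]                          # second peer holds the chunk
assert retrieve_chunks(deque([h]), peers) == {h: data}
```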
Most importantly, all of this happens extremely quickly. In networks with low latency, small pieces of off-chain data will arrive at subscribers within a split second of the transaction that references them. And for high load applications, our testing shows that MultiChain 2.0 alpha 3 can sustain a rate of over 1000 off-chain items or 25 MB of off-chain data retrieved per second, on a mid-range server (Core i7) with a decent Internet connection. Everything works fine with off-chain items up to 1 GB in size, far beyond the 64 MB limit for on-chain data. Of course, we hope to improve these numbers further as we spend time optimizing MultiChain 2.0 during its beta phase.
When using off-chain rather than on-chain data in streams, MultiChain application developers have to do exactly two things:
When publishing data, pass an “offchain” flag to the appropriate APIs.
When using the stream querying APIs, consider the possibility that some off-chain data might not yet be available, as reported by the “available” flag. While this situation will be rare under normal circumstances, it’s important for application developers to handle it appropriately.
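For example, a publishing request might be assembled like this (a sketch only; the exact parameter layout of the publish call should be checked against the MultiChain API documentation):

```python
import json

def publish_offchain_payload(stream, keys, hex_data):
    """Build an illustrative JSON-RPC payload for a publish call with the
    "offchain" option. The parameter layout here is an assumption for
    demonstration purposes, not the definitive MultiChain signature."""
    return json.dumps({
        "method": "publish",
        "params": [stream, keys, hex_data, "offchain"],
        "id": 1,
    })

payload = publish_offchain_payload("stream1", ["key1", "key2"], "48656c6c6f")
assert "offchain" in payload
```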
Of course, to prevent every node from retrieving every off-chain item, items should be grouped together into streams in an appropriate way, with each node subscribing to those streams of interest.
On-chain and off-chain items can be used within the same stream, and the various stream querying and summarization functions relate to both types of data identically. This allows publishers to make the appropriate choice for every item in a stream, without affecting the rest of an application. For example, a stream of JSON items about people’s activities might use off-chain data for personally identifying information, and on-chain data for the rest. Subscribers can use MultiChain’s JSON merging to combine both types of information into a single JSON for reading.
If you want to give off-chain stream items a try, just follow MultiChain’s regular Getting Started tutorial, and be sure not to skip section 5.
So what’s next?
With seamless support for off-chain data, MultiChain 2.0 will offer a big step forward for blockchain applications focused on large scale data timestamping and notarization. In the longer term, we’re already thinking about a ton of possible future enhancements to this feature for the Community and/or Enterprise editions of MultiChain:
Implementing stream read permissions using a combination of off-chain items, salted hashes, signed chunk queries and encrypted delivery.
Allowing off-chain data to be explicitly “forgotten”, either voluntarily by individual nodes, or by all nodes in response to an on-chain message.
Selective stream subscriptions, in which nodes only retrieve the data for off-chain items with particular publishers or keys.
Using merkle trees to enable a single on-chain hash to represent an unlimited number of off-chain items, giving another huge jump in terms of scalability.
Pluggable storage engines, allowing off-chain data to be kept in databases or external file systems rather than local disk.
Nodes learning over time where each type of off-chain data is usually available in a network, and focusing their chunk queries appropriately.
We’d love to hear your feedback on the list above as well as off-chain items in general. With MultiChain 2.0 still officially in alpha, there’s plenty of time to enhance this feature before its final release.
In the meantime, we’ve already started work on “Smart Filters”, the last major feature planned for MultiChain 2.0 Community. A Smart Filter is a piece of code embedded in the blockchain which implements custom rules for validating data or transactions. Smart Filters have some similarities with “smart contracts”, and can do many of the same things, but have key differences in terms of safety and performance. We look forward to telling you more in due course.
While off-chain stream items in MultiChain 2.0 are simple to use, they contain many design decisions and additional features that may be of interest. The list below will mainly be relevant for developers building blockchain applications, and can be skipped by less technical types:
Per-stream policies. When a MultiChain stream is created, it can optionally be restricted to allow only on-chain or off-chain data. There are several possible reasons for doing this, rather than allowing each publisher to decide for themselves. For example, on-chain items offer an ironclad availability guarantee, whereas old off-chain items may become irretrievable if their publisher and other subscribers drop off the network. On the flip side, on-chain items cannot be “forgotten” without modifying the blockchain, while off-chain items are more flexible. This can be important in terms of data privacy rules, such as Europe’s new GDPR regulation.
On-chain metadata. For off-chain items, the on-chain transaction still contains the item’s publisher(s), key(s), format (JSON, text or binary) and total size. All this takes up very little space, and helps application developers determine whether the unavailability of an off-chain item is of concern for a particular stream query.
Two-hop limit. When relaying chunk queries across the peer-to-peer network, there is a trade-off between reachability and performance. While it would be nice for every query to be propagated along every single path, this can clog the network with unnecessary “chatter”. So for now chunk queries are limited to two hops, meaning that a node can retrieve off-chain data from any peer of its peers. In the smaller networks of under 1000 nodes that tend to characterize enterprise blockchains, we believe this will work just fine, but it’s easy for us to adjust this constraint (or offer it as a parameter) if we turn out to be wrong.
Local storage. Each MultiChain node stores off-chain data within the “chunks” directory of its regular blockchain directory, using an efficient binary format and LevelDB index. A separate subdirectory is used for the items in each of the subscribed streams, as well as those published by the node itself. Within each of these subdirectories, duplicate chunks (with the same hash) are only stored once. When a node unsubscribes from a stream, it can choose whether or not to purge the off-chain data retrieved for that stream.
Binary cache. When publishing large pieces of binary data, whether on-chain or off-chain, it may not be practical for application developers to send that data to MultiChain’s API in a single JSON-RPC request. So MultiChain 2.0 implements a binary cache, which enables large pieces of data to be built up over multiple API calls, and then published in a brief final step. Each item in the binary cache is stored as a simple file in the “cache” subdirectory of the blockchain directory, allowing gigabytes of data to also be pushed directly via the file system.
Monitoring APIs. MultiChain 2.0 alpha 3 adds two new APIs for monitoring the asynchronous retrieval of off-chain data. The first API describes the current state of the queue, showing how many chunks (and how much data) are waiting or being queried or retrieved. The second API provides aggregate statistics for all chunk queries and requests sent since the node started up, including counts of different types of failure.
As time goes on, the blockchain world has been separating into two distinct parts. On one hand, public blockchains with their associated cryptocurrencies have enjoyed a remarkable recent comeback, minting many a multi-millionaire. On the other hand, permissioned or enterprise blockchains have been growing quietly but steadily in adoption, seeing their first live deployments across multiple industries during 2017.
One interesting question to consider is the appropriate level of similarity between these two types of chain. Both implement a shared database using peer-to-peer networking, public–private key cryptography, transaction rules and consensus mechanisms that can survive malicious actors. That’s a great deal of common ground. Nonetheless, public and private blockchains have different requirements in terms of confidentiality, scalability and governance. Perhaps these differences point to the need for radically divergent designs.
The Corda platform, developed by the R3 banking consortium, adopts a clear stance on this question. While some aspects were inspired by public blockchains, Corda was designed from scratch based on the needs of R3’s members. Indeed, although R3 still uses the word “blockchain” extensively to help market their product, Corda has no chain of blocks at all. More than any other “distributed ledger” platform I’m aware of, Corda departs radically from the architecture of conventional blockchains.
My goal in this piece is to explain these differences and discuss their implications, for good and bad. Actually, good and bad is the wrong way to put it, because the more interesting question is “Good and bad for what?” This article is far from short. But by the end of it, I hope that readers will gain some understanding of the differences in Corda and their consequent trade-offs. Corda is important because its design decisions bring many of the dilemmas of enterprise blockchains into sharp relief.
One last thing before we dive in. As the CEO of the company behind MultiChain, a popular enterprise blockchain platform, why am I writing in such depth about a supposedly competing product? The standard reason would be to argue for MultiChain’s superiority, but that’s not my motivation here. In fact, I do not see Corda and MultiChain as competitors, because they are fundamentally different in terms of design, architecture and audience. Corda and MultiChain compete in the same way as cruise liners and jet skis – while both transport people by sea, there are almost no real-world situations in which both could be used.
On a more personal note, I’ve learned a great deal from Corda’s technical leadership over the past few years, whether through meetings, correspondence or their public writings, much of which occurred before they joined R3. Some of my interest in Corda stems from the respect I have for this team, and for this reason alone, Corda is worth studying for anyone seeking an understanding of the distributed ledger field.
In order to understand Corda, it’s helpful to start with conventional blockchains. The purpose of a blockchain is to enable a database or ledger to be directly and safely shared by non-trusting parties. This contrasts with centralized databases, which are stored and controlled by a single organization. A blockchain has multiple “nodes”, each of which stores a copy of the database and can belong to a different organization. Nodes connect to each other in a dense peer-to-peer fashion, using a “gossip protocol” in which each node is constantly telling its peers everything it learns. As a result, any node can rapidly broadcast a message to the entire network via many alternative paths.
A database, whether centralized or blockchain-powered, begins in an empty state, and is updated via “transactions”. A transaction is defined as a set of database changes which are “atomic”, meaning that they succeed or fail as a whole. Imagine a database representing a financial ledger, with one row per account. A transaction in which Alice pays $10 to Bob has three steps: (1) verify that Alice’s account contains at least $10, (2) subtract $10 from Alice’s account, and (3) add $10 to Bob’s account. As a basic requirement, any database platform must ensure that no transaction interferes with another. This “isolation” is achieved by locking the rows for both Alice and Bob while the payment is under way. Any other transaction involving these rows must wait until this one is finished.
In a blockchain, every node independently processes every transaction on its own copy of the database. Transactions are created anywhere on the network and automatically propagated to all other nodes. Since the organizations running nodes may have different (or even conflicting) interests, they cannot trust each other to transact fairly. Blockchains therefore need rules which define whether or not a particular transaction is valid. In a shared financial ledger, these rules prevent users from spending each other’s money, or conjuring funds from thin air.
Along with the rules that determine transaction validity, blockchains must also define how transactions will be ordered, since in many cases this ordering is critical. If Alice has $15 and tries to send $10 to both Bob and Charlie in two separate transactions, only one of these payments can succeed. While we might like to say that the first transaction takes precedence, a peer-to-peer network has no objective definition of “first”, since messages can arrive at different nodes in different orders.
In a general sense, the information in any database is separated into records or “rows”, and a transaction can do three different things: delete rows, create rows, and/or modify rows. These can be reduced further to two, since modifying a row is equivalent to deleting that row and creating a new one in its place. To go back to Alice’s payment to Bob, her row containing $15 is deleted, and two new rows are created – one containing $10 for Bob and the other with $5 in “change” for Alice.
Following bitcoin’s and Corda’s terminology, we denote the rows deleted by a transaction as its “inputs”, and those created as its “outputs”. Any row deleted by a transaction must have been created by a previous transaction. Therefore each transaction input consumes (or “spends”) a previous transaction’s output. The up-to-date content of the database is defined by the set of “unspent transaction outputs” or “UTXOs”.
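Here is the Alice-to-Bob payment expressed in this input–output model (a toy sketch; transaction identifiers like “tx1:0” are illustrative):

```python
def apply_transaction(utxos, inputs, outputs):
    """Toy UTXO update: a transaction deletes the rows named in its
    inputs and creates the rows in its outputs. The current content of
    the ledger is simply the set of unspent transaction outputs."""
    assert all(ref in utxos for ref in inputs), "input spent or unknown"
    for ref in inputs:
        del utxos[ref]
    utxos.update(outputs)

# Alice's $15 row is consumed; $10 for Bob and $5 change are created.
utxos = {"tx0:0": ("alice", 15)}
apply_transaction(utxos,
                  inputs=["tx0:0"],
                  outputs={"tx1:0": ("bob", 10), "tx1:1": ("alice", 5)})
assert utxos == {"tx1:0": ("bob", 10), "tx1:1": ("alice", 5)}
```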
In a blockchain, a transaction is valid if it fulfills the following three conditions:
Correctness. The transaction must represent a legitimate transformation from inputs to outputs. For example, in a financial ledger, the total quantity of funds in the inputs must match the total in the outputs, to prevent money from magically appearing or disappearing. The only exceptions are special “issuance” or “retirement” transactions, in which funds are explicitly added or removed.
Authorization. The transaction must be authorized by the owner of every output consumed by its inputs. In a financial ledger, this prevents participants from spending each other’s money without permission. Transaction authorization is managed using asymmetric (or public–private key) cryptography. Every row has an owner, identified by a public key, whose corresponding private key is kept secret. In order to be authorized, a transaction must be digitally signed by the owner of each of its inputs. (Note that rows can also have more complex “multisignature” owners, for example where any two out of three parties can authorize their use.)
Uniqueness. If a transaction consumes a particular output, then no other transaction can consume that output again. This is how we prevent Alice from making conflicting payments to both Bob and Charlie. While the transactions for both of these payments could be correct and authorized, the uniqueness rule ensures that only one will be processed by the database.
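The three rules can be sketched together as a toy validation function (authorization is reduced to a set lookup here; a real chain verifies digital signatures against each input owner’s public key):

```python
def validate(tx, utxos, spent, signers):
    """Check a toy transaction against the three validity rules."""
    # Uniqueness: every input must reference an existing, unspent output.
    if any(ref in spent or ref not in utxos for ref in tx["inputs"]):
        return False
    ins = [utxos[ref] for ref in tx["inputs"]]
    # Correctness: funds in must equal funds out (issuance aside).
    if sum(a for _, a in ins) != sum(a for _, a in tx["outputs"].values()):
        return False
    # Authorization: the owner of every consumed output must have signed.
    if any(owner not in signers for owner, _ in ins):
        return False
    spent.update(tx["inputs"])
    return True

utxos = {"tx0:0": ("alice", 15)}
tx = {"inputs": ["tx0:0"],
      "outputs": {"tx1:0": ("bob", 10), "tx1:1": ("alice", 5)}}
spent = set()
assert validate(tx, utxos, spent, signers={"alice"})
assert not validate(tx, utxos, spent, signers={"alice"})   # double spend
```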
In a conventional blockchain, every node checks every transaction in terms of these three rules. Later on, we’ll see how Corda divides up this responsibility differently.
A blockchain is literally a chain of blocks, in which every block links to the previous one via a “hash” that uniquely identifies its contents. Each block contains an ordered set of transactions which must not conflict with each other or with those in previous blocks, as well as a timestamp and some other information. Just like transactions, blocks propagate rapidly across the network and are independently verified by every node. Once a transaction appears in a block, it is “confirmed”, leading nodes to reject any conflicting transaction.
Who is responsible for creating these blocks, and how can we be sure that all nodes will agree on the authoritative chain? This question of “consensus algorithms” is a huge subject in itself, filled with wondrous acronyms such as PoW (Proof of Work), PBFT (Practical Byzantine Fault Tolerance) and DPoS (Delegated Proof of Stake). We won’t be getting into all that here. Suffice it to say that permissioned blockchains for enterprises use some kind of voting scheme, where votes are granted to “validator nodes” who are collectively responsible. The scheme ensures that, so long as a good majority of validator nodes are functioning correctly and honestly, transactions will enter the chain in a (close to) fair order, timestamps will be (approximately) correct, and confirmed transactions cannot be subsequently reversed.
Before discussing some of the challenges of blockchains, I’d like to clarify three additional points. First, while I am using a financial ledger as an example throughout this piece, the input–output model of transactions supports a much broader variety of use cases. Each row can contain a rich data object (think JSON) containing many different types of information – indeed, Corda uses the word “state” rather than “row” for this reason. Richer states change nothing fundamental about transaction rules: correctness is still defined in terms of inputs and outputs, authorization is still required for every input, and uniqueness ensures that each output can only be spent once.
Second, there are many blockchain use cases in which rows are only created in the database, and never deleted. These applications relate to general data storage, timestamping and notarization, rather than maintaining some kind of ledger which is in flux. In these data-only applications, transactions add data in their outputs but consume none in their inputs, allowing the rules for correctness, authorization and uniqueness to be simplified. Although data-only use cases are an increasing focus of our own development at MultiChain, I only mention them in passing here, since Corda was clearly not designed with them in mind.
Finally, it’s worth noting that some blockchain platforms do not use an input–output model. Ethereum presents an alternative paradigm, in which the chain controls a virtual computer with a global state that is managed by “contracts”, and transactions do not connect to each other explicitly. A discussion of Ethereum’s model in permissioned blockchains is beyond our scope here, but see this article for a detailed explanation and critique. One key advantage of the input–output paradigm is that most transactions can be processed in parallel and independently of each other. This property is crucial for Corda, as we’ll see later on.
Let’s imagine that the world’s banks created a shared ledger to represent the ownership, transfer and exchange of a variety of financial assets. In theory, this could be implemented on a regular blockchain, as described above. Each row would contain three columns – an asset identifier such as GOOG or USD, the quantity owned, and the owner’s public key. Each transaction would transfer one or more assets from its inputs to its outputs, with special cases for issuance and retirement.
Every bank in the network would run one or more nodes which connect to the others, propagating and verifying transactions. Senior members would act as validators, with the collective responsibility of confirming, ordering and timestamping transactions. Any validator’s misbehavior would be visible to all the nodes in the network, leading to censure, banishment and/or legal proceedings. With all this in place, any financial asset could be moved across the world in seconds, with the rules of correctness, authorization and uniqueness guaranteeing the ledger’s integrity.
What’s wrong with this picture? Actually, there are three problems: scalability, confidentiality and interoperability. The issue of scalability is simple enough. Our proposed interbank blockchain would require every member to verify, process and store every transaction performed by every bank in the world. Even if this would be technically feasible for the largest financial institutions, the cost of computation and storage would create a significant barrier for many. Surely we’d prefer a system in which participants only see those transactions in which they are immediately involved.
But let’s put scalability aside, since it can ultimately be solved using expensive computers and clever engineering. A more fundamental issue is confidentiality. While it might sound utopian for every transaction to be visible everywhere, in the real world such radical transparency is a non-starter in terms of competition and regulation. If J.P. Morgan and HSBC exchange a pair of assets, they’re unlikely to want Citi and the Bank of China to see what they did. If the transaction was conducted on behalf of these banks’ customers, it could be illegal for them to expose it in this way.
One proposed solution to the problem of confidentiality is “channels”, as implemented in Hyperledger Fabric. Each channel has certain members, who are a subset of the nodes in the network as a whole. A channel’s transactions are visible only to its members, so that each channel effectively acts as a separate blockchain. While this does help with confidentiality, it also undermines the entire point of the exercise. Assets cannot be moved from one channel to another without the help of a trusted intermediary which is active on both. The difficulty of this approach was recently highlighted by SWIFT’s reconciliation proof-of-concept, which estimated that over 100,000 channels would be needed in production. That’s 100,000 islands between which assets cannot be directly moved.
In data-only use cases, where transactions do not consume data in inputs, the confidentiality problem can be sidestepped by encrypting or hashing the data in outputs, and delivering the decryption key or unhashed data outside of the chain. But for a transaction whose inputs consume other transactions’ outputs, every node has to see those inputs and outputs in order to validate the transaction. While advanced cryptographic techniques such as confidential assets and zero knowledge proofs have been developed to partially or completely solve this problem for financial ledgers, these impose a significant performance burden and/or cannot be generalized to any correctness rule.
Finally, let’s talk about interoperability. In an ideal world, every bank would immediately join our global blockchain on the day it was launched. In reality however, multiple blockchains would be adopted by different groups of banks, based on geography or pre-existing relationships. Over time, a member of one group might wish to start transacting with a member of another, by transferring an asset between chains. Just as with channels, this can only be achieved with the help of a trusted intermediary, defeating the blockchain’s purpose.
Corda aims to solve these interrelated problems of scalability, confidentiality and interoperability via a radical rethink of how distributed ledgers work.
Corda’s partial view
The fundamental difference in Corda is easy to explain: Each node only sees some, rather than all, of the transactions processed on the network. While a single logical and conceptual ledger is defined by all these transactions, no individual node sees that ledger in its entirety. To draw a comparison, at any point in time, every dollar bill in the world is in a particular place, but nobody knows where they all are.
So which transactions does a Corda node see? First of all, those in which it is directly involved, because it owns one of that transaction’s inputs or outputs. In a financial ledger, this includes every transaction in which a node is sending or receiving funds. Let’s say Alice creates a transaction which consumes her $15 in an input and has two outputs – one with $10 for me, and the other with $5 in “change” for her. After Alice sends me this transaction, I can check it for correctness and authorization, verifying that the inputs and outputs balance and that Alice has signed.
However, this transaction on its own is not enough. I also need to verify that Alice’s $15 input state really exists, and she didn’t just make it up. That means I need to see the transaction which created this state, and check it for correctness and authorization as well. If this previous transaction, which sent Alice $15, has a $10 input belonging to Denzel and another $5 input from Eric, then I must also verify the transactions which created those. And so on it goes, all the way back to the original “issuance” transaction in which the asset was created. The number of transactions I need to verify will depend on how many times the assets have changed hands and the extent of backwards branching.
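To make this recursive checking concrete, here is a short Python sketch of the idea. The transaction structure and balance rules are deliberately simplified assumptions for illustration, not Corda's actual data model:

```python
# Conceptual sketch of Corda-style recursive transaction verification.
# The Transaction structure is a simplified assumption, not Corda's real model.

class Transaction:
    def __init__(self, tx_id, inputs, outputs, signatures):
        self.tx_id = tx_id            # unique identifier
        self.inputs = inputs          # list of (creating_tx_id, output_index)
        self.outputs = outputs        # list of (owner, amount)
        self.signatures = signatures  # set of parties who signed

def verify_chain(tx, ledger, verified=None):
    """Verify tx and, recursively, every transaction its inputs depend on.

    `ledger` maps tx_id -> Transaction, i.e. the proofs supplied by the sender."""
    if verified is None:
        verified = set()
    if tx.tx_id in verified:          # already checked via another branch
        return True
    in_total = 0
    for creating_id, index in tx.inputs:
        parent = ledger.get(creating_id)
        if parent is None:            # sender failed to supply a needed proof
            return False
        owner, amount = parent.outputs[index]
        if owner not in tx.signatures:  # each input's owner must authorize
            return False
        if not verify_chain(parent, ledger, verified):  # recurse backwards
            return False
        in_total += amount
    out_total = sum(amount for _, amount in tx.outputs)
    # Issuance transactions (no inputs) create value; all others must balance
    if tx.inputs and in_total != out_total:
        return False
    verified.add(tx.tx_id)
    return True
```

The `verified` set mirrors the point about backwards branching: each historical transaction is checked once, however many paths lead to it.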
Since Corda nodes don’t automatically see every transaction, how do they obtain the ones they need? The answer is from the sender of each new transaction. Before Alice creates a transaction consuming her $15, she must already have verified the transaction in which she received it. And since Alice must have applied the recursive technique above, she will have a copy of every transaction needed for this verification. Bob simply requests these transactions from Alice as part of their interaction. If Alice doesn’t respond appropriately, Bob concludes that Alice is trying to trick him, and rejects the incoming payment. In the case where Bob is sent a new transaction whose inputs have multiple owners, he can obtain the necessary proofs from each.
So far we’ve explained how Bob can verify the correctness and authorization of an incoming transaction, including recursively retracing its inputs’ origins. But there is one more rule we need to think about: uniqueness. Let’s say Alice is malicious. She can generate one transaction in which she pays $10 to Bob, and another in which she pays the same $10 to Charlie. She can send these transactions to Bob and Charlie respectively, along with a full proof of correctness and authorization of each. While both transactions conflict with each other by consuming the same state, there is no way for Bob and Charlie to know this.
Conventional blockchains solve this problem by every node seeing every transaction, making conflicts easy to detect and reject. So how does Corda, with its partial transaction visibility, address the same problem? The answer is with the help of a “notary”. A notary is a trusted party (or parties working together) which guarantees that a particular state is only consumed once. Each state has a specific notary, which must sign any transaction in which that state is consumed. Once a notary has done this, it must not sign another transaction for the same state. Notaries are the network’s guardians of transaction uniqueness.
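A notary's core duty, as described, boils down to remembering which states it has already seen consumed. Here's a minimal Python sketch of just that double-spend check (a real notary would also verify signatures, time windows and more):

```python
class Notary:
    """Minimal sketch of a Corda-style uniqueness notary.

    Models only the double-spend check described above; signature and
    validity checking are omitted for clarity."""

    def __init__(self):
        self.consumed = {}   # state -> tx_id that consumed it

    def notarize(self, tx_id, input_states):
        # Refuse if any input state was already consumed by a different tx
        for state in input_states:
            prior = self.consumed.get(state)
            if prior is not None and prior != tx_id:
                raise ValueError(f"state {state} already consumed by {prior}")
        for state in input_states:
            self.consumed[state] = tx_id
        return f"signed:{tx_id}"     # stands in for a digital signature
```

In Alice's double-spend attempt, whichever of the two conflicting transactions reaches the notary second is simply refused.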
While every state can have a different notary, all of the states consumed by a particular transaction must be assigned to the same one. This avoids issues relating to deadlocks and synchronization, which should be familiar for those with distributed database experience. Let’s say Alice and Bob agree to exchange Alice’s $10 for Bob’s £7. The transaction for this exchange must be signed by the notaries of both states, but which one goes first? If Alice’s notary signs but Bob’s fails for some reason, then Alice will be left with an incomplete transaction and can never use her $10 again. If Bob’s signs first then he is similarly exposed. While we might like notaries to simply work together, in practice this requires mutual trust and the use of a consensus protocol, complications which Corda’s designers chose to avoid.
If states with different notaries are required as inputs to a single transaction, their owners first execute special “notary change” transactions, which move a state from one notary to another, changing nothing else. So when parties are building a transaction with multiple inputs, they must first agree on the notary to be used, and then perform the notary changes necessary. While the developer in me felt a small twinge of pain when reading about this workaround, there’s no reason why it won’t work so long as notaries play along.
It should also be clarified that, while each notary is a single logical actor in terms of signing transactions, it need not be under the control of a single party. A group of organizations could run a notary collectively, using an appropriate consensus protocol in which a majority of the participants are needed to generate a valid signature. This would prevent any single malicious party from undermining uniqueness by signing transactions that conflict. In theory, we could even allow every node in the network to participate in this kind of shared notarization, although in that case we’d be more-or-less back to a conventional blockchain.

Let’s recap the key differences between Corda and conventional blockchains. In Corda, there is no unified blockchain which contains all of the transactions confirmed. Nodes only see those transactions in which they are directly involved, or upon which they depend historically. Nodes are responsible for checking transaction correctness and authorization but rely on trusted notaries to verify uniqueness.
Of course, there is a lot more to Corda than this: the use of digital certificates to authenticate identity, “network maps” to help nodes find and trust each other, per-state “contracts” which define correctness from each state’s perspective, a deterministic version of the Java Virtual Machine which executes these contracts, “flows” which automate transaction negotiations, “time windows” which restrict transactions by time, “oracles” that attest to external facts and “CorDapps” which bundle many things together for easy distribution. While each of these features is interesting, equivalents for all can be found in other blockchain platforms. My goal in this article is to focus on that which makes Corda unique.
So does Corda live up to its promise? Does it solve the scalability, confidentiality and interoperability problems of blockchains? And in making its particular choices, how much of a price does Corda pay?
More scalable, sometimes
Let’s start with scalability. Here, Corda’s advantage appears clear, since nodes only see some of the transactions in a network. In a regular blockchain, the maximum throughput is constrained by the speed of the slowest node in processing transactions. By contrast, a Corda network could process a million transactions per second, while each node sees just a tiny fraction of that. Scalability extends to notaries as well, since the task of signing transactions for uniqueness can be spread between many different notaries, each responsible for its own subset of states.
Per-asset permissions, capacity upgrading and inline metadata
Today we’re pleased to unveil the second preview release of MultiChain 2.0. This makes substantial progress on the MultiChain 2.0 roadmap, and includes an important extra feature relating to asset permissions.
Let’s start with the surprise. This release adds the ability to separately control the send and receive permissions for each asset issued on the blockchain. This control is important in environments where each asset has different characteristics in terms of regulation, user identification requirements and so on.
At the time a new asset is issued, it can optionally be specified as receive- and/or send-restricted. Receive-restricted assets can only appear in transaction outputs whose address has receive permissions for that asset. Similarly, send-restricted assets can only be spent in transaction inputs by addresses which have per-asset send permissions. (Note that in all cases, addresses need global send and receive permissions to appear in inputs and outputs respectively.)
The send and receive permissions for an asset can be granted or revoked by any address which has admin or activate permissions for that asset. By default, these permissions are only assigned to the asset issuer, but the issuer (or any subsequently added asset administrator) can extend them to other addresses as well.
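The receive-side rule above can be expressed as a simple validity check. This is a Python sketch of the logic as described, not MultiChain's actual implementation or API:

```python
def output_allowed(address, asset, global_receive, asset_receive):
    """Sketch of the per-asset receive check described above.

    global_receive: set of addresses with the chain-wide receive permission.
    asset_receive:  dict mapping each receive-restricted asset to the set of
                    addresses granted its per-asset receive permission.
                    Assets absent from this dict are unrestricted."""
    if address not in global_receive:      # global permission is always needed
        return False
    if asset in asset_receive:             # asset is receive-restricted
        return address in asset_receive[asset]
    return True                            # unrestricted asset: global is enough
```

The send-side check is symmetrical, applied to the addresses spending transaction inputs.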
Blockchain parameter upgrades
One of the major features in development for MultiChain 2.0 is blockchain upgrading, to allow many of a chain’s parameters to be changed over time. This is vital because blockchains are designed to run for the long term, and it’s hard to predict how computer systems will be used many years after their creation.
MultiChain 1.0.x already provides a facility for upgrading a single parameter – the chain’s protocol version. This release of MultiChain 2.0 takes a significant step forwards, allowing changes to seven additional parameters related to blockchain performance and scaling. These include the target block time, maximum block size, maximum transaction size and maximum size of metadata.
As with other crucial operations relating to governance, upgrading a chain’s parameters can only be performed by the chain’s administrator(s), subject to a customizable level of consensus. We’re continuing to work on this feature, so look out for more upgradable parameters in future releases of MultiChain 2.0.
MultiChain 1.0.x already supports unformatted (binary) transaction metadata, which can be embedded raw or wrapped in a stream item. The first preview release of MultiChain 2.0 extended this to allow metadata to be optionally represented in text or JSON format. In all of these cases the metadata appears in a separate transaction output containing an OP_RETURN, which makes the output unspendable by subsequent transactions.
This release of MultiChain 2.0 introduces a new type of metadata which we call “inline”. Inline metadata is stored within a regular spendable transaction output, and so is associated directly with that output’s address and/or assets. As with other forms of metadata, inline metadata can be in binary, text or JSON formats, and is easily writable and readable via a number of different APIs.
The road ahead
With this second preview/alpha release, we’ve completed about half of the work scheduled for the open source Community edition of MultiChain 2.0. You can download and try out alpha 2 by visiting the MultiChain 2.0 preview releases page. On this page you’ll also find documentation for the new and enhanced APIs.
We’ve already started working on the next major feature for MultiChain 2.0, which we’re calling off-chain stream items. In an off-chain item, only a hash of the item’s payload is embedded inside the chain, alongside the item’s keys and some other metadata. The payload itself is stored locally by the publisher and propagated to the stream’s subscribers using peer-to-peer file sharing techniques, with the on-chain hash providing verification. The result is a huge improvement in the scalability and performance of blockchains used to record large amounts of information, where some of this information is only of interest to certain participants. While not originally planned for MultiChain 2.0, this feature rose up our list of priorities in response to user demand.
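The verification step works like any hash commitment, and can be illustrated in a few lines of Python using the standard hashlib. (SHA-256 is used here for illustration; MultiChain's actual hash format and chunked delivery protocol are not shown.)

```python
import hashlib

def publish_off_chain(payload: bytes) -> str:
    """Publisher side: only the payload's hash is embedded on-chain,
    alongside the item's keys and other metadata."""
    return hashlib.sha256(payload).hexdigest()

def verify_off_chain(payload: bytes, on_chain_hash: str) -> bool:
    """Subscriber side: check the payload delivered peer-to-peer
    against the hash that was committed to the blockchain."""
    return hashlib.sha256(payload).hexdigest() == on_chain_hash
```

Because the hash is immutably timestamped by the chain, a subscriber can accept the payload from any peer without trusting that peer.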
As always, we welcome your feedback on the progress of MultiChain 2.0, and look forward to delivering the next preview release in due course.
Solving real problems in infrastructure, finance and e-commerce
It’s exactly two years since we published “Avoiding the pointless blockchain project“, a checklist of questions to ask when assessing permissioned blockchain use cases. The post obviously struck a nerve and continues to attract thousands of monthly readers on our site and others. People are still hungry for content that goes beyond the blockchain hype to assess this technology objectively.
The good news is that, judging by our incoming inquiries, the market’s understanding of blockchains has greatly improved over the last two years. I would estimate that 60% of the blockchain use cases we now hear about are commercially and technically sound. Nonetheless there is still plenty of confusion – companies determined to use a blockchain when a regular database would fit better, startups using “blockchains” in their branding but nowhere else, and widely reported but pointless blockchain projects which use a single node or a group of nodes under a single party’s control.
To recap what I’ve written before, the core value of a blockchain is to enable a database or ledger to be directly shared across boundaries of trust, without putting any single party in charge. A blockchain lets a group of actors achieve real-time reconciliation of validated, authenticated and timestamped transactions, without the cost, hassle and risk of relying on a trusted intermediary. The chain provides meaningful value when it’s maintained by consensus between multiple nodes, each of which is controlled by a party with different interests. This prevents individual participants (or small groups thereof) from corrupting or deleting past transactions.
MultiChain 1.0 was released a few months ago, and we’re delighted to now share the details of some of the early MultiChain-powered blockchains in production. Each application described below was independently built by a third party using the regular MultiChain software and APIs. All are running in a network of four nodes or more, with multiple active validators. Most importantly, in each case the blockchain is addressing a real business problem that could not be solved by a regular database.
Workflow management for infrastructure projects
Construtivo is a Brazilian software company which builds solutions for the design and construction phase of large infrastructure projects. For the past 15 years, Construtivo’s general approach has been to deliver software-as-a-service (SaaS), in which the company acts as the central trusted intermediary for managing project data. This is the traditional approach to ensuring that all stakeholders maintain a consistent view of a project’s status and progress.
To satisfy their customers’ desire for greater transparency and auditability, Construtivo have now integrated MultiChain into their solution, providing the option of storing crucial project data on a blockchain alongside Construtivo’s database. Several infrastructure projects in South America are already making use of this option. Each project has its own chain, with nodes run by both Construtivo and stakeholders such as contractors and engineering companies. Depending on the project’s requirements, the chain can record plans, contracts, and other workflow-related information, and can be browsed through a web-based interface.
The typical MultiChain network for an infrastructure project has 4 nodes, with an average transaction size of 15K. All nodes in each chain participate in the validation process, with control over user permissions remaining in Construtivo’s hands. As with most of our users, Construtivo researched a number of blockchain platforms to find a suitable fit. When asked why they settled on MultiChain, Rodrigo Trindade, systems analyst at Construtivo, cited its speed, simplicity and ease of integration with their application.
Shared ledger for a catastrophe bond
Solidum Partners is an investment advisory company which specializes in creating catastrophe bonds. These are financial instruments which pay investors a high rate of yield compared to regular commercial bonds, but have a risk of partial or no repayment if a particular event occurs. In essence, purchasers of catastrophe bonds are acting like insurance companies, providing the capital to cover unlikely losses and making a tidy profit so long as those losses don’t materialize.
In order to be easy to trade, non-physical securities like catastrophe bonds are traditionally held by a trusted intermediary on their owners’ behalf. Trades in the security are “settled” virtually via an update of the intermediary’s records. For Solidum, the intermediary of choice had been Euroclear, which holds over $30 trillion in financial assets on behalf of investors, or more than 10% of the world’s total. Naturally, with around 4,000 employees at 15 offices around the world, Euroclear doesn’t provide this service for free.
Due to recent changes at a banking partner, Solidum lost access to Euroclear and had to seek another way. So they issued a new $15 million catastrophe bond directly onto a MultiChain blockchain, along with dollar denominated tokens that could be used for transacting. If you like, they performed two private placement Initial Coin Offerings (ICOs), but with real underlying assets instead of a white paper and the hope of future value.
The blockchain enables safe “delivery-vs-payment” transactions, in which two users exchange dollars and bond units in a single step – a feat which traditionally requires help from a trusted intermediary. Aside from avoiding this middleman’s fees, using a permissioned blockchain gave Solidum easy and direct control over who can participate in the system, without triggering the same heavy regulation as Euroclear and its peers.
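The key to delivery-vs-payment is that both legs of the trade sit inside one transaction, so either both transfers happen or neither does. Here's a conceptual Python sketch of such a transaction's shape; the structure and field names are illustrative assumptions, not MultiChain's raw transaction format:

```python
def atomic_swap_tx(alice_addr, bob_addr, dollars, bond_units):
    """Sketch of a delivery-vs-payment exchange transaction.

    Both asset movements are part of a single transaction, so neither
    party can take the other's asset without delivering their own."""
    return {
        "inputs": [
            {"from": alice_addr, "asset": "usd-token", "qty": dollars},
            {"from": bob_addr,   "asset": "cat-bond",  "qty": bond_units},
        ],
        "outputs": [
            {"to": bob_addr,   "asset": "usd-token", "qty": dollars},
            {"to": alice_addr, "asset": "cat-bond",  "qty": bond_units},
        ],
        # The transaction is only valid once both input owners have signed
        "required_signers": [alice_addr, bob_addr],
    }
```

Because validity requires both signatures, the exchange needs no escrow agent: an unsigned or half-signed transaction simply never confirms.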
Each participant in the network has their own MultiChain node, giving them direct control over their on-chain assets. While a trustee knows the real-world identity behind each address on the blockchain, participants do not know each other’s. (Unlike many financial use cases, the level of activity is not high enough for this veil of confidentiality to be broken.) After completing AML and KYC checks, users are given access to the chain by Solidum and can then transact with each other directly. The network currently has around 10 nodes, 4 of which are permanently online and participate in the consensus process.
When asked why they chose MultiChain, Cedric Edmonds, partner at Solidum, cited its simple built-in support for delivery-vs-payment exchange transactions, as well as its general stability and ease of use.
Transaction notarization for e-commerce
Cryptologic, a blockchain consultancy based in Rosario, Argentina, have built and deployed a system for notarizing e-commerce transactions, in order to help resolve disputes between buyers and sellers. Their first customer is MercadoLibre, Latin America’s most popular e-commerce site, which has almost $1 billion in annual revenues.
Under usual circumstances, when a customer makes a purchase from an online merchant, they have to trust that merchant to record the transaction securely and permanently. But in practice, nothing stops employees of the merchant from deleting or modifying transaction records, which can open the door to delayed deliveries or goods ending up in the wrong hands. By contrast, if each transaction is recorded on a blockchain whose contents are publicly visible, and whose control is spread among a number of different parties, then this record becomes far more difficult to retroactively change.
To preserve confidentiality, transaction data is hashed before being embedded in the chain. The hashes provide a mechanism for timestamping and notarization, and are sufficient to settle later disputes if either party reveals the unhashed transaction. The network currently contains 7 permanent nodes, spread between Cryptologic, various government offices, and a partner abroad. Since transactions contain hashes only, they are fairly small, and the network has seen a peak rate of 50 transactions per second (still well below MultiChain’s maximum throughput).
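The notarization-by-hash scheme can be sketched briefly. The canonical JSON serialization below is an assumption for illustration; the article doesn't specify Cryptologic's actual record format:

```python
import hashlib
import json

def notarize_record(record: dict) -> str:
    """Hash an e-commerce transaction record for on-chain embedding.
    Sorting keys gives a canonical serialization, so the same record
    always produces the same hash (an assumed convention)."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def settle_dispute(revealed_record: dict, on_chain_hash: str) -> bool:
    """Later, either party reveals the full record; a match against the
    timestamped on-chain hash proves it existed unaltered at that time."""
    return notarize_record(revealed_record) == on_chain_hash
```

The chain never sees the confidential details, yet any later tampering with the revealed record is immediately detectable.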
When asked why they chose MultiChain, Maximiliano Cañellas, CTO at Cryptologic, said they found it really easy to use, with great features like streams, and that the product is very stable, having run for 10 months without interruption.
General lessons learned
These are some early examples of permissioned blockchains in production. The networks are still small, with modest transaction volumes that are far from the limits of products like MultiChain. So it’s important not to extrapolate too much.
Nonetheless, it’s interesting to note what these applications have in common. First and most importantly, they all derive from a genuine desire for decentralization, rather than using a blockchain for a blockchain’s sake. In all three cases, there were clear reasons to choose a blockchain architecture over messaging or a centralized database.
Second, none of the chains have yet transitioned to a decentralized model for governance. All still rely on a single administrator, who onboards new users and grants them permission to transact. It remains to be seen how often decentralized governance (as supported by MultiChain’s admin consensus model) is viable or necessary in practice. Perhaps it is sufficient for the blockchain to provide a transparent view of all administrator activity, while leaving control of this activity with a single party.
Finally, the nature of these applications confirms our view that blockchains are a general purpose technology for shared databases, and not restricted to particular industries or verticals. The lion’s share of media coverage might be received by specific use cases, such as interbank settlement, supply chain finance and shared identity. But in reality, blockchains can be applied whenever we seek to avoid centralized control over a digital system of record. It’s time to think more broadly about the types of problems that this technology can solve.
Today we’re delighted to share the first preview release of MultiChain 2.0, which implements one major part of the MultiChain 2.0 roadmap published earlier this year – a richer data model for streams.
Streams have proven to be a popular feature in MultiChain, providing a natural abstraction for general purpose data storage and retrieval on a blockchain. A MultiChain chain can contain any number of named streams, each of which can have individual write permissions or be open for writing by all. In MultiChain 1.0, each stream item has one or more publishers (who sign it), an optional key for efficient retrieval, a binary data payload up to 64 MB in size, and a timestamp derived from the block in which it’s embedded.
This preview release of MultiChain 2.0, numbered alpha 1, takes streams functionality to a whole new level:
JSON items. As an optional alternative to raw binary data, stream items can now contain any JSON structure, which is stored on the blockchain in the efficient UBJSON serialization format. Since the MultiChain API already uses JSON throughout, these JSON structures can be read and written in a natural and obvious way.
Text items. Stream items may also contain Unicode text, stored efficiently on the blockchain in UTF-8 encoding. Text items can also be read and written directly via the MultiChain API.
Multiple keys. Each stream item can now have multiple keys instead of only one. This enables much more flexible schemes for tagging, indexing and retrieval.
Multiple items per transaction. Multiple items can now be written to the same stream in a single atomic transaction. This allows multiple stream items to: (a) be naturally grouped together under a single transaction ID, (b) take up less space on the blockchain and (c) require fewer signature verifications.
JSON merging. There are new APIs to summarize the items in a stream with a particular key or publisher. The first type of summary offered is a merge of all of the JSON objects in those items. The outcome of the merge is a new object containing all the JSON keys from the individual objects, where the value corresponding to each JSON key is taken from the last item in which that key appears. The merge can be customized in various ways, e.g. to control whether sub-objects are merged recursively and if null values should be included.
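Based on the merge semantics just described, the default behavior can be sketched in Python. This is a sketch of the rules as stated above, not MultiChain's implementation, and the convention that a null value removes its key is an assumption here:

```python
def merge_json(items, recursive=True, keep_nulls=False):
    """Merge a list of JSON objects, as described above: each key's value
    is taken from the last item in which that key appears."""
    result = {}
    for item in items:
        for key, value in item.items():
            if value is None and not keep_nulls:
                result.pop(key, None)     # assumed: a null removes the key
            elif (recursive and isinstance(value, dict)
                  and isinstance(result.get(key), dict)):
                # merge sub-objects recursively rather than replacing them
                result[key] = merge_json([result[key], value],
                                         recursive, keep_nulls)
            else:
                result[key] = value
    return result
```

For example, merging `{"name": "Acme", "address": {"city": "London", "zip": "E1"}}` with a later `{"address": {"city": "Paris"}}` keeps the zip code while updating the city, because sub-objects are merged recursively by default.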
The purpose of JSON merging is to enable a stream to serve as a flexible database for applications built on MultiChain, with the stream key or publisher (as appropriate) acting as a “primary key” for each database entry. The advantage over a regular database is that the stream contains a fully signed and timestamped history of how each entry was changed over time, with the blockchain securing this history immutably through multiparty consensus.
As in previous versions, each node can freely decide which streams to subscribe to, or can subscribe to all streams automatically. If a node is subscribed to a stream, it indexes that stream’s content in real time, allowing efficient retrieval by publisher, key, block, timestamp or position – and now summarization by key or publisher.
Aside from stream items, MultiChain 2.0 alpha 1 also supports JSON and text in raw transaction metadata, as alternatives to the raw binary data supported in MultiChain 1.0.
Finally, this release allows the custom fields of issued assets and created streams to contain any JSON object, instead of the text-only key/value pairs offered in MultiChain 1.0. For forwards compatibility, MultiChain 1.0.2 includes the ability to read (but not write) these richer asset and stream custom fields.
To try out these new features, visit the MultiChain 2.0 preview releases page and download alpha 1. The page also provides detailed documentation on the new APIs and parameters available.
We’d love to hear your feedback on this new functionality. And of course we’re already hard at work on the next major set of enhancements for MultiChain 2.0, scheduled for release early next year.
Today we’re delighted to announce the release of MultiChain 1.0 into production and the addition of 14 new members to the MultiChain Partner Program. These include two multinational consulting companies: Cognizant and Indra Sistemas, as well as twelve other companies: Aicumen, Bambusoft, Chainfrog, CrimsonLogic, Encrypgen, Hypatia Technologies, Maroon Studios, Medici Ventures, Project Radium, SolarLab, The Apollo Group and Tilkal.
Apart from that, we’re already hard at work developing MultiChain 2.0 and hope to have a first preview release available (with the richer data model for streams) before the end of the year.
More details can be found in the press release below.
MultiChain Launches Production-Ready Version 1.0 with Fourteen New Partners
August 2, 2017 – Coin Sciences Ltd is delighted to announce the production-ready release of MultiChain 1.0, along with fourteen new members of the MultiChain Partner Program, bringing the total number to 43.
MultiChain 1.0 is available for immediate download for Linux, Windows and Mac, after two and a half years of intensive feedback-driven development. This includes a four-month beta period, during which MultiChain was optimized to support over 1,000 transactions per second on a mid-range server. Since its first alpha release in June 2015, MultiChain has received over 60,000 downloads, more than half of which were during 2017.
The new members of the MultiChain Partner Program include two multinational consulting companies: Cognizant and Indra Sistemas. Twelve more SMBs have also joined: Aicumen, Bambusoft, Chainfrog, CrimsonLogic, Encrypgen, Hypatia Technologies, Maroon Studios, Medici Ventures, Project Radium, SolarLab, The Apollo Group and Tilkal. Members of the program enjoy a close working relationship with the MultiChain engineering team, can use MultiChain branding in their marketing materials, and are promoted on the MultiChain website, which now receives 35,000 visitors monthly.
“We’re delighted to have reached this milestone,” said Dr Gideon Greenspan, CEO and Founder of Coin Sciences Ltd. “Developing the first production release of MultiChain has been an immense challenge, and we’ve learned a great deal about our users and their requirements along the way. Work has already begun on MultiChain 2.0, which will be the first version of MultiChain to come in two editions – Community (open source) and Enterprise (commercial). We look forward to continued growth in usage of the product and cooperating with all our partners to help them leverage it for their needs.”
“We used MultiChain to build a platform for transferring digital assets between different organizations (from commerce to public administration) in a permissioned network where all the participants collaborate,” said Víctor Sánchez Hórreo, Manager of Blockchain and Digital Transformation at Minsait (Indra Sistemas). “The assets act as a key tool to enable social and economic projects, and the features of MultiChain regarding permission management, quick deployment and asset creation fit very well with our needs.”
“Because of its Bitcoin ancestry, MultiChain’s reliability, even during its alpha phase, was great,” said Joel Weight, Chief Technology Officer at Medici Ventures. “The addition of a key-based permission layer and built-in asset support make it the right solution for some of our products.”
“Chainfrog chose to use MultiChain in their music royalties collection pilot because it is based on the mature Bitcoin source base, is incredibly easy to deploy and the APIs are clearly documented,” said Dr Keir Finlow-Bates, CEO and Founder of ChainFrog. “From their blog posts it is obvious that the Coin Sciences team know their blockchains inside out.”
Here at Coin Sciences, we’re best known for MultiChain, a popular platform for creating and deploying permissioned blockchains. But we began life in March 2014 in the cryptocurrency space, with the goal of developing a “bitcoin 2.0” protocol called CoinSpark. CoinSpark leverages transaction metadata to add external assets (now called tokens) and notarized messaging to bitcoin. Our underlying thinking was this: If a blockchain is a secure decentralized record, surely that record has applications beyond managing its native cryptocurrency.
After less than a year, we stopped developing CoinSpark, due to both a push and a pull. The push was the lack of demand for the protocol – conventional companies were (understandably) reluctant to entrust their core processes to a public blockchain. But there was also a pull, in terms of the developing interest we saw in closed or permissioned distributed ledgers. These can be defined as databases which are safely and directly shared by multiple known but non-trusting parties, and which no single party controls. So in December 2014 we started developing MultiChain to address this interest – a change in direction that Silicon Valley would call a “pivot”.
Two years since its first release, MultiChain has proven an unqualified success, and will remain our focus for the foreseeable future. But we still take an active interest in the cryptocurrency space and its rapid pace of development. We’ve studied Ethereum’s gas-limited virtual machine, confidential CryptoNote-based systems like Monero, Zcash with its (relatively) efficient zero knowledge proofs, and new entrants such as Tezos and Eos. We’ve also closely observed the crypto world’s endless dramas, such as bitcoin’s block size war of attrition, the failures of numerous exchanges, Ethereum’s DAO disaster and Tether’s temporary untethering. Crypto news is the gift that keeps on giving.
Crypto and the enterprise
Aside from sheer curiosity, there’s a good reason for us to watch so closely. We fully expect that many of the technologies developed for cryptocurrencies will eventually find their way into permissioned blockchains. And I should stress here the word eventually, because the crypto community has (to put it mildly) a far higher risk appetite than enterprises exploring new techniques for integration.
It’s important to be clear about the similarities and differences between cryptocurrencies and enterprise blockchains, because so much anguish is caused by the use of the word “blockchain” to describe both. Despite the noisy objections of some, I believe this usage is reasonable, because both types of chain share the goal of achieving decentralized consensus between non-trusting entities over a record of events. As a result, they share many technical characteristics, such as digitally signed transactions, peer-to-peer networking, transaction constraints and a highly robust consensus algorithm that requires a chain of blocks.
Despite these similarities, the applications of open cryptocurrency blockchains and their permissioned enterprise counterparts appear to be utterly distinct. If you find this surprising or implausible, consider the following parallels: The TCP/IP networking protocol is used to connect my computer to my printer, but also powers the entire Internet. Graphics cards make 3D video games more realistic, but can also simulate neural networks for “deep learning”. Compression based on repeating sequences makes web sites faster, but also helps scientists store genetic data efficiently. In computing, multi-purpose technologies are the norm.
So here at Coin Sciences, we believe that blockchains will be used for both cryptocurrencies and enterprise integration over the long term. We don’t fall on either side of the traditional (almost tribal) divide between advocates of public and private chains. Perhaps this reflects an element of wishful thinking, because a thriving cryptocurrency ecosystem will develop more technologies (under liberal open source licenses) that we can use in MultiChain. But I don’t think that’s the only reason. I believe there is a compelling argument in favor of cryptocurrencies, which can stand on its own.
In favor of crypto
What is the point of cryptocurrencies like bitcoin? What do they bring to the world? I believe the answer is the same now as in 2008, when Satoshi Nakamoto published her famous white paper. They enable direct transfers of economic value over the Internet, without a trusted intermediary, and this is an incredibly valuable thing. But unlike Satoshi’s original vision, I do not see this as a better way to buy coffee in person or kettles online. Rather, cryptocurrencies are a new class of asset for people looking to diversify their financial holdings in terms of risk and control.
Let me explain. In general people can own two types of asset – physical and financial. For most of us physical assets are solid and practical items, like land, houses, cars, furniture, food and clothing, while a lucky few might own a boat or some art. By contrast, financial assets consist of a claim on the physical assets or government-issued money held by others. Unlike physical assets, financial assets are useless on their own, but can easily be exchanged for useful things. This liquidity and exchangeability makes them attractive despite their abstract form.
Depending on who you ask, the total value of the world’s financial assets is between $250 and $300 trillion, or an average of $35-40k per person alive. The majority of this sum is tied up in bonds – that is, money lent to individuals, companies and governments. Most of the rest consists of shares in public companies, spread across the stock exchanges of the world. Investors have plenty of choice.
Nonetheless, all financial assets have something in common – their value depends on the good behavior of specific third parties. Furthermore, with the exception of a few lingering bearer assets, they cannot be transferred or exchanged without a trusted intermediary. These characteristics create considerable unease for these assets’ owners, and that unease deepens during periods of financial instability. If a primary purpose of wealth is to make people feel safe in the face of political or personal storms, and the wealth itself is at risk from such a storm, then it’s failing to do its job.
So it’s natural for people to seek money-like assets which don’t depend on the good behavior of any specific third party. This drive underlies the amusingly-named phenomenon of gold bugs – people who hold a considerable portion of their assets in physical gold. Gold has been perceived as valuable by humans for thousands of years, so it’s reasonable to assume this will continue. The value of gold cannot be undermined by governments, who often succumb to the temptation to print too much of their own currency. And just as in medieval times, gold can be immediately used for payment without a third party’s assistance or approval.
Despite these qualities, gold is far from ideal. It’s expensive to store, heavy to transport, and can only be handed over through an in-person interaction. In the information age, surely we’d prefer an asset which is decentralized like gold but is stored digitally rather than physically, and can be sent across the world in seconds. This, in short, is the value proposition of cryptocurrencies – teleportable gold.
On intrinsic value
The most immediate and obvious objection to this thesis is that, well, it’s clearly ridiculous. You can’t just invent a new type of money, represented in bits and bytes, and call it Gold 2.0. Gold is a real thing – look it’s shiny! – and it has “intrinsic value” which is independent of its market price. Gold is a corrosion-resistant conductor of electricity and can be used for dental fillings. Unlike bitcoin, if nobody else in the world wanted my gold, I could still do something with it.
There’s some merit to this argument, but it’s weaker than it initially sounds. Yes, gold has some intrinsic value, but its market price is not derived from that value. In July 2001 an ounce of gold cost $275, ten years later it cost $1840, and today it’s back around the $1200 mark. Did the practical value of dental fillings and electrical wiring rise sevenfold in ten years and then plummet in the subsequent six?
Clearly not. The intrinsic value argument is about something more subtle – it places a lower bound on gold’s market price. If gold ever became cheaper than its functional substitutes, such as copper wiring or dental amalgam, electricians and dentists would snap it up. So if you buy some gold today, you can be confident that it will always be worth something, even if it’s (drastically) less than the price you paid.
Cryptocurrencies lack the same type of lower bound, derived from their practical utility (we’ll discuss a different form of price support later on). If everyone in the world lost interest in bitcoin, or it was permanently shut down by governments, or the bitcoin blockchain ceased to function, then any bitcoins you hold would indeed be worthless. These are certainly risks to be aware of, but their nature also points to the source of a cryptocurrency’s value – the network of people who have an interest in holding and transacting in it. For bitcoin and others, that network is large and continuing to grow.
Indeed, if we look around, we can find many types of asset which are highly valued but have negligible practical use. Examples include jewelry, old paintings, special car license plates, celebrity autographs, rare stamps and branded handbags. We might even say that, in terms of suitability for purpose, property in city centers is drastically overpriced compared to the suburbs. In these cases and more, it’s hard to truly justify why people find something valuable – the reason is buried deep in our individual and collective psyches. The only thing these assets have in common is their relative scarcity.
So I wouldn’t claim that bitcoin’s success was a necessary or predictable consequence of its invention, however brilliant that may have been. What happened was a complete surprise to most people, myself included, like the rise of texting, social media, sudoku and fidget spinners. There’s only one reason to believe that people will find cryptocurrencies valuable, and that is the fact that they appear to be doing so, in greater and greater numbers. Bitcoin and its cousins have struck a psychoeconomic nerve. People like the idea of owning digital money which is under their ultimate control.
Against crypto maximalism
At this point I should clarify that I am not a “cryptocurrency maximalist”. I do not believe that this new form of money will take over the world, replacing the existing financial landscape that we depend on. The reason for my skepticism is simple: Cryptocurrencies are a poor solution for the majority of financial transactions.
I’m not just talking about their sky-high fees and poor scalability, which can be technically resolved with time. The real problem with bitcoin is its core raison d’être – the removal of financial intermediaries. In reality, intermediaries play a crucial role in making our financial activity secure. Do consumers want online payments to be irreversible, if a merchant has ripped them off? Do companies want a data loss or breach to cause immediate bankruptcy? One of my favorite Twitter quips on this subject comes from Dave Birch (although note that bitcoin is not truly anonymous or untraceable):
While it’s wonderful to send value directly across the Internet, the price of this wizardry is a lack of recourse when something goes wrong. For the average Joe buying a book or a house, this trade-off is simply a bad deal. And the endless news stories about stolen cryptocurrency and hacked bitcoin exchanges aren’t going to change his mind. As a result, I believe cryptocurrencies will always be a niche asset, and nothing more. They will find their place inside or outside of the existing financial order, alongside small cap stocks and high yield bonds. Not enough people are thinking about the implications of this boring and intermediate outcome, which to me seems most likely of all.
A pointed historical analogy can be drawn with the rise of e-commerce. In the heady days of the dot com boom, pundits were predicting that online stores would supersede their physical predecessors. Others said that nobody would want to buy unseen goods from web-based upstarts. Twenty years later, Amazon, Ebay and Alibaba have indeed built their empires, but physical stores are still with us and remain attractive places to buy. In practice, most of us purchase some things online, and other things offline, depending on the item in question. There are trade-offs between these two forms of commerce, just as there are between cryptocurrencies and other asset classes. He who diversifies wins.
Now about that price
If cryptocurrencies will be around in the long term, but won’t destroy the existing financial order, then the really interesting question is this: Exactly how big are they going to get? Fifty years from now, what will be the total market capitalization of all the cryptocurrency in the world?
In my view, the only honest answer can be: I’ve no idea. I can make a strong case for a long-term (inflation-adjusted) market cap of $15 billion, since that’s exactly where crypto was before this year’s (now deflating) explosion. And I can make an equally strong case for $15 trillion, since the total value of the world’s gold is currently $7 trillion, and cryptocurrencies are better in so many ways. I’d be surprised if the final answer went outside of this range, but a prediction this wide is as good as no prediction at all.
Most financial assets have some kind of metric which acts to anchor their price. Even in turbulent markets, they don’t stray more than 2-3x in either direction before rational investors bring them back into line. For example, the exchange rates between currencies gravitate towards purchasing power parity, defined as the rate at which a basket of common goods costs the same in every country. Bonds gravitate towards their redemption price, adjusted for interest, inflation and risk, which depends on the issuing party. Stocks gravitate towards a price/earnings ratio of 10 to 25, because of the alternatives available to income-seeking investors. (One exception appears to be high-growth technology stocks, but even these eventually come back down to earth. Yes, Amazon, your day will come.)
When it comes to the world of crypto, there is no such grounding. Cryptocurrencies aren’t used for pricing common goods, and they don’t pay dividends or have a deadline for redemption. They also lack the pedigree of gold or artwork, whose price has been discovered over hundreds of years. As a result, crypto prices are entirely at the mercy of Keynesian animal spirits, namely the irrational, impulsive and herd-like decisions that people make in the face of uncertainty. To paraphrase Benjamin Graham, who wrote the book on stock market investing, Mr Crypto Market is madder than a madman. The geeks among us might call it chaos theory in action, with thousands of speculators feeding off each other in an informational vacuum.
Of course, some patterns can be discerned in the noise. I don’t want to write (or be accused of writing) a guide to cryptocurrency investing, so I’ll mention them only in brief: reactions to political uncertainty and blockchain glitches, periods of media-driven speculation, profit-taking by crypto whales, 2 to 4 year cycles, deliberate pump-and-dump schemes, and the relentless downward pressure caused by proof-of-work mining. But if I could give one piece of advice, it would be this: Buy or sell to ensure you’ll be equally happy (and unhappy) whether crypto prices double or halve in the next week. Because either can happen, and you have no way of knowing which.
If the price of a cryptocurrency isn’t tied to anything and moves unpredictably, could it go down to zero? Barring a blockchain’s catastrophic technical failure, I think the answer is no. Consider those speculators who bought bitcoin in 2015 and sold out during the recent peak, making a 10x return. If the price of bitcoin goes back to its 2015 level, it would be a no-brainer for them to buy back in again. In the worst case, they’ll lose a small part of their overall gains. But if history repeats itself, they can double those gains. And maybe next time round, the price will go even higher.
This rational behavior of previous investors translates into a cryptocurrency’s price support, at between 10% and 25% (my estimate) of its historical peak. That’s exactly what happened during 2015 (see chart below) when bitcoin’s price stabilized in the $200-$250 range after dropping dramatically from over $1000 a year earlier. At the time there was no good reason to believe that it would ever rise again, but the cost of taking a punt became too low to resist.
So I believe that cryptocurrencies will be with us for the long term. As long as bitcoin is worth some non-trivial amount, it can be used as a means of directly sending money online. And as long as it serves this purpose, it will be an attractive alternative investment for people seeking to diversify. The same goes for other cryptocurrencies that have reached a sufficient level of interest and support, such as Ethereum and Litecoin. In Ethereum’s case, this logic applies whether or not smart contracts ever find serious applications.
On that subject, I should probably (and reluctantly) mention the recent wave of token Initial Coin Offerings (ICOs) on Ethereum. For the most part, I don’t see these as attractive investments, because their offer price may well be a high point to which they never return. And the sums involved are often ridiculous – if $18 million was enough to fund the initial development of Ethereum, I don’t see why much simpler projects are raising ten times that amount. My best guess is that many ICO investors are looking for something to do with their newly-found Ether riches, which they prefer not to sell to drive down the price. Ironically, after being collected by these ICOs, much is being sold anyway.
Back to reality
There’s a certain symmetry between people’s reactions to cryptocurrencies and enterprise blockchains. In both cases, some shamelessly drive the hype, claiming that bitcoin will destroy the financial system, or that enterprise chains will replace relational databases. Others are utterly dismissive, seeing cryptocurrencies as elaborate Ponzi schemes and permissioned blockchains as a technological farce.
In my view, these extreme positions are all ignoring a simple truth – that there are trade-offs between different ways of doing things, and in the case of both cryptocurrencies and enterprise blockchains, these trade-offs are clear to see. A technology doesn’t need to be good for everything in order to succeed – it just needs to be good for some things. The people who are doing those things have a tendency of finding it.
So when it comes to both public and private blockchains, it’s time to stop thinking in binary terms. Each type of chain will find its place in the world, and provide value when used appropriately. In the case of cryptocurrencies, as an intermediary-free method for digital value transfer and an alternative asset class. And in the case of enterprise blockchains, as a new approach to database sharing without a trusted intermediary.
That, at least, is the bet that we’re making here.
Disclosure: The author has a financial interest in various cryptocurrencies. Coin Sciences Ltd does not.
Today we’re delighted to release the second beta of MultiChain 1.0 for Linux, Windows and Mac (for now the Mac version requires compilation). This concludes the planned development of MultiChain 1.0 – with the exception of any bug fixes, the final release of MultiChain 1.0 over the summer will be unchanged.
This month also marks two years since the first alpha release of MultiChain in June 2015. As with any new product, we weren’t sure how the market would react, and knew there was only one way to find out – release a minimum viable product, meaning an initial version which provides significant value but is preliminary by design. Thankfully, unlike our first product CoinSpark, MultiChain received a strong and immediate positive response. This was accompanied by a tsunami of sensible feature requests, many of which we’ve now implemented. In parallel to the product’s development, usage has also grown remarkably by every measure. For example, the MultiChain website received under 3,000 visitors in July 2015, and now brings in ten times that number monthly.
Over the past two years we’ve invested a lot of effort in optimizing MultiChain, which was forked from Bitcoin Core, the reference implementation for the public bitcoin network. Below is a comparison of transaction throughput for a single-node setup using five versions of the product:
[Chart: average transactions per second for MultiChain 1.0 alpha 3, 1.0 alpha 21, 1.0 alpha 22, 1.0 beta 1 and 1.0 beta 2]
Average transactions per second, including API overhead and building, signing, mining and verifying transactions and blocks. Tests performed using the ab HTTP server benchmarking tool, sending two concurrent requests to the sendtoaddress API. Server specifications: Intel Core i7-4770, 4 cores @ 3.4 GHz, 32 GB RAM, Seagate 2 TB 7200 RPM SATA, CentOS 6.4.
Naturally, the biggest jump came in alpha 22 when we transitioned to a database-driven wallet. But since that release, we’ve almost doubled MultiChain’s speed again. We hope we’ve demonstrated that bitcoin’s limit of 4 transactions per second is due to its particular network parameters, and has no relation to blockchains in general.
Of course, performance optimization is a never-ending task, and there’s no reason why MultiChain can’t reach 10,000 tx/sec on a 16-core processor with the appropriate architectural changes. However, based on conversations with our users and partners, it seems that few expect to need more than 1,000 tx/sec for the next few years. So we’re refocusing our development efforts on new features, which brings us nicely onto the subject of MultiChain 2.0.
MultiChain 2.0 overview
Version 2.0 of MultiChain will be the first to come in two editions – Community (open source) and Enterprise (commercial). I’m going to focus here on the free Community edition, since we’re only discussing the details of MultiChain Enterprise with our partners. In any event, the Community and Enterprise editions will be highly compatible, in that: (a) applications built on the Community edition will run without modification on MultiChain Enterprise, and (b) both editions will be able to connect and transact with each other on the same chain.
The three key areas of enhanced functionality in both editions of MultiChain 2.0 will be:
Richer data model for streams, including JSON documents.
Custom programmable transaction filters for on-chain validation.
Seamless updating of a blockchain’s protocol and parameters.
Let’s discuss each of these in detail.
Richer data model for streams
MultiChain streams were introduced in September 2016 and have proven extremely popular. As described in this post, streams provide a simple and natural abstraction for general purpose data storage, indexing and retrieval on a blockchain. A MultiChain blockchain can contain any number of named streams, each of which can either be open to all for writing, or writable only from certain addresses.
In MultiChain 1.0, each stream item has one or more publishers (who sign it), an optional key, a binary data payload up to 64 MB in size, and a timestamp (derived from the block in which it’s embedded). Each node can freely decide which streams to subscribe to, or can subscribe to all streams automatically. If a node is subscribed to a stream, it indexes that stream’s content in real time, allowing efficient retrieval by publisher, key, block, timestamp or position.
MultiChain 2.0 will enrich this streams functionality in a number of ways:
JSON items. As well as binary data, stream items will support structured JSON objects, stored on the blockchain in an efficient serialization format such as UBJSON. Since the MultiChain API already uses JSON throughout, these JSON objects will be writable and readable in a natural and obvious way.
Multiple keys. Stream items will support multiple keys, enabling a single piece of data to be indexed in multiple ways for retrieval using liststreamkeyitems. We’re constantly evaluating how much database functionality to include within MultiChain, and don’t expect to support indexing of the sub-elements within JSON stream items in version 2.0. Allowing multiple keys per stream item provides a reasonable workaround.
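To make the multi-key behavior concrete, here is a minimal Python sketch of how a subscribed node might index items under several keys at once. This is illustrative only – the `StreamIndex` class and item structure are our own simplifications, not MultiChain’s actual implementation:

```python
# Minimal sketch of multi-key stream indexing (illustrative only).
# Each item is indexed once per key and once per publisher.
from collections import defaultdict

class StreamIndex:
    def __init__(self):
        self.by_key = defaultdict(list)        # key -> list of items
        self.by_publisher = defaultdict(list)  # address -> list of items

    def add_item(self, item):
        # item: dict with 'keys' (list), 'publishers' (list), 'data'
        for key in item["keys"]:
            self.by_key[key].append(item)
        for publisher in item["publishers"]:
            self.by_publisher[publisher].append(item)

    def list_items_by_key(self, key):
        # Rough analogue of a liststreamkeyitems call
        return self.by_key[key]

index = StreamIndex()
index.add_item({"keys": ["invoice-17", "acme-corp"],
                "publishers": ["1AbC..."],
                "data": {"json": {"amount": 500}}})

# The same item is retrievable under either of its keys
assert index.list_items_by_key("invoice-17") == index.list_items_by_key("acme-corp")
```

The point is simply that one piece of data gets several independent retrieval handles, without being written to the chain more than once.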
Atomic writes of multiple items. MultiChain 1.0 allows a single transaction to write to multiple streams, but not to write multiple items to the same stream. MultiChain 2.0 will remove this restriction.
JSON merging. Any ordered list of JSON objects can be naturally flattened or summarized to create a “merged” object. The merged object contains all the keys which appear in the individual objects, where the value corresponding to each key is taken from the last object in which that key appears. If you like, the merged object is the final state of a database row, whose columns are defined by the first object and extended or updated by later objects. MultiChain 2.0 will add APIs to easily and rapidly retrieve the merged object for the JSON items in a stream with a particular key or publisher.
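The merging rule described above can be sketched in a few lines of Python. This is a simplified model of the semantics, not MultiChain’s code – the real APIs may treat nulls or nested objects differently:

```python
def merge_json_items(items):
    """Fold an ordered list of JSON objects into one merged object:
    every key that appears anywhere is present, and each key takes its
    value from the last object in which it appears."""
    merged = {}
    for obj in items:
        merged.update(obj)
    return merged

# A stream key's items, oldest first, acting as the audit history of a row
history = [
    {"name": "Alice", "balance": 100},
    {"balance": 250},
    {"status": "active"},
]
assert merge_json_items(history) == {"name": "Alice", "balance": 250, "status": "active"}
```

Each item in the history remains on the chain as an auditable record, while the merged object gives the row’s current state.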
These features are derived from common ways in which developers are currently using streams. In other words, we’re observing what many people are building on top of MultiChain at the application level, and bringing that functionality into MultiChain itself – a pattern that we intend to continue applying. Now that stream items will include type information, they can easily be extended in future to support other data formats such as XML, HDF5 and MIME-identified content. Not to mention the possibilities of transparent on-chain compression and encryption.
MultiChain 2.0 will also support JSON objects for raw transaction metadata (i.e. not stream items) as well as the metadata for asset issuance and stream creation events, instead of the text-only key/value pairs implemented in MultiChain 1.0. The listassets API will offer JSON merging across all of an asset’s issuance events, so that each issuance’s metadata can effectively update the asset’s final description.
Custom transaction filters
We’ve thought a lot about how to add custom programmable rules to MultiChain. While Ethereum’s “smart contract” paradigm is popular, it has a number of key shortcomings for high-throughput permissioned blockchains. First, smart contracts introduce a global dependency across the blockchain’s entire state, which drastically impairs concurrency and performance. Second, smart contracts cannot stop incorrect transactions from being embedded in a blockchain, but only prevent those transactions from updating the blockchain database’s state. While in the long term we expect an Ethereum-compatible virtual machine to be offered as a high-level abstraction within MultiChain, we don’t think it’s the right solution for low-level validation.
Filters will be passed a JSON object describing an individual transaction, structured like the output of decoderawtransaction but with extra fields. For example, each transaction input in the JSON will include a structure describing the previous transaction output it spends, and each address will be accompanied by a list of permissions currently held by that address. A filter’s job is to return a Boolean value indicating whether the transaction is acceptable and if not, provide a textual error explaining why. MultiChain’s API will include commands for creating filters, testing them on previous or new transactions, and activating them subject to administrator consensus.
Unlike smart contracts, if a bug is discovered in the code for a filter, it can easily be replaced by a new version. Nonetheless, like all Turing-complete code, filters still run the risk of entering an infinite loop. This problem will be mitigated in two ways:
Filters can only be installed and activated by the chain’s administrators, subject to consensus. This gives each administrator the opportunity to examine a filter’s code in depth before voting for it to be activated.
All well-behaved nodes will validate new transactions using the active filters before forwarding them on to their peer nodes. As a result, if a transaction sends a filter into an infinite loop, the transaction should not propagate beyond the node which created it.
We expect one popular application for filters to be validating stream items. For example, a filter could ensure that certain fields in a stream’s JSON items contain numbers in a specific range. In MultiChain 1.0 this type of validation has to be done at the application level, either when writing stream items (if the source is trusted) or when reading them. By contrast, MultiChain 2.0 will enable these rules to be embedded within the blockchain itself, rather like check constraints in a relational database.
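As a rough sketch of that kind of rule, here is the filter logic expressed in Python. The transaction structure and field names below are hypothetical simplifications of the JSON description a filter would receive, and the Boolean-plus-error return mirrors the contract described earlier:

```python
def stream_item_filter(tx):
    """Illustrative filter logic (names and structure hypothetical):
    accept a transaction only if every JSON stream item it publishes
    keeps 'temperature' within a plausible range.
    Returns (ok, error_message)."""
    for output in tx.get("vout", []):
        for item in output.get("items", []):       # stream items, if any
            data = item.get("data", {}).get("json")
            if data is None:
                continue
            temp = data.get("temperature")
            if temp is not None and not (-50 <= temp <= 50):
                return False, "temperature %r out of range" % temp
    return True, None

# A transaction publishing an out-of-range reading is rejected
ok, err = stream_item_filter(
    {"vout": [{"items": [{"data": {"json": {"temperature": 72}}}]}]})
assert not ok
```

In effect, the blockchain itself refuses to embed data that breaks the rule, rather than leaving every reading application to re-validate it.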
MultiChain 2.0 will include two additional features to make filters even more powerful. First, it will introduce user-defined permissions, which exist alongside the eight permissions defined by MultiChain. As with regular permissions, these will be granted to specific addresses by administrators (and in some cases, by users with activate privileges) and included alongside addresses in the JSON object passed to a filter. For example, a filter could ensure that only addresses with a particular user-defined permission can write certain types of data to a stream, or transact in a particular asset above a certain threshold.
Second, MultiChain 2.0 will support custom (binary or JSON) metadata within regular transaction outputs. This will enable any output to act as a general database row, “owned” by the address within. Filters will see any metadata within a transaction’s spent and created outputs as part of its JSON description. As a result, MultiChain will become a universal shared database engine, where a transaction’s validity is determined by a customizable function of the rows it creates and deletes. (If this sounds a little abstract, we’ll be sure to provide some concrete examples.)
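As one concrete (and entirely hypothetical) illustration of the idea, sketched in Python: treat each output’s metadata as a database row with a `units` column, and make a transaction valid only if the rows it deletes carry the same total as the rows it creates – a conservation rule expressed as a function of deleted and created rows:

```python
def conservation_filter(tx):
    """Hypothetical 'universal database' rule: a transaction is valid
    only if the rows it deletes (spent outputs) carry the same total
    'units' as the rows it creates (new outputs)."""
    def total(rows):
        return sum(r.get("data", {}).get("json", {}).get("units", 0)
                   for r in rows)
    spent = [vin.get("prevout", {}) for vin in tx.get("vin", [])]
    created = tx.get("vout", [])
    return total(spent) == total(created)

# One 10-unit row deleted, two rows created totalling 10 units -> valid
tx = {"vin": [{"prevout": {"data": {"json": {"units": 10}}}}],
      "vout": [{"data": {"json": {"units": 4}}},
               {"data": {"json": {"units": 6}}}]}
assert conservation_filter(tx)
```

The field name `units` and the transaction layout here are invented for illustration; the general pattern – validity as a customizable function of rows deleted and created – is the point.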
Seamless blockchain updating
Since blockchains are designed to run for many years, their characteristics might need to be changed over time. The current version of MultiChain already provides a fair degree of flexibility, allowing permissions changes (including of administrators and miners by consensus), new assets and streams to be created, and nodes to be seamlessly added or removed from the network. Nonetheless, in MultiChain 1.0 a blockchain’s basic parameters, such as the maximum block size and target confirmation time, are fixed when the chain is created and cannot be subsequently changed.
MultiChain 2.0 will add the ability to update a blockchain, allowing many (but not all) of its parameters to be modified while the chain continues to run. Like other important operations, updating a blockchain will require a customizable level of administrator consensus, where this level itself is a parameter that can be changed. Updates will come into effect from a certain block, and apply thereafter to every subsequent block until the next update.
Blockchain parameters that can be updated will include:
Protocol version. This will enable a blockchain created with one version of MultiChain to be upgraded to support the features in a new version, such as JSON stream items or transaction filters. Indeed, the protocol version 10008 introduced in MultiChain 1.0 alpha 29 (and used in the beta) has already been future-proofed with undocumented support for this type of upgrade. Once a MultiChain 1.0 blockchain is upgraded to the 2.0 protocol, it will also gain access to the other parameter changes described here.
Blockchain scaling. Blockchains that become popular may outgrow the initial values set for their target confirmation time or maximum transaction and block sizes. MultiChain 2.0 will allow these values to be increased or decreased as necessary.
Permissioning model. MultiChain 2.0 will allow the updating of many parameters relating to permissioning and governance, including: (a) anyone-can-* parameters that control the ways in which a blockchain is open or closed, (b) admin-consensus-* parameters that determine the levels of administrator consensus required for certain operations, and (c) the mining-diversity parameter that controls the strictness of the round-robin consensus algorithm.
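For context, the round-robin constraint that mining-diversity controls can be sketched roughly as follows. This is our simplified reading of the rule, not the exact consensus code:

```python
import math

def can_mine(miner, recent_miners, permitted_miners, mining_diversity):
    """Sketch of the round-robin rule (illustrative, not the actual
    consensus implementation): with diversity d and n permitted miners,
    a miner may not produce a block if it mined any of the previous
    ceil(n * d) - 1 blocks. d=0 disables the constraint; d=1 forces
    a strict rotation through all permitted miners."""
    window = math.ceil(len(permitted_miners) * mining_diversity) - 1
    if window <= 0:
        return True
    return miner not in recent_miners[-window:]

# Three permitted miners, diversity 1.0 -> strict three-way rotation
assert can_mine("C", ["A", "B"], ["A", "B", "C"], 1.0)
assert not can_mine("B", ["A", "B"], ["A", "B", "C"], 1.0)
```

Raising mining-diversity therefore tightens the rotation (more protection against a minority of colluding miners), while lowering it keeps the chain moving when some miners are offline – which is exactly why making it updatable matters.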
Once this updating functionality is implemented, there should be no reason why a blockchain created in MultiChain cannot run for many decades or more.
We’ve already started work on MultiChain 2.0, and look forward to delivering on this roadmap. No doubt other enhancements will be included as well. As with MultiChain 1.0, we’ll have alpha releases along the way, so that developers can use and learn new features as they are implemented (and of course, report any problems or shortcomings). Naturally, we’ll continue to maintain version 1.0 throughout this period, fixing any bugs that appear.
I’d like to finish by thanking our development team, led by Dr Michael Rozantsev, for their continued excellence and hard work. We see MultiChain as a straightforward software engineering project, in which code quality and testing count above all. It’s my privilege to work with people who can turn a complex product vision into stable working software with such remarkable efficiency and speed.