Sharing Updatable Models (SUM) on Blockchain

Sharing Updatable Models (SUM) on Blockchain is a framework for sharing and training decentralized machine learning models. Using a model to get predictions is free because the model is public. We are facilitating crowdsourcing on the blockchain, allowing people to easily and transparently improve the models they use in everyday products. The blockchain ensures that models persist, giving customers trust in the services they use.


Our goal is to use blockchain technology to encourage decentralized hosting and versioning of public machine learning models, and thereby democratize AI. Instead of the ongoing subscription fees that a cloud service provider typically charges to host a model continuously, a one-time deployment fee, usually a few dollars, is paid to the blockchain network. Anyone who wants to improve the model stored in the smart contract with some training data pays a transaction fee of a few cents, in contrast to most services, which charge dollars per month for access. Contributors also send a deposit of a variable amount along with their training data. If the data is determined to be good, the contributor should get a full refund of the deposit and might even earn rewards, either from a sponsor or from people who added bad data. In the same spirit as Data Dignity initiatives, this gives customers direct reimbursement for their contributions to models.

There are many ways to encourage contributors to submit good quality data. We’ve analyzed several, including gamification (non-financial points and badges, as on Stack Overflow), mechanisms based on established theory in prediction markets, and a self-assessment mechanism that requires no oversight.
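To make the deposit-and-refund flow concrete, here is a minimal Python sketch of a self-assessment style mechanism. It is written under stated assumptions and is not taken from the project's code. The shared model itself acts as the judge: after a waiting period, a contributor whose label the model still agrees with can reclaim the deposit, and after a longer period, someone who has already had good data refunded can claim the deposit attached to a label the model disagrees with. The class name SelfAssessmentSketch, the fixed minimum deposit, the wait times, and handing the entire deposit to the reporter are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class Contribution:
    """One submitted sample plus the deposit that was sent with it."""
    contributor: str
    sample: Hashable
    label: int
    added_time: float
    claimable: float  # part of the deposit not yet refunded or taken


class SelfAssessmentSketch:
    """Hypothetical deposit-refund-take logic; not the project's actual contract."""

    def __init__(self, predict: Callable[[Hashable], int], min_deposit: float,
                 refund_wait: float, take_wait: float) -> None:
        self.predict = predict          # the shared model's prediction function
        self.min_deposit = min_deposit  # assumed fixed deposit threshold
        self.refund_wait = refund_wait  # time before a refund may be requested
        self.take_wait = take_wait      # time before others may claim a bad deposit
        self.contributions: List[Contribution] = []
        self.verified_good: Dict[str, int] = {}  # refunded contributions per address

    def add_data(self, contributor: str, sample: Hashable, label: int,
                 deposit: float, now: float) -> int:
        if deposit < self.min_deposit:
            raise ValueError("deposit too small")
        self.contributions.append(Contribution(contributor, sample, label, now, deposit))
        return len(self.contributions) - 1

    def refund(self, contributor: str, index: int, now: float) -> float:
        c = self.contributions[index]
        if c.contributor != contributor:
            raise PermissionError("only the original contributor can request a refund")
        if now - c.added_time < self.refund_wait:
            raise ValueError("too early for a refund")
        if self.predict(c.sample) != c.label:
            raise ValueError("the model disagrees with this label, so it is not judged good")
        amount, c.claimable = c.claimable, 0.0
        if amount > 0:
            self.verified_good[contributor] = self.verified_good.get(contributor, 0) + 1
        return amount

    def take(self, reporter: str, index: int, now: float) -> float:
        c = self.contributions[index]
        if reporter == c.contributor:
            raise PermissionError("cannot take your own deposit")
        if self.verified_good.get(reporter, 0) == 0:
            raise PermissionError("reporter needs at least one refunded contribution")
        if now - c.added_time < self.take_wait:
            raise ValueError("too early to report this contribution")
        if self.predict(c.sample) == c.label:
            raise ValueError("the model agrees with this label, so it is not judged bad")
        amount, c.claimable = c.claimable, 0.0  # simplification: reporter gets all of it
        return amount


# Usage: a fixed lookup table stands in for the trained model.
predictions = {"cat photo": 1, "spam": 1}
mech = SelfAssessmentSketch(lambda s: predictions[s], min_deposit=1.0,
                            refund_wait=10, take_wait=20)
good = mech.add_data("alice", "cat photo", 1, deposit=1.0, now=0)
bad = mech.add_data("mallory", "spam", 0, deposit=1.0, now=0)
print(mech.refund("alice", good, now=15))  # 1.0
print(mech.take("alice", bad, now=25))     # 1.0
```

In this sketch, using the model's own predictions as the judge is what removes the need for a trusted reviewer, and the waiting periods give other contributors time to train the model before any judgment is made.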

The vision of this project is for companies to one day share their underlying models just as they now share open source software and model architectures. Public models can be used in products to earn customer trust through transparency in the model’s training, the use of its predictions, and its persistence.

Architecture Overview

Contributing data can be broken down into 3 steps:

Figure: Someone sends data to the addData method in CollaborativeTrainer, which passes it to the 3 main components described below.

The IncentiveMechanism validates the transaction; for instance, in some cases a “stake” or deposit is required.
The DataHandler stores data and meta-data on the blockchain. This ensures that it is accessible for all future uses, not limited to this smart contract.
The machine learning model is updated according to predefined training algorithms. In addition to adding data, anyone can query the model for predictions, and the incentive mechanism may be triggered to provide users with payments or virtual “karma” points.
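The three steps can be sketched in Python as follows. Only the names CollaborativeTrainer, IncentiveMechanism, and DataHandler come from the description above; the method names (add_data in place of addData, handle_add_data, update), the minimum-deposit check, and the choice of a perceptron as the model are illustrative assumptions. The project's real contracts are written in Solidity with their own interfaces, so treat this only as a picture of the control flow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredData:
    """What the hypothetical DataHandler records for each contribution."""
    sender: str
    x: List[float]
    y: int
    deposit: float
    time: int


class IncentiveMechanism:
    """Step 1: validate the transaction (here, only a minimum deposit is assumed)."""

    def __init__(self, min_deposit: float) -> None:
        self.min_deposit = min_deposit

    def handle_add_data(self, deposit: float) -> float:
        if deposit < self.min_deposit:
            raise ValueError("deposit below the required stake")
        return deposit


class DataHandler:
    """Step 2: keep data and metadata so later code can reuse them."""

    def __init__(self) -> None:
        self.records: List[StoredData] = []

    def handle_add_data(self, sender: str, x: List[float], y: int,
                        deposit: float, time: int) -> None:
        self.records.append(StoredData(sender, list(x), y, deposit, time))


class Perceptron:
    """Step 3: an assumed model; a perceptron updates only when it is wrong."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> int:
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score >= 0 else 0

    def update(self, x: List[float], y: int) -> None:
        if self.predict(x) != y:
            sign = 1.0 if y == 1 else -1.0
            self.weights = [w + self.learning_rate * sign * xi
                            for w, xi in zip(self.weights, x)]
            self.bias += self.learning_rate * sign


class CollaborativeTrainer:
    """Entry point that runs the three steps in order for each contribution."""

    def __init__(self, incentive: IncentiveMechanism, data_handler: DataHandler,
                 model: Perceptron) -> None:
        self.incentive = incentive
        self.data_handler = data_handler
        self.model = model

    def add_data(self, sender: str, x: List[float], y: int,
                 deposit: float, time: int) -> None:
        accepted = self.incentive.handle_add_data(deposit)               # 1. validate
        self.data_handler.handle_add_data(sender, x, y, accepted, time)  # 2. store
        self.model.update(x, y)                                          # 3. train

    def predict(self, x: List[float]) -> int:
        # In this sketch, reading predictions requires no deposit.
        return self.model.predict(x)


trainer = CollaborativeTrainer(IncentiveMechanism(min_deposit=0.5), DataHandler(),
                               Perceptron(n_features=2))
trainer.add_data("alice", [1.0, 0.0], 0, deposit=1.0, time=1)
trainer.add_data("bob", [0.0, 1.0], 1, deposit=1.0, time=2)
print(trainer.predict([1.0, 0.0]), trainer.predict([0.0, 1.0]))  # 0 1
```

Keeping the three components behind separate objects mirrors the split described above: a different incentive mechanism or model could be swapped in without changing CollaborativeTrainer.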

Learn More

Check out our code with Python simulations and Solidity (Ethereum) demos at github.com/microsoft/0xDeCA10B.

The basics of the framework are described in our blog post. A demo of the self-assessment incentive mechanism is also available.

Papers

Even more details can be found in the initial paper describing the framework, which was accepted to Blockchain-2019, the IEEE International Conference on Blockchain.

An analysis of several machine learning models with the self-assessment incentive mechanism can be found in our second paper, which was accepted to the 2020 International Conference on Blockchain.