{"id":596833,"date":"2019-07-12T08:58:37","date_gmt":"2019-07-12T15:58:37","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=596833"},"modified":"2019-07-19T07:10:24","modified_gmt":"2019-07-19T14:10:24","slug":"leveraging-blockchain-to-make-machine-learning-models-more-accessible","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/leveraging-blockchain-to-make-machine-learning-models-more-accessible\/","title":{"rendered":"Leveraging blockchain to make machine learning models more accessible"},"content":{"rendered":"
We envision a slightly different paradigm, one in which people will be able to easily and cost-effectively run machine learning models with technology they already have, such as browsers and apps on their phones and other devices. In the spirit of democratizing AI, we\u2019re introducing Decentralized & Collaborative AI on Blockchain<\/span><\/a>.<\/p>\n Through this new framework, participants can collaboratively and continually train and maintain models, as well as build datasets, on public blockchains, where models are generally free to use for evaluating predictions. The framework is ideal for AI-assisted scenarios people encounter daily, such as interacting with personal assistants, playing games, or using recommender systems. An open-source implementation for the Ethereum blockchain<\/span><\/a> is available on GitHub<\/span><\/a>, and our paper \u201cDecentralized & Collaborative AI on Blockchain<\/a>\u201d, which I co-authored with Bo Waggoner<\/span><\/a>, a postdoc researcher with Microsoft at the time of the work, will be presented at the second IEEE International Conference on Blockchain<\/span><\/a>, July 14\u201317.<\/p>\n Leveraging blockchain technology allows us to do two things that are integral to the success of the framework: offer participants a level of trust and security, and reliably execute an incentive-based system that encourages participants to contribute data that will help improve a model\u2019s performance.<\/p>\n With current web services, even if code is open source, people can\u2019t be 100 percent sure of what they\u2019re interacting with, and running the models generally requires specialized cloud services. In our solution, we put these public models into smart contracts<\/span><\/a>, code on a blockchain that helps ensure the specifications of agreed-upon terms are upheld. 
In our framework, models can be updated on-chain<\/em>, meaning within the blockchain environment, for a small transaction fee, or used for inference off-chain<\/em>, locally on the individual\u2019s device, with no transaction costs.<\/p>\n Smart contracts are unmodifiable and evaluated by many machines, helping to ensure the model does what it specifies it will do. The immutable nature and permanent record of smart contracts also allow us to reliably compute and deliver rewards for good data contributions. Trust is important when processing payments, especially in a system like ours that seeks to encourage positive participation via incentives (more to come on that later). Additionally, blockchains such as Ethereum have thousands of decentralized machines all over the world<\/span><\/a>, making it less likely a smart contract will become completely unavailable or be taken offline.<\/p>\n Ethereum nodes are located around the world. Locations are as of July 4, 2019. Source: https:\/\/www.ethernodes.org<\/span><\/a><\/p><\/div>\n Hosting a model on a public blockchain requires an initial one-time fee for deployment, usually a few dollars, based on the computational cost to the blockchain network. From that point, anyone contributing data to train the model, whether that be the individual who deployed it or another participant, will have to pay a small fee, usually a few cents, again proportional to the amount of computation being done.<\/p>\n Using our framework, we set up a Perceptron<\/span><\/a> model capable of classifying the sentiment, positive or negative, of a movie review. As of July 2019, it costs about USD 0.25 to update the model on Ethereum. We have plans to extend our framework so most data contributors won\u2019t have to pay this fee. 
For example, contributors could get reimbursed during a reward stage, or a third party could submit the data and pay the fee on their behalf when the data comes from usage of the third party\u2019s technology, such as a game.<\/p>\n To reduce computational costs, we use models that are very efficient to train, such as a Perceptron or a Nearest Centroid Classifier<\/span><\/a>. We can also use these models along with high-dimensional representations computed off-chain. More complicated models could be integrated using API calls from the smart contract to machine learning services, but ideally, models would be kept completely public in a smart contract.<\/p>\n Adding data to a model in the Decentralized & Collaborative AI on Blockchain framework consists of three steps: (1) The incentive mechanism, designed to encourage the contribution of \u201cgood\u201d data, validates the transaction, for instance by requiring a \u201cstake\u201d or monetary deposit. (2) The data handler stores data and metadata on the blockchain. (3) The machine learning model is updated.<\/p><\/div>\n Blockchains let us easily share evolving model parameters. Newly created information such as new words, new movie titles, and new pictures can be used to update existing hosted models, regardless of whether a specific person or organization is able to update and host the model themselves. To encourage people to contribute new data that will help maintain the model\u2019s performance, we propose several incentive mechanisms: gamified, prediction market\u2013based, and ongoing self-assessment.<\/p>\n Gamified<\/strong>: As on Stack Exchange<\/span><\/a> sites, data contributors can earn points and badges when other contributors validate their contributions. 
This proposal relies solely on the willingness of contributors to collaborate for a common good: the betterment of the model.<\/p>\n Prediction market\u2013based<\/strong>: Contributors get rewarded if their contribution improves the performance of the model as evaluated on a specific test set. This proposal builds on existing work that uses prediction market<\/span><\/a> frameworks to collaboratively train and evaluate models, including \u201cA Collaborative Mechanism for Crowdsourcing Prediction Problems<\/span><\/a>\u201d and \u201cA Market Framework for Eliciting Private Data.<\/span><\/a>\u201d<\/p>\n The prediction market\u2013based incentive in our framework has three phases:<\/p>\n Participants are rewarded based on how much their contribution helped the model improve. If the model did worse on the test set, then participants who contributed \u201cbad\u201d data lose their deposit.<\/p>\n Here is how the process looks when running in a simulation:<\/p>\n
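As a separate, unofficial illustration of the reward rule described above, here is a minimal off-chain Python sketch. It is not the framework's code or the simulation mentioned in the post: the function names, the flat deposit, the reward rate, and the choice to score each contribution by the change in test accuracy it causes are all assumptions made for the example. It pairs a simple Perceptron update with a settlement step in which helpful contributions earn a reward and harmful ones forfeit the deposit.

```python
# Hypothetical sketch only: names, deposit size, and payout rule are assumptions,
# not the API or the exact mechanism of the framework described in the post.

def predict(weights, features):
    # Perceptron decision: 1 (positive) if the weighted sum is above zero, else 0.
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def update(weights, features, label, learning_rate=1.0):
    # Standard perceptron rule: weights change only when the prediction is wrong.
    error = label - predict(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


def accuracy(weights, test_set):
    # Fraction of test samples the current weights classify correctly.
    correct = sum(1 for features, label in test_set if predict(weights, features) == label)
    return correct / len(test_set)


def settle(weights, contributions, test_set, deposit=1.0, reward_rate=10.0):
    # Replay contributions in order and score each one by the change in test
    # accuracy it causes (an assumed stand-in for the reward computation).
    payouts = {}
    for contributor, features, label in contributions:
        before = accuracy(weights, test_set)
        weights = update(weights, features, label)
        change = accuracy(weights, test_set) - before
        if change < 0:
            payout = -deposit              # model got worse: deposit is lost
        else:
            payout = reward_rate * change  # reward scales with the improvement
        payouts[contributor] = payouts.get(contributor, 0.0) + payout
    return weights, payouts


# Toy usage: one correctly labeled sample and one mislabeled sample.
test_set = [([1, 0], 1), ([0, 1], 0)]
contributions = [('alice', [1, 0], 1), ('bob', [0, 1], 1)]
final_weights, payouts = settle([0.0, 0.0], contributions, test_set)
print(final_weights, payouts)
```

In this toy run, the first contributor's sample raises test accuracy and earns a reward, while the second contributor's mislabeled sample lowers it and costs the deposit. A real deployment would run the equivalent logic inside a smart contract and would have to account for transaction costs, which this sketch ignores.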
<\/a>Significant advances are being made in artificial intelligence, but accessing and taking advantage of the machine learning systems making these developments possible can be challenging, especially for those with limited resources. These systems tend to be highly centralized, their predictions are often sold on a per-query basis, and the datasets required to train them are generally proprietary and expensive to create on their own. Additionally, published models run the risk of becoming outdated if new data isn\u2019t regularly provided to retrain them.<\/p>\nWhy blockchain?<\/h3>\n
<\/a>Deploying and updating models<\/h3>\n
<\/a>Incentive mechanisms<\/h3>\n
\n
<\/a>