{"id":16865,"date":"2016-10-11T14:00:10","date_gmt":"2016-10-11T21:00:10","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/dataplatforminsider\/?p=16865"},"modified":"2024-01-22T22:50:35","modified_gmt":"2024-01-23T06:50:35","slug":"1000000-predictions-per-second","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2016\/10\/11\/1000000-predictions-per-second\/","title":{"rendered":"1,000,000 predictions per second"},"content":{"rendered":"
This post is by Joseph Sirosh, Corporate Vice President of the Data Group at Microsoft.<\/em><\/p>\n Online transaction processing (OLTP) database applications have powered many enterprise use-cases in recent decades, with numerous implementations in banking, e-commerce, manufacturing and many other domains. Today, I\u2019d like to highlight a new breed of applications that marry the latest OLTP advancements with advanced insights and machine learning. In particular, I\u2019d like to describe how companies can predict a million events per second<\/strong> with the very latest algorithms<\/strong>, using readily available software. We have shown this demo at the Microsoft Machine Learning and Data Science Summit<\/a> and my General Session at Ignite<\/a> in Atlanta, Georgia. You can watch both online. The predictive model was based on a boosted decision tree algorithm with 50 trees and 33 features.<\/p>\n <\/a><\/p>\n Take credit card transactions, for instance. These can trigger a set of decisions that are best handled with predictive models. Financial services companies need to determine whether a particular transaction is fraudulent or legitimate.<\/p>\n As the number of transactions per second (TPS) increase, so does the number of predictions per second (PPS) that organizations need to make. The Visa network, for instance, was capable of handling 56,000 TPS last year and managed over 100 billion yearly transactions. With each transaction triggering a set of predictions and decisions, modern organizations have a need for a powerful platform that combines OLTP with a high-speed prediction engine. We expect that an increasing number of companies will need to hit 1 million predictions per second (PPS) or more<\/strong> in coming years.<\/p>\n What kind of architecture would enable such use cases? At Microsoft, we believe that computing needs to take place where data lives. 
This minimizes data movement, eliminating the associated costs and security risks, and places the prediction engine close to the database (i.e., in-database analytics). Moreover, the predictive models can be shared by multiple applications. That\u2019s precisely how SQL Server 2016 was designed.<\/p>\n Take the credit card fraud detection example I mentioned above \u2013 one can handle it by training the fraud model offline, storing the serialized model in the database, and scoring each incoming transaction through a stored procedure that invokes the model. This is shown visually below:<\/p>\n [Figure: in-database fraud-scoring architecture]<\/p>\n The above architecture is very versatile. In addition to using it for fraud detection, we\u2019ve applied it to perform what-if<\/em> analysis on an auto loan dataset.<\/p>\n Imagine a loan application where a financial services company needs to determine whether a loan will be repaid on time. As with predicting fraudulent transactions, you can leverage SQL Server 2016 as a Scoring Engine to predict \u201cbad\u201d loans: loans that show good repayment behavior are labeled \u201cgood\u201d, and loans that show less-than-perfect repayment behavior are labeled \u201cbad\u201d. Imagine scanning through millions of loan applications and being able to predict \u2013 within seconds \u2013 which loans will default<\/strong>. Now imagine a business analyst launching the same exercise while modeling a scenario where the Federal Reserve increases interest rates. Our loan default prediction model reached and exceeded a staggering 1,250,000 predictions<\/strong> per second, completing the what-if analysis within 15-18 seconds. This capability enables our customers to perform near real-time predictive analytics. The architecture is shown visually below:<\/p>\n [Figure: what-if loan-scoring architecture]<\/p>\n One of the common requests from customers is for an intelligent way to predict how changing factors like interest rates, loan terms or even a member\u2019s credit score would affect the charge-off probability. 
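The scoring-engine pattern described above can be sketched in a few lines. The following is a minimal, illustrative Python stand-in (the post's actual implementation uses an R model invoked from a SQL Server stored procedure, with the serialized model typically kept in a database table): a model is trained and persisted once, then a scoring routine, playing the role of the stored procedure, deserializes it and scores a batch of records. The `ChargeOffModel` class, its coefficients and the function names are assumptions for illustration, not the post's real boosted-tree model.

```python
import pickle

class ChargeOffModel:
    """Toy stand-in scorer: higher interest rate and lower credit
    score push the charge-off probability up. Purely illustrative."""
    def predict_proba(self, loans):
        # loans: list of (interest_rate_percent, credit_score) tuples
        return [min(1.0, max(0.0, 0.05 * rate - 0.001 * (score - 600)))
                for rate, score in loans]

# "Persist" the trained model the way a database would keep it
# as a serialized binary blob in a table.
stored_blob = pickle.dumps(ChargeOffModel())

def score_in_database(model_blob, loan_batch):
    """Mimics the stored-procedure path: load the persisted model
    once, then score a whole batch of open loans in one call."""
    model = pickle.loads(model_blob)
    return model.predict_proba(loan_batch)

probs = score_in_database(stored_blob, [(6.0, 720), (12.0, 580)])
```

Scoring whole batches per call, rather than one row at a time, is what makes throughput figures like a million predictions per second attainable: the deserialization and invocation overhead is amortized across many rows.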
You can specify a what-if input for an increased interest rate and score the open loans at the proposed rate using parallel threads that call a SQL Server stored procedure to invoke the scoring model. You can then compare the base predictions with the what-if predictions, and study how the probability of HIGH charge-offs rises with an increase in interest rate and how that may affect various branches of your business. Such near real-time predictive analytics capabilities minimize research bias, dramatically increase business flexibility and focus attention on the attributes that matter, which results in higher profitability.<\/p>\n At Ignite, we had Jack Henry & Associates on the stage with me. They provide more than 300 products and services to over 10,000 credit unions, enabling them to process financial transactions and automate their services. Using SQL Server as a Scoring Engine enabled their vision of building an intelligent enterprise data warehouse<\/strong> that would help their customers increase their productivity. They have been working with Microsoft to leverage SQL Server with built-in R capability to build intelligence into their current data warehousing service portfolio. Such an intelligent data warehouse helps credit unions and financial services firms become more flexible and react to situations in a data-driven manner. We see opportunities in applying such in-database models to customer churn prediction, loan portfolio change prediction and a host of other scenarios. Several banking and insurance companies rely on very complex architectures for predictive analytics and scoring today. Using the architecture outlined in this blog, businesses can do this in a dramatically simpler and faster manner.<\/p>\n The possibilities are endless.<\/p>\n We\u2019ve posted several samples on GitHub<\/a>. 
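The base-versus-what-if comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `charge_off_prob` is a toy stand-in for the boosted-tree scoring model, and the 0.5 HIGH threshold, loan data and function names are invented for the example, not taken from the post's implementation (which scores in-database via a stored procedure).

```python
def charge_off_prob(interest_rate, credit_score):
    """Toy stand-in for the trained scoring model: probability of
    charge-off grows with the rate and shrinks with credit score."""
    return min(1.0, max(0.0, 0.05 * interest_rate - 0.001 * (credit_score - 600)))

def what_if(open_loans, rate_bump, high_threshold=0.5):
    """Score every open loan twice: once at the current rate (base)
    and once at a bumped rate (what-if), then count how many loans
    fall in the HIGH charge-off bucket under each scenario."""
    base = [charge_off_prob(r, s) for r, s in open_loans]
    bumped = [charge_off_prob(r + rate_bump, s) for r, s in open_loans]
    high_base = sum(p >= high_threshold for p in base)
    high_bumped = sum(p >= high_threshold for p in bumped)
    return high_base, high_bumped

# Hypothetical open-loan portfolio: (interest rate %, credit score).
loans = [(6.0, 700), (9.5, 640), (11.0, 590)]
base_high, bumped_high = what_if(loans, rate_bump=2.0)  # model a 2-point rate increase
```

In the real architecture, the two scoring passes would run as parallel batches against the stored procedure; the analyst only sees the comparison of the base and what-if HIGH counts.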
The available templates are listed below.<\/p>\nTransactional Workloads + Intelligence<\/h2>\nAnalytical Workloads + Intelligence<\/h2>\n This is how companies are predicting at the speed of data, today.<\/p>\n