{"id":980577,"date":"2023-10-30T10:06:51","date_gmt":"2023-10-30T17:06:51","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=980577"},"modified":"2024-01-16T06:40:16","modified_gmt":"2024-01-16T14:40:16","slug":"treebeard-an-optimizing-compiler-for-decision-tree-based-ml-inference","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/treebeard-an-optimizing-compiler-for-decision-tree-based-ml-inference\/","title":{"rendered":"Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference"},"content":{"rendered":"<p>Decision tree ensembles are among the most commonly used machine learning models. These models are used in a wide range of applications and are deployed at scale. Decision tree ensemble inference is usually performed with libraries such as XGBoost, LightGBM, and Sklearn. These libraries incorporate a fixed set of optimizations for the hardware targets they support. However, maintaining these optimizations is prohibitively expensive with the evolution of hardware. Further, they do not specialize the inference code to the model being used, leaving significant performance on the table. This paper presents TREEBEARD, an optimizing compiler that progressively lowers the inference computation to optimized CPU code through multiple intermediate abstractions. By applying model-specific optimizations at the higher levels, tree walk optimizations at the middle level, and machine-specific optimizations lower down, TREEBEARD can specialize inference code for each model on each supported CPU target. TREEBEARD combines several novel optimizations at various abstraction levels to mitigate architectural bottlenecks and enable SIMD vectorization of tree walks. We implement TREEBEARD using the MLIR compiler infrastructure and demonstrate its utility by evaluating it on a diverse set of benchmarks. TREEBEARD is significantly faster than state-of-the-art systems, XGBoost, Treelite and Hummingbird, by 2.6\u00d7, 4.7\u00d7 and 5.4\u00d7 respectively in a single-core execution setting, and by 2.3\u00d7, 2.7\u00d7 and 14\u00d7 respectively in multi-core settings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Decision tree ensembles are among the most commonly used machine learning models. These models are used in a wide range of applications and are deployed at scale. Decision tree ensemble inference is usually performed with libraries such as XGBoost, LightGBM, and Sklearn. These libraries incorporate a fixed set of optimizations for the hardware targets they [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-980577","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-10-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"IEEE","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":0,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/ieeexplore.ieee.org\/document\/9923840","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Ashwin Prasad","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Sampath Rajendra","user_id":43107,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sampath Rajendra"},{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"text","value":"R Govindarajan","user_id":0,"rest_url":false},{"type":"text","value":"Uday Bondhugula","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562,199565],"msr_event":[],"msr_group":[144939],"msr_project":[967329],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967329,"post_title":"Domain Specialization","post_name":"domain-specialization","post_type":"msr-project","post_date":"2023-10-16 02:14:29","post_modified":"2024-01-12 08:47:20","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/domain-specialization\/","post_excerpt":"Scaling performance beyond Moore's law Domain&nbsp;specialization&nbsp;is&nbsp;expected&nbsp;to&nbsp;play&nbsp;a&nbsp;big&nbsp;role&nbsp;in&nbsp;how&nbsp;computer&nbsp;systems&nbsp;evolve&nbsp;in&nbsp;future. With&nbsp;the&nbsp;end&nbsp;of&nbsp;Moore's&nbsp;law,&nbsp;we&nbsp;are&nbsp;already&nbsp;seeing&nbsp;CPU,&nbsp;GPU&nbsp;and domain specific&nbsp;hardware&nbsp;evolving&nbsp;rapidly. The next decade&nbsp;is&nbsp;therefore&nbsp;expected&nbsp;to&nbsp;see&nbsp;big&nbsp;changes&nbsp;in&nbsp;how&nbsp;we&nbsp;develop,&nbsp;compile&nbsp;and&nbsp;run&nbsp;software. This project focuses on data systems, a class of systems where, as the data sizes grow, performance scaling is going to be of importance.First,&nbsp;we&nbsp;believe&nbsp;that domain-specific&nbsp;compilers&nbsp;will&nbsp;play&nbsp;a&nbsp;crucial&nbsp;strategic&nbsp;role&nbsp;in&nbsp;helping&nbsp;software&nbsp;leverage&nbsp;the&nbsp;changing&nbsp;hardware&nbsp;landscape. Such compilers will be multi-layered and will progressively lower computation through multiple intermediate abstractions, performing domain specific optimizations at the higher layers and specializing code to the hardware in lower layers. We&nbsp;have&nbsp;been&nbsp;working&nbsp;on&nbsp;two&nbsp;such&nbsp;domain&nbsp;specific&nbsp;compilers&nbsp;in&nbsp;the&nbsp;data&nbsp;domain. Second, new hardware specific algorithms need&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967329"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980577\/revisions"}],"predecessor-version":[{"id":999144,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980577\/revisions\/999144"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=980577"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=980577"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=980577"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=980577"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=980577"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=980577"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=980577"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=980577"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=980577"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=980577"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=980577"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=980577"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=980577"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=980577"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=980577"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=980577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}