{"id":848977,"date":"2022-05-31T18:03:59","date_gmt":"2022-06-01T01:03:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2023-03-01T16:01:39","modified_gmt":"2023-03-02T00:01:39","slug":"high-performance-row-pattern-recognition-using-joins-technical-report","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/high-performance-row-pattern-recognition-using-joins-technical-report\/","title":{"rendered":"High-Performance Row Pattern Recognition Using Joins (Technical Report)"},"content":{"rendered":"

The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios overlook important optimization opportunities. We propose a new approach to use Join to speed up row pattern recognition in historical analysis scenarios for relational systems. Implemented as a logical plan rewrite rule, the new approach first filters the input relation to MATCH_RECOGNIZE using Joins constructed based on a subset of symbols taken from the PATTERN expression, then run the NFA-based MATCH_RECOGNIZE on the filtered rows, reducing the net cost. The rule also includes a specialized cardinality model for the Joins and a cost model for the NFA-based MATCH_RECOGNIZE operator for choosing an appropriate symbol set. The rewrite rule is applicable when the query pattern\u2019s definition is self-contained and either the input table has no duplicates or there is a window condition. Applying the rewrite rule to a query benchmark with 1,800 queries spanning over 6 patterns and 3 pattern definitions, we observed median speedups of 5.4\u00d7 on Trino (v373 with ORC files on Hive), 57.5\u00d7 on SQL Server (2019) using column store and 41.6\u00d7 on row store.<\/p>\nOpens in a new tab<\/span>","protected":false},"excerpt":{"rendered":"

The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193718],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-field-of-study":[264550],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-5-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2022-34","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Microsoft","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2022\/05\/match_recognize.pdf","id":"918024","title":"match_recognize-2","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":918024,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/02\/match_recognize.pdf"},{"id":911727,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/01\/match_recognize.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Erkang (Eric) Zhu","user_id":38718,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Erkang (Eric) Zhu"},{"type":"user_nicename","value":"Silu Huang","user_id":38739,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Silu Huang"},{"type":"user_nicename","value":"Surajit Chaudhuri","user_id":33764,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Surajit Chaudhuri"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[957177],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"techreport","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/848977"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/848977\/revisions"}],"predecessor-version":[{"id":911730,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/848977\/revisions\/911730"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=848977"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=848977"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=848977"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=848977"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=848977"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=848977"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=848977"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=848977"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=848977"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=848977"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=848977"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=848977"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=848977"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=848977"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}