{"id":864405,"date":"2022-07-25T05:00:01","date_gmt":"2022-07-25T12:00:01","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2023-03-29T11:32:11","modified_gmt":"2023-03-29T18:32:11","slug":"spine-a-scalable-log-parser-with-feedback-guidance","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/spine-a-scalable-log-parser-with-feedback-guidance\/","title":{"rendered":"SPINE: A Scalable Log Parser with Feedback Guidance"},"content":{"rendered":"
Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine tuning under the continuous evolution of log data. To overcome the challenges, we propose SPINE, which is a highly scalable log parser with user feedback guidance. Based on our log parsing model with initial grouping progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data. Besides, we introduce user feedback to make the parser fast adapt to the evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.9 parsing accuracy on average with the highest parsing efficiency, which outperforms the state-of-the-art log parsers. We also evaluated SPINE in the production environment of a world-leading cloud system, in which SPINE can parse 30 million logs in less than 8 minutes under 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate number of user feedback.<\/p>\n","protected":false},"excerpt":{"rendered":"
Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[246574],"research-area":[13561,13556,13563,13560,13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-864405","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-highlight-award","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-research-area-programming-languages-software-engineering","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-7-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"SIGSOFT Distinguished Paper Award","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"10.1145\/3540250.3549176","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":864408,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2022\/07\/Placeholder_SPINE.docx"}],"msr-author-ordering":[{"type":"text","value":"Xuheng Wang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Xu Zhang","user_id":41841,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Xu Zhang"},{"type":"user_nicename","value":"Liqun Li","user_id":32698,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Liqun Li"},{"type":"user_nicename","value":"Shilin He","user_id":40258,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shilin He"},{"type":"text","value":"Hongyu Zhang","user_id":0,"rest_url":false},{"type":"text","value":"Yudong Liu","user_id":0,"rest_url":false},{"type":"text","value":"Lingling Zheng","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Yu Kang","user_id":39381,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yu Kang"},{"type":"user_nicename","value":"Qingwei Lin","user_id":33318,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qingwei Lin"},{"type":"user_nicename","value":"Yingnong Dang","user_id":35001,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yingnong Dang"},{"type":"user_nicename","value":"Saravan Rajmohan","user_id":41039,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Saravan Rajmohan"},{"type":"user_nicename","value":"Dongmei Zhang","user_id":31665,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dongmei Zhang"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144847,714577],"msr_project":[853323,855579],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":853323,"post_title":"Cloud System and Software Analytics","post_name":"cloud-system-and-software-analytics","post_type":"msr-project","post_date":"2022-06-24 00:55:15","post_modified":"2022-10-24 01:21:01","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloud-system-and-software-analytics\/","post_excerpt":"In Microsoft, we build and operate several world leading complex and large-scale productivity clouds (Azure, Microsoft 365). The quality of cloud platforms, including reliability, performance, efficiency, security, sustainability, etc., has become immensely important. The distributed nature, massive scale, and high complexity of cloud platforms present huge challenges to build and operate such systems effectively and efficiently. Each independent service in cloud computing, such as computing virtualization, cloud storage service, distributed database, etc., is a complex…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/853323"}]}},{"ID":855579,"post_title":"AIOps","post_name":"aiops","post_type":"msr-project","post_date":"2022-06-24 04:09:36","post_modified":"2022-10-25 05:28:06","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/aiops\/","post_excerpt":"In the past fifteen years, the most significant paradigm shift in the computing industry is the migration to cloud computing, which brings unprecedented opportunities of digital transformation to business, society, and human life. The implication of this is profound. It means that cloud computing platforms have become part of the basic infrastructure of the world. Therefore, the non-functional properties of cloud computing platforms, including availability, reliability, performance, efficiency, security, sustainability, etc., become immensely important. The…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/855579"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/864405"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/864405\/revisions"}],"predecessor-version":[{"id":864411,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/864405\/revisions\/864411"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=864405"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=864405"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=864405"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=864405"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=864405"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=864405"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=864405"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=864405"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=864405"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=864405"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=864405"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=864405"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=864405"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=864405"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=864405"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=864405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}