{"id":167628,"date":"2015-02-01T00:00:00","date_gmt":"2015-02-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/toward-predicting-the-outcome-of-an-ab-experiment-for-search-relevance\/"},"modified":"2018-10-16T22:05:06","modified_gmt":"2018-10-17T05:05:06","slug":"toward-predicting-the-outcome-of-an-ab-experiment-for-search-relevance","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/toward-predicting-the-outcome-of-an-ab-experiment-for-search-relevance\/","title":{"rendered":"Toward Predicting the Outcome of an A\/B Experiment for Search Relevance"},"content":{"rendered":"

A standard approach to estimating online click-based metrics
\nof a ranking function is to run it in a controlled experiment
\non live users. While reliable and popular in practice,
\nconguring and running an online experiment is cumbersome
\nand time-intensive. In this work, inspired by recent
\nsuccesses of oine evaluation techniques for recommender
\nsystems, we study an alternative that uses historical search
\nlog to reliably predict online click-based metrics of a new
\nranking function, without actually running it on live users.
\nTo tackle novel challenges encountered in Web search,
\nvariations of the basic techniques are proposed. The rst
\nis to take advantage of diversied behavior of a search engine
\nover a long period of time to simulate randomized data
\ncollection, so that our approach can be used at very low cost.
\nThe second is to replace exact matching (of recommended
\nitems in previous work) by fuzzy matching (of search result
\npages) to increase data eciency, via a better trade-o
\nof bias and variance. Extensive experimental results based
\non large-scale real search data from a major commercial
\nsearch engine in the US market demonstrate our approach
\nis promising and has potential for wide use in Web search.<\/p>\n","protected":false},"excerpt":{"rendered":"

A standard approach to estimating online click-based metrics of a ranking function is to run it in a controlled experiment on live users. While reliable and popular in practice, conguring and running an online experiment is cumbersome and time-intensive. In this work, inspired by recent successes of oine evaluation techniques for recommender systems, we study […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13555],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-167628","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"ACM - Association for Computing Machinery","msr_edition":"Proceedings of the 8th ACM International Conference on Web Search and Data Mining","msr_affiliation":"","msr_published_date":"2015-02-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"Proceedings of the 8th ACM International Conference on Web Search and Data Mining","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"204491","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"paper.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/paper-11.pdf","id":204491,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":204491,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/paper-11.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"lihongli","user_id":32676,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=lihongli"},{"type":"user_nicename","value":"izitouni","user_id":32135,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=izitouni"},{"type":"text","value":"Jin Young Kim","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[171233],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167628"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167628\/revisions"}],"predecessor-version":[{"id":541934,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/167628\/revisions\/541934"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=167628"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=167628"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=167628"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=167628"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=167628"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=167628"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=167628"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=167628"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=167628"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=167628"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=167628"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=167628"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=167628"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=167628"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=167628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}