{"id":351686,"date":"2017-01-12T12:35:07","date_gmt":"2017-01-12T20:35:07","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=351686"},"modified":"2018-10-16T20:15:04","modified_gmt":"2018-10-17T03:15:04","slug":"tackling-poor-assumptions-naive-bayes-text-classifiers","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/tackling-poor-assumptions-naive-bayes-text-classifiers\/","title":{"rendered":"Tackling the Poor Assumptions of Naive Bayes Text Classifiers"},"content":{"rendered":"
Naive Bayes is often used as a baseline in text classi\ffication because it is fast and easyto implement. Its severe assumptions make such efficiency possible but also adversely affect the quality of its results. In this paper we propose simple, heuristic solutions to some of the problems with Naive Bayes classi\ffiers, addressing both systemic issues as well as problems that arise because text is not actually generated according to a multinomial model. We fi\fnd that our simple corrections result in a fast algorithm that is competitive with state-of-the-art text classifi\fcation algorithms such as the Support Vector Machine.<\/p>\n","protected":false},"excerpt":{"rendered":"
Naive Bayes is often used as a baseline in text classi\ffication because it is fast and easyto implement. Its severe assumptions make such efficiency possible but also adversely affect the quality of its results. In this paper we propose simple, heuristic solutions to some of the problems with Naive Bayes classi\ffiers, addressing both systemic issues […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13555],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-351686","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"","msr_edition":"Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), Washington, D.C.","msr_affiliation":"","msr_published_date":"2003-09-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"http:\/\/teevan.org\/publications\/papers\/icml03.pdf","msr_doi":"","msr_publication_uploader":[{"type":"url","title":"http:\/\/teevan.org\/publications\/papers\/icml03.pdf","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":0,"url":"http:\/\/teevan.org\/publications\/papers\/icml03.pdf"}],"msr-author-ordering":[{"type":"text","value":"Jason D. M. Rennie","user_id":0,"rest_url":false},{"type":"text","value":"Lawrence Shih","user_id":0,"rest_url":false},{"type":"user_nicename","value":"teevan","user_id":33975,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=teevan"},{"type":"text","value":"David R. Karger","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/351686"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/351686\/revisions"}],"predecessor-version":[{"id":525168,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/351686\/revisions\/525168"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=351686"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=351686"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=351686"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=351686"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=351686"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=351686"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=351686"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=351686"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=351686"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=351686"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=351686"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=351686"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=351686"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=351686"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=351686"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=351686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}