Exact Exponent in Optimal Rates for Crowdsourcing

Chao Gao; Yu Lu; Denny Zhou

Proceedings of the 33rd International Conference on Machine Learning | June 2016

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(π), where m is the number of workers and I(π) is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m ≥ (1/I(π)) log(1/ϵ) in order to achieve an ϵ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
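To make the sample size requirement concrete, the following is a minimal sketch of the arithmetic only, not the paper's method. It takes the bound at face value and ignores lower-order terms in the exponent. The function names are hypothetical. The helper `chernoff_info_one_coin` assumes a simplified setting that the abstract does not state: binary labels, and identical workers who each answer correctly with probability p regardless of the true label. In that setting the Chernoff information between Bernoulli(p) and Bernoulli(1 - p) is -log(2 sqrt(p(1 - p))), and the average over workers equals the per-worker value.

```python
import math


def chernoff_info_one_coin(p: float) -> float:
    """Chernoff information between Bernoulli(p) and Bernoulli(1 - p).

    Assumption (not from the abstract): binary labels and a worker who is
    correct with probability p whatever the true label is.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(2.0 * math.sqrt(p * (1.0 - p)))


def workers_needed(avg_chernoff_info: float, eps: float) -> int:
    """Smallest integer m with m >= (1 / I) * log(1 / eps).

    Treats I as a fixed number, which holds when all workers are identical;
    in general I is an average over the m workers themselves.
    """
    if avg_chernoff_info <= 0.0:
        raise ValueError("average Chernoff information must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / eps) / avg_chernoff_info)


if __name__ == "__main__":
    info = chernoff_info_one_coin(0.8)  # hypothetical worker accuracy
    print(f"I = {info:.4f}, workers needed for eps = 0.01: "
          f"{workers_needed(info, 0.01)}")
```

A worker with p = 0.5 carries zero Chernoff information, so no number of such workers satisfies the requirement, and the sketch raises an error in that case rather than returning a value.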