Exact Exponent in Optimal Rates for Crowdsourcing

  • Chao Gao,
  • Yu Lu,
  • Denny Zhou

Proceedings of the 33rd International Conference on Machine Learning

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(π), where m is the number of workers and I(π) is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement, m ≥ (1/I(π)) log(1/ϵ), for achieving a misclassification error of ϵ. In addition, our results imply the optimality of various forms of EM algorithms given accurate initializers of the model parameters.
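As a rough illustration of the sample size requirement, the sketch below evaluates m ≥ (1/I(π)) log(1/ϵ) numerically. It rests on an assumption not stated in the abstract: a binary "one-coin" special case in which worker i labels correctly with probability p_i, so that the worker's Chernoff information is that between Bernoulli(p_i) and Bernoulli(1 − p_i), namely −log(2√(p_i(1 − p_i))). The function names are hypothetical, and the result should be read as a leading-order estimate rather than the paper's exact statement.

```python
import math


def chernoff_information_one_coin(p):
    """Chernoff information between Bernoulli(p) and Bernoulli(1 - p).

    Assumes a binary one-coin worker who is correct with probability p,
    with 0 < p < 1. By symmetry the optimum is attained at t = 1/2.
    """
    return -math.log(2.0 * math.sqrt(p * (1.0 - p)))


def average_chernoff_information(abilities):
    """Average per-worker Chernoff information, playing the role of I(pi)."""
    return sum(chernoff_information_one_coin(p) for p in abilities) / len(abilities)


def workers_needed(avg_info, eps):
    """Smallest integer m with m >= log(1 / eps) / avg_info.

    Assumes avg_info > 0, i.e. the workers are not all pure guessers.
    """
    return math.ceil(math.log(1.0 / eps) / avg_info)


# Hypothetical crowd in which every worker is correct 80% of the time.
info = average_chernoff_information([0.8, 0.8, 0.8])
print(workers_needed(info, eps=0.01))
```

Under these assumptions a worker with p = 0.5 contributes zero information, so the required number of workers grows without bound as the crowd approaches random guessing.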