To build an intelligent system based on machine learning, we often need to collect training labels and feed them into the system. A useful lesson in machine learning is that “more data beats a clever algorithm”. Today, a commercial crowdsourcing platform makes it easy to collect a large number of labels at a cost of pennies per label.
However, the labels obtained from crowdsourcing may be highly noisy, and a machine learning model trained on highly noisy labels can be misleading. This is widely known as “garbage in, garbage out”. There are two main reasons for label noise. One is that crowdsourcing workers may lack the expertise a labeling task requires, and the other is that they may have no incentive to produce high-quality labels.
Our goal in this project is to develop principled inference algorithms and incentive mechanisms that guarantee high-quality labels from crowdsourcing in practice.
Contact person: Denny Zhou
People
John Platt
Principal Scientist
Xi Chen
Intern
CMU
Nihar Shah
Intern
UC Berkeley
Qiang Liu
Visiting Scholar
Dartmouth
Chao Gao
Intern
Yale
Tengyu Ma
Visiting Scholar
Princeton