# Crowdsourcing tips

Based on my own experience with Amazon Mechanical Turk and valuable advice from my lab members.
### Overall turking recipe
1. Create a Turk task
2. Debug it on the MTurk sandbox
3. Figure out pricing to pay people reasonably
4. Launch the task on "real" MTurk and find good workers

#### Sandbox and pricing
1. Create MTurk sandbox [requester](https://requestersandbox.mturk.com/) and [worker](https://workersandbox.mturk.com/) accounts
2. Upload a small batch of HITs (~10)
3. Do 2 or 3 HITs yourself (and share the sandbox link with co-authors), then compute the average or median completion time across everyone.
4. Using those times, find a price per HIT that works out to roughly $12-15/h (the goal is to be above minimum wage)

#### Launching the task and finding good workers
1. Run a pilot task on a small number of examples (100-500), with a slightly higher number of workers/HIT than your final task will use, to ensure wide participation.
   - For categorical tasks: consider selecting examples that you know the answer to, for easier grading of workers.
2. Assess the quality of each worker's responses:
   - For categorical tasks: you can set up an autograder based on the responses you expect (if you have your own answers).
   - For free-text tasks: download the CSV of results and scan through the HITs, sorted by workerID.
3. While scanning, make two lists of workerIDs:
   - List of good workers.
   - List of bad workers.
4. Create two qualifications on MTurk:
   - For good workers: call it "GoodAtMyTask"
   - For bad workers: call it "PreviouslyDoneMyTask"; the idea is to avoid upsetting workers by not telling them they're bad at something.
5. Assign bad workers to the bad qual and good workers to the good qual.
6. Create a copy of your pilot task
   - Set the qualification requirements to GoodAtMyTask (plus any other quals)
   - Reduce the number of workers/HIT to what you originally intended
7. If needed, re-run a small "qualification" batch and repeat steps 1-6. Make sure to exclude both the good and the bad workers you have already graded from this qualification task.

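
The pricing arithmetic and the categorical autograder described above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions, not part of the recipe itself: the function names, the 80% accuracy cutoff, and the `min_hits` threshold are all hypothetical, and the `Input.id` / `Answer.label` column names depend on how your task template names its fields (`WorkerId` is the column I'd expect in the downloaded results CSV). Rows are plain dicts, such as those produced by `csv.DictReader`.

```python
import math
from collections import defaultdict
from statistics import median


def price_per_hit(times_sec, hourly_rate=15.0):
    """Dollar price for one HIT, given sandbox completion times in seconds.

    Uses the median time and rounds up to the next cent so the
    effective hourly rate does not fall below the target.
    """
    seconds = median(times_sec)
    cents = hourly_rate * 100 * seconds / 3600
    # round() guards against float noise before taking the ceiling
    return math.ceil(round(cents, 6)) / 100


def grade_workers(rows, gold, min_accuracy=0.8, min_hits=1):
    """Split workers into (good, bad) lists by accuracy on known answers.

    rows: iterable of dicts with "WorkerId", "Input.id", "Answer.label"
          (the last two names are assumptions about the task template).
    gold: dict mapping example id -> expected label.
    Workers with fewer than min_hits graded HITs are left unassigned.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        example_id = row["Input.id"]
        if example_id not in gold:
            continue  # no known answer for this example, so skip it
        worker = row["WorkerId"]
        total[worker] += 1
        if row["Answer.label"] == gold[example_id]:
            correct[worker] += 1

    good, bad = [], []
    for worker, n in total.items():
        if n < min_hits:
            continue  # too few graded HITs to judge this worker
        if correct[worker] / n >= min_accuracy:
            good.append(worker)
        else:
            bad.append(worker)
    return sorted(good), sorted(bad)
```

For example, with sandbox times of 80, 90, and 100 seconds and a $15/h target, `price_per_hit([80, 90, 100])` gives $0.38 per HIT. The two lists returned by `grade_workers` are what you would then assign to the good and bad qualifications; for free-text tasks you would still build those lists by hand.
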
### Background
I compiled this high-level recipe based on valuable advice from my current and former labmates who Turk (including [Emily Allaway](https://www.aclweb.org/anthology/people/e/emily-allaway/) and [Hannah Rashkin](https://homes.cs.washington.edu/~hrashkin/)) and on my own experience Turking (specifically, on the [ATOMIC](https://mosaickg.apps.allenai.org/kg_atomic), [SocialIQa](https://leaderboard.allenai.org/socialiqa/submissions/get-started), and [Social Bias Frames](https://homes.cs.washington.edu/~msap/social-bias-frames/) projects).