Google contractors lack time to verify AI chatbot answers, forced to guess
Background: Bard’s Beta Roll‑out and the Role of “Raters”
Google introduced its AI chatbot Bard in a limited beta in March, following OpenAI's launch of ChatGPT. Bard takes a user prompt, such as a question or a task, and generates a human-like reply. Behind the scenes, contractors hired through Appen are tasked with evaluating the chatbot's responses, though they are not told explicitly that the assignments involve Bard.
Contractor Experience: Short Review Times and Uncertain Grading
Four raters, speaking anonymously because they are not authorized to talk to the press, told Insider that in January their work shifted from evaluating search algorithms to assessing AI prompts and responses. They say they are often given so little time that grading responses accurately is difficult, and some admit to making best-guess selections to make sure they get paid.
- Raters evaluate search relevance and flag harmful sites.
- New assignments involve reviewing chatbot prompts and responses.
- Time constraints hinder accurate quality grading.
Bard’s Public Criticism and Google’s Response
Bard drew criticism after giving an incorrect answer at a launch event. Google has said the chatbot will improve over time and has stressed that it is not meant to replace the search engine.
Employee Involvement Prior to the Launch
In February, Google asked full‑time employees to dedicate two to four hours daily to test Bard. Employees were encouraged to ask questions, flag inaccurate answers, and rewrite responses on any topic. Bard learned from these employee edits, refining its response generation.


Google and Appen declined to comment.
Not enough time
The latest instruction guide for content raters shows that workers receive a user-written prompt, such as a question, instruction, or statement, along with two machine-generated responses. Their job is to decide which answer is better and, optionally, explain the reasoning in a text box. This feedback can help the AI model learn what makes a response acceptable, including coherence, accuracy, and up-to-date information.
Variable Task Durations and Domain Knowledge Challenges
- Each rating task has a fixed time limit, ranging from as little as 60 seconds to several minutes.
- Because pay is tied to completed tasks, some workers admit to finishing a task even when they know they cannot rate the response accurately.
- When a prompt covers a highly technical subject—say, the intricacies of blockchain—raters often feel they lack enough background, making a thorough evaluation difficult.
Worker Perspectives on Time Scarcity
One rater explained, “I’m trying to keep that pay and keep working, so I give my best guess after realizing I don’t know enough.” Another expressed a similar sentiment, noting that they “want to get the facts right and provide the best quality chatbot experience they can but are simply not given enough time to research a topic before they need to provide an assessment.”
“A lot of us are at our breaking point, honestly,” one rater said. “Three hours of research to complete a 60-second task, that’s a great way to frame the problem we’re facing right now.”
Key Takeaway
Raters are pressed to deliver quick assessments under a pay-based time quota, often without the necessary domain knowledge or research window. The result is a tension between financial viability and the quality of the final user experience.
Contractors are demanding better working conditions
Google rater union pushes for better wages
Workers contracted through third‑party firms have been lobbying Google for improved pay and working conditions. In February, a group of raters visited the Googleplex to hand a petition to the head of search, Prabhakar Raghavan, urging the company to raise wages.
Current pay rates
- Appen raters earn between $14 and $14.50 per hour.
- These contractors support a business that generates most of its revenue from search and advertising.
Union representation
The Alphabet Workers Union (AWU) serves as a “solidarity union,” meaning it supports the raters and aids with activism but does not formally represent them or negotiate a collective‑bargaining agreement.
Unions in other segments
In Austin, Texas, YouTube contractors announced last year that they plan to unionize with the AWU. The AWU estimates that Google employs more than 200,000 people as contractors who are not recorded in the company’s official head count.
Got a tip?
Contact the reporter via email at tmaxwell@insider.com, on Signal at 540.955.7134, or on Twitter at @tomaxwell.

