OpenAI Seeks Real-World Work Samples to Train AI Agents

OpenAI is actively soliciting real work assignments from contractors to benchmark its next-generation AI models against human performance. The company is asking contractors to upload past or current job deliverables—documents, presentations, spreadsheets, and even code repositories—as training data. This initiative appears to be a core part of OpenAI’s push toward Artificial General Intelligence (AGI), defined by the company as AI systems that surpass human capabilities at economically valuable tasks.

Human Performance as the Baseline

OpenAI aims to establish a quantifiable human baseline for various tasks. By comparing AI outputs against actual human work samples, the company can assess its models’ progress. Contractors are being asked to provide detailed descriptions of tasks and the corresponding deliverables—the finished work product. This approach prioritizes authenticity, with OpenAI explicitly requesting “real, on-the-job work” rather than simulations.

Confidentiality Concerns

Despite instructions to remove sensitive data, the practice raises significant legal risks. Intellectual property lawyer Evan Brown warns that AI labs could face trade secret misappropriation claims if confidential information leaks. Contractors sharing work samples, even after anonymization, may violate non-disclosure agreements with previous employers. OpenAI itself acknowledges the need to scrub confidential data and even references an internal tool, “Superstar Scrubbing,” for this purpose.
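OpenAI has not published details of how “Superstar Scrubbing” works, but the general shape of automated redaction is easy to sketch. The following is a minimal, hypothetical illustration—the patterns, placeholder format, and `scrub` function are assumptions for exposition, not OpenAI’s actual pipeline—showing why pattern-based scrubbing alone is unlikely to satisfy an NDA:

```python
import re

# Hypothetical patterns for a few common identifier types. A real
# scrubbing pipeline would need far broader coverage (names, addresses,
# client lists, internal codenames) and almost certainly human review.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace every pattern match with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(scrub("Contact jane.doe@acme.com or 555-123-4567."))
# → Contact [EMAIL REDACTED] or [PHONE REDACTED].
```

The limitation is the point of the legal concern: regex-style scrubbing catches structured identifiers but cannot recognize a trade secret expressed as ordinary prose, which is exactly the gap Brown identifies.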

The Expanding AI Training Market

This practice is symptomatic of a broader trend: AI labs are increasingly reliant on high-quality training data. Companies like OpenAI, Anthropic, and Google are hiring armies of contractors through firms like Surge, Mercor, and Handshake AI to generate this data. The demand for skilled contractors has driven up prices, creating a lucrative sub-industry valued in the billions. OpenAI has even explored acquiring data directly from bankrupt companies, though concerns about complete data anonymization halted one such inquiry.

As Brown puts it: “The AI lab is putting a lot of trust in its contractors to decide what is and isn’t confidential… If they do let something slip through, are the AI labs really taking the time to determine what is and isn’t a trade secret? It seems to me that the AI lab is putting itself at great risk.”

The reliance on third-party contractors underscores the growing pressure on AI companies to improve their models with real-world data. While OpenAI emphasizes data security, the inherent risks of handling confidential work samples remain a significant concern for contractors and their former employers alike.