Training large language models on my loved ones' data
Today’s “expert” data labelers are tomorrow’s layoffs
Highly-paid “expert” data labelers are behind many of the most recent advances in AI capabilities, even as these workers experience the same indignities as earlier generations of online gig workers. Today’s essay highlights the absurdity of the situation. “What [are] the right demands for an industry that is engineering its own obsolescence?”
—Jessica

Training large language models on my loved ones’ data
By circe
“Till a system was formed, which some took advantage of, & enslav’d the vulgar by attempting to realize or abstract the mental deities from their objects… Thus men forgot that All deities reside in the human breast.” — William Blake
I met one of my best friends in school by sitting next to him in math class. We were both selected for “top set”, the most elite of a dozen or so math classes in our grade — being new to the school and insecure about my mathematical abilities, I was terrified, but R was exceptionally kind to me in his small awkward ways. I’d say the friendship started when I slipped him a drawing of a tiny monkey playing the bongos, which he sincerely praised. Soon after, he graciously offered that I could copy off of him in the Math Olympiad (Reader, I did not). We discussed pretentious absurdist plays, argued about whether it’s acceptable to listen to Kanye and, of course, did a ton of math homework together.
Our paths diverged at university; he studied in the UK, and I in America. While attending college at one of the richest schools in the world, I began training large language models in a research lab in the computer science department. Around the same time, midway through his joint degree in mathematics and philosophy, R stumbled upon Outlier — a subsidiary of Scale AI, a company that sells training data to AI labs including OpenAI and Anthropic, and is now part-owned by Meta — after Googling “easy ways to make money math tutor” to help cover tuition costs.
Following two hours of unpaid onboarding and an hour-long webinar, R was assigned to Green Eggs and Ham, Cabbage Patch and Green Wizard, codenames for math data labeling jobs of varying difficulty. It paid better than tutoring, and soon R began creating mathematical reasoning data for training large language models between lectures and tutorials.
From what he tells me, the process looks something like this (a rough code sketch follows the list):
1. You pose a question to a black-box reasoning model: any math question that has a single, complete, final answer. This is prompting.
2. Given the final answer, the model attempts to reconstruct a correct mathematical reasoning path to arrive at it. This is the reasoning trace.
3. If the reasoning trace is incorrect, you must label the line of first failure and produce the corrected reasoning step.
4. The model retries the reasoning process, conditioned on all reasoning steps up to and including the human-corrected one.
5. Repeat. It typically takes three or four human corrections for the model to arrive at an end-to-end correct reasoning trace.
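For the technically inclined, here is a minimal sketch of that loop in Python. It is my reconstruction from R’s description, not Outlier’s actual tooling: `model_solve`, `is_step_correct`, and `human_correct` are hypothetical stand-ins for the black-box model, the platform’s step grading, and the labeler’s correction.

```python
# A minimal sketch of the labeling loop R describes. None of this is
# Outlier's real interface: model_solve, is_step_correct, and
# human_correct are hypothetical stand-ins.

MAX_CORRECTIONS = 4  # it typically takes three or four corrections


def human_correct(bad_step: str) -> str:
    """Stand-in for the labeler: in reality, R reads the failing line
    and writes out the corrected reasoning step himself."""
    return input(f"Correct this step: {bad_step}\n> ")


def label_reasoning_trace(question, final_answer, model_solve, is_step_correct):
    """Iterate until the model produces an end-to-end correct trace.

    Returns every attempted trace: the full transcript of trial and
    error, which is what becomes training data.
    """
    corrected_prefix = []  # human-verified steps the model must keep
    transcript = []        # all attempts, successful or not

    # One initial attempt, plus up to MAX_CORRECTIONS retries.
    for _ in range(MAX_CORRECTIONS + 1):
        # The model retries, conditioned on all steps up to and
        # including the last human-corrected one.
        trace = model_solve(question, final_answer, prefix=corrected_prefix)
        transcript.append(trace)

        # Find the line of first failure, if any.
        first_failure = next(
            (i for i, step in enumerate(trace) if not is_step_correct(step)),
            None,
        )
        if first_failure is None:
            return transcript  # end-to-end correct: the task is done

        # Label the failing line and supply the corrected step.
        fixed = human_correct(trace[first_failure])
        corrected_prefix = trace[:first_failure] + [fixed]

    return transcript  # still wrong after max corrections; flagged for review
```

Note that the value lies in the whole transcript of trial and error, not just the final correct trace.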
Every prompt (and its corresponding answer) must be approved by other mathematicians on the platform. The advertised rate of $50/h is misleading, because half the tasks assigned to workers are reviews of others’ prompts, paid at $25/h. Another caveat: if the model gets the question right quickly, the question is deemed “too easy”, and you don’t get paid for your time. So you can waste an entire hour posing questions to a model, hoping that the answer isn’t in its training data (i.e. the entire internet), only to be left empty-handed for your labor.
This data — each transcript of trial and error — is critical to LLM post-training. It’s the kind of extremely expensive training data that enabled OpenAI and DeepMind to get gold medals at the 2025 IMO. Mercor and Surge are able to advertise $50/h rates because the labs are willing to pay more than 100x that to hillclimb reasoning benchmarks by a few percentage points. My job, too, involves training large language models on such data so that they learn to reason through mathematical and logical problems. Nowadays, labs train on a lot of synthetic data generated by other large language models, but at the root of this genealogy is the data created by R and others like him.
After my dad was laid off suddenly at the beginning of this year, he signed up for a data labeling job via Scale after I mentioned the advertised $80 hourly rate for PhD holders. My dad, who spent countless hours at the kitchen table teaching me everything I knew about math, and my best friend, a veritable whiz who underwent the exact same education as me, were now producing data that I am training on.
R once asked me, what’s harder, what you do, or what me and your dad do?
I know that R could easily do the work that I do — and while my work over in Silicon Valley is considered prestigious and intellectually rigorous, his is a gig side hustle that he can’t even list on his resume.
My dad protested, what you do takes years of experience and expertise. But then, what were all of their years of education for, if not building experience and expertise in their field?
After watching a voiceover video explaining how chocolate is made from cacao pods, I pitched an analogy to R: data are the raw cacao pods, Scale processes the pods and provides the cacao nibs, and OpenAI makes the chocolate. R replied wryly: that makes me the little monkey climbing up the cacao tree.
The material value of this gig intellectual labor is staggering: Mercor recently raised a $350 million Series C valuing the company at $10 billion, purely for sourcing labeled data. Meta paid over $14 billion for a 49% stake in Scale AI. Surge AI made $1.2 billion in revenue in 2024. Salaries for Data Operations Managers — internal employees at AI companies whose job is to source data contracts — exceed $300k/year. Meanwhile, the humans creating this value are completely sidelined, while the names and headshots of AI researchers go viral alongside their reasoning model drops on X, boasting super-human intelligence. One data labeling company that sells labor to OpenAI, Microsoft, Amazon and Cohere is literally named Invisible. All the while, the quality of human-created training data largely underpins the success of their models, which have not undergone any major algorithmic or architectural changes in years.1
One day, out of the blue, R’s final task arrived in his inbox: “Your account has been deactivated. Unfortunately, we are unable to provide specific details regarding the nature of the violation that led to this decision.” He was cut off.
A friend likened the come-and-go nature of data labeling gigs to his cousin working for a wedding catering company which has more business in the summer: it’s not bad or good, it just is. But seasonal work comes back around. Data labeling is work that eats itself: as the model successfully adopts the capabilities that it is shown through labels and environments, the need for that data dissipates. Large language models improved via training on high quality human data become able to generate arbitrary synthetic data much more cheaply, displacing the need for human expertise entirely. It’s more like if your cousin worked and was compensated for a single summer, then was replaced every subsequent summer by a robot caterer that had watched his every move.
Take Uber. Most drivers thought it was great at first. They got a decent cut of the ride price, and it seemed to be an easy way to make some money in your spare time: a convenient side hustle. But the platform soon became so competitive and consuming that drivers quit their other jobs to do it full-time, as they estimated the hourly rate to be higher. And others, having lost their jobs, turned to ride-hailing apps as a stopgap. And when enough drivers did so, saturating supply, Uber and Lyft gained more leverage to push down drivers’ share of the ride fare, while aggressively preventing unionization. Now far more people are stuck in increasingly bad gig conditions with no recourse to organize (except in California, which successfully passed a unionization bill in August 2025). Notably, both Uber and Lyft made huge investments in self-driving teams — demonstrating their long-run goal of dispensing with human drivers entirely.
As a data worker, you are removed from the typical career ladder that exists in most conventional employee roles, where you may work towards promotion and enjoy basic worker protections (minimum wage, overtime pay, medical leave, etc.). There is no such thing as career progression in data labeling; you’re capped at where you start. Working conditions are atomized and alienating: R told me that all of his communications with Outlier were answered within a minute by AI bots. Data workers are forbidden from talking to each other on the Outlier platform.
The data labeling industry is projected to grow a wild 20-30% annually over the next five years. Simultaneously, many AI industry leaders, including Anthropic CEO Dario Amodei, predict that white-collar workers will be the most impacted by AI adoption in 2026. If they are right that we will experience massive disruption in the white-collar workforce, what does that near future look like? Working from remote pods, only interacting with AI foremen, constantly at risk of being laid off without due process or explanation?
In many ways, R’s situation mirrors the conditions that labor scholars have documented for decades among content moderators and Mechanical Turk workers: arbitrary termination, unpredictable pay, work whose aim is to replace the need for the worker. The difference is that his area of expertise — advanced mathematics — carries greater cultural prestige; the work can even feel intellectually satisfying. R told me he enjoyed the challenge of finding questions that would “beat” the model. It doesn’t feel like “ghost work” when you’re doing combinatorics at 2AM — that’s just a regular school night. As a result, college-educated data labelers might be loath to group themselves with Mechanical Turkers. But prestige doesn’t translate into protection.
Oskarina Veronica Fuentes Anaya, a data worker in Latin America, described “need[ing] to be available full-time, waiting for tasks that arrive at random intervals and sometimes don’t arrive for weeks.” R said that he got used to staying up until 4AM refreshing the platform for new jobs, checking first thing in the morning to see if he’d received a task. In March 2024, Remotasks suddenly fired thousands of data workers in Kenya and withheld their earnings; in September 2025, xAI fired 500 generalist data annotators in one fell swoop on a Friday night. As supply surged, Scale gigs in the Philippines began paying less than local minimum wage; Mercor recently fired thousands of generalist contractors making $21/hour, then offered to rehire them at $16.
The pattern holds across geography and education level. As model capabilities expand, today’s “expert” data labelers are tomorrow’s layoffs. This makes it all the more urgent to establish better worker protections now, before even highly educated workers’ leverage disappears entirely.
I’m not sure what the right demands are for an industry that is engineering its own obsolescence. But some dignities cost platforms almost nothing to provide:
One thing data giants could commit to right now is giving data workers credentials. When their labor is encoded into model weights that continuously generate profit for model companies, it’s only decent to invest in people’s skills and not treat them as disposable. This could look like formalized, portable skill certification which recognizes domain expertise and is accepted across AI platforms. Even better: if data workers were to be given insight into how their data is being used and what model capabilities they contributed to, they could take pride in (and rightfully claim) their part in GPT-5’s improved conversational depth, chemistry equation balancing, and so on.
R couldn’t list his work on a resume, couldn’t appeal his sudden ban, couldn’t even commiserate with other mathematicians on the platform. Data workers should be able to talk to each other. Connecting with your coworkers is crucial for well-being and provides the means for organization and collective action. Workers should know how much they can expect to be paid per week. They should have warning before they’re cut off, and access to a fair appeals and review process afterwards.
As AI researchers and engineers, one of the small contributions we can make is to simply admit how much data matters to our work and learn where that data is actually coming from. Without meticulously curated SFT data and RL environments, there would be no AI boom to boast of at all. Researchers do invaluable work to shape decisions about what data should be fed to their models, and how that data should be manipulated, but the source of that feed is a mountain of human labor.
Google DeepMind’s IMO gold tech report offers one model: they explicitly credit the data and evaluation experts who helped them train their medal-winning model. Just as the researchers who contribute to model development are credited in arXiv authorship and engineers are credited in appendices, there should be some record in the dataset card and tech report of the data’s origins — at least geographical or demographic, since the scale of data collection often makes it difficult to track provenance. (Of course, there are incentives against this; data sources are crucial company IP.)
Technologists can directly help too: Dr. Saiph Savage’s lab at Northeastern University has built browser extensions that allow data workers to communicate and share how long a given task takes, enabling workers to assess which tasks are worth it and which ones to avoid. Even small interventions can increase transparency and return some autonomy to workers.
In 2019, Lilly Irani described how engineers saw Mechanical Turk workers as a “stop-gap” until AI could do their work. Seven years on, we are all Mechanical Turk workers of sorts: we train algorithms by scrolling our feeds and completing CAPTCHAs. All of us, whether we know it or not, are using and training AI. As more and more industries and skills are implicated, we must not forget that the deity of artificial intelligence resides in our human breast — and we must not abandon the humans who are creating the deity.
circe is a research engineer at an LLM company and writes about AI development and the culture of Silicon Valley. She is based in San Francisco.
Reboot publishes essays by and for technologists. Sign up for more like this!
🌀 microdoses
The latest in crazy things data contractors are being asked to do… upload their prior work products to OpenAI
ICYMI - our very own Jasmine Sun in the New York Times on Chinese peptides
Is there really another tech industry vibe shift? I’d like to hope so, but…
💝 closing note
As always, we are open for pitches!
— Jessica & Reboot team
1. Maybe reasoning, GRPO, and MoE, but I think most would agree these aren’t paradigm-shifting overhauls. Besides data, the other major factor driving rapid improvements has been the massive engineering lift to scale models.





